@vpadulan This is very useful, thanks a lot! I have had a go with this and it is very clear that the only Jitting happening is in Snapshot()
calls, which is re-assuring in some sense that the code is written ok.
I have now changed my code around to run subprocesses. I wanted to outline some of the results here. In the tests I am running over 23 systematic trees:
1- If I run the trees directly using the c++ executable, all in one-go, the memory profile is
2- If I run these usign python subprocesses, serially, in chunks of maximum 10 trees (10,10,3), the profile is:
which is much better in terms of accumlation of memory! Also, unexpectedly, I see a massive imprvement in speed I am not sure I can explain that, so would appreciate your insight!
It is worth noting that in the python script, I do some light checks on the file before I pass it to the C++ code, using pyROOT. There is a flat region there where nothing is happening. This was intriguing, so I did a bit of digging there to see why this memory is kept (my initial guess was python needing it). I had a look at the memory usage between 151s and 182s and I find the following:
95.94% (198,508,949B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->24.38% (50,451,900B) in 5445 places, all below massif's threshold (1.00%)
|
->13.68% (28,307,136B) 0x621B5782: clang::ASTReader::ReadASTBlock(clang::serialization::ModuleFile&, unsigned int) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| ->13.68% (28,307,136B) 0x621DCB6B: clang::ASTReader::ReadAST(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, unsigned int, llvm::SmallVectorImpl<clang::ASTReader::ImportedSubmodule>*) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| ->13.68% (28,307,136B) 0x61DFA5E5: clang::CompilerInstance::findOrCompileModuleAndReadAST(llvm::StringRef, clang::SourceLocation, clang::SourceLocation, bool) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| ->13.68% (28,307,136B) 0x61DFBF08: clang::CompilerInstance::loadModule(clang::SourceLocation, llvm::ArrayRef<std::pair<clang::IdentifierInfo*, clang::SourceLocation> >, clang::Module::NameVisibilityKind, bool) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| ->13.68% (28,307,136B) 0x627A8E45: clang::Sema::ActOnModuleImport(clang::SourceLocation, clang::SourceLocation, clang::SourceLocation, llvm::ArrayRef<std::pair<clang::IdentifierInfo*, clang::SourceLocation> >) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| | ->13.68% (28,307,136B) 0x617A2343: cling::Interpreter::loadModule(clang::Module*, bool) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| | ->13.68% (28,307,136B) 0x616BE316: LoadModule(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::Interpreter&) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| | ->13.68% (28,307,136B) 0x616E7748: RegisterCxxModules(cling::Interpreter&) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| | | ->13.68% (28,307,136B) 0x616E944D: TCling::TCling(char const*, char const*, char const* const*, void*) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| | | ->13.68% (28,307,136B) 0x616EB3E3: CreateInterpreter (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| | | ->13.68% (28,307,136B) 0x5505E7FB: TROOT::InitInterpreter() (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCore.so)
| | | ->13.68% (28,307,136B) 0x5505ECAE: ROOT::Internal::GetROOT2() (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCore.so)
| | | ->13.68% (28,307,136B) 0x54E9F9FB: __static_initialization_and_destruction_0(int, int) [clone .constprop.0] (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libcppyy_backend3_9.so)
| | | ->13.68% (28,307,136B) 0x400F9C2: _dl_init (in /usr/lib64/ld-2.17.so)
| | | ->13.68% (28,307,136B) 0x401459D: dl_open_worker (in /usr/lib64/ld-2.17.so)
| | | ->13.68% (28,307,136B) 0x400F7D3: _dl_catch_error (in /usr/lib64/ld-2.17.so)
| | | ->13.68% (28,307,136B) 0x4013B8A: _dl_open (in /usr/lib64/ld-2.17.so)
| | | ->13.68% (28,307,136B) 0x5227FAA: dlopen_doit (in /usr/lib64/libdl-2.17.so)
| | | ->13.68% (28,307,136B) 0x400F7D3: _dl_catch_error (in /usr/lib64/ld-2.17.so)
| | | ->13.68% (28,307,136B) 0x52285AC: _dlerror_run (in /usr/lib64/libdl-2.17.so)
| | | ->13.68% (28,307,136B) 0x5228040: dlopen@@GLIBC_2.2.5 (in /usr/lib64/libdl-2.17.so)
| | | ->13.68% (28,307,136B) 0x4C1B60F: _PyImport_FindSharedFuncptr (dynload_shlib.c:100)
| | | ->13.68% (28,307,136B) 0x4BE6F04: _PyImport_LoadDynamicModuleWithSpec (importdl.c:137)
| | | ->13.68% (28,307,136B) 0x4BE4864: _imp_create_dynamic_impl (import.c:2302)
| | | ->13.68% (28,307,136B) 0x4BE4864: _imp_create_dynamic (import.c.h:330)
| | | ->13.68% (28,307,136B) 0x4B2084E: cfunction_vectorcall_FASTCALL (methodobject.c:430)
| | | ->13.68% (28,307,136B) 0x4A8F213: do_call_core (ceval.c:5125)
| | | ->13.68% (28,307,136B) 0x4A8F213: _PyEval_EvalFrameDefault (ceval.c:3582)
| | | ->13.68% (28,307,136B) 0x4BBF78D: _PyEval_EvalFrame (pycore_ceval.h:40)
| | | ->13.68% (28,307,136B) 0x4BBF78D: _PyEval_EvalCode (ceval.c:4329)
| | | ->13.68% (28,307,136B) 0x4ADC330: _PyFunction_Vectorcall (call.c:396)
| | | ->13.68% (28,307,136B) 0x4A91831: _PyObject_VectorcallTstate (abstract.h:118)
| | | ->13.68% (28,307,136B) 0x4A91831: PyObject_Vectorcall (abstract.h:127)
| | | ->13.68% (28,307,136B) 0x4A91831: call_function (ceval.c:5077)
| | | ->13.68% (28,307,136B) 0x4A91831: _PyEval_EvalFrameDefault (ceval.c:3489)
| | | ->13.68% (28,307,136B) 0x4A89BAA: _PyEval_EvalFrame (pycore_ceval.h:40)
| | | ->13.68% (28,307,136B) 0x4A89BAA: function_code_fastcall (call.c:330)
| | |
| | ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%)
| |
| ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%)
|
->12.01% (24,854,010B) 0x617933B9: cling::Dyld::ContainsSymbol((anonymous namespace)::LibraryPath const*, llvm::StringRef, unsigned int) const (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| ->09.97% (20,625,225B) 0x61796107: cling::Dyld::searchLibrariesForSymbol[abi:cxx11](llvm::StringRef, bool) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| | ->09.97% (20,625,225B) 0x61796384: cling::DynamicLibraryManager::searchLibrariesForSymbol[abi:cxx11](llvm::StringRef, bool) const (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| | ->09.97% (20,625,225B) 0x6172305A: AutoloadLibraryGenerator::tryToGenerate(llvm::orc::LookupState&, llvm::orc::LookupKind, llvm::orc::JITDylib&, llvm::orc::JITDylibLookupFlags, llvm::orc::SymbolLookupSet const&) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| | ->09.97% (20,625,225B) 0x62C334BC: llvm::orc::ExecutionSession::OL_applyQueryPhase1(std::unique_ptr<llvm::orc::InProgressLookupState, std::default_delete<llvm::orc::InProgressLookupState> >, llvm::Error) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| | ->09.97% (20,625,225B) 0x62C3E0A9: llvm::orc::ExecutionSession::lookup(llvm::orc::LookupKind, std::vector<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags>, std::allocator<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags> > > const&, llvm::orc::SymbolLookupSet, llvm::orc::SymbolState, llvm::unique_function<void (llvm::Expected<llvm::DenseMap<llvm::orc::SymbolStringPtr, llvm::JITEvaluatedSymbol, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr>, llvm::detail::DenseMapPair<llvm::orc::SymbolStringPtr, llvm::JITEvaluatedSymbol> > >)>, std::function<void (llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr> >, llvm::DenseMapInfo<llvm::orc::JITDylib*>, llvm::detail::DenseMapPair<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr> > > > const&)>) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| | ->09.43% (19,514,165B) 0x62C3ECD7: llvm::orc::ExecutionSession::lookup(std::vector<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags>, std::allocator<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags> > > const&, llvm::orc::SymbolLookupSet const&, llvm::orc::LookupKind, llvm::orc::SymbolState, std::function<void (llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr> >, llvm::DenseMapInfo<llvm::orc::JITDylib*>, llvm::detail::DenseMapPair<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr> > > > const&)>) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| | | ->09.43% (19,514,165B) 0x62C3EF33: llvm::orc::ExecutionSession::lookup(std::vector<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags>, std::allocator<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags> > > const&, llvm::orc::SymbolStringPtr, llvm::orc::SymbolState) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| | | ->09.43% (19,514,165B) 0x62C546D4: llvm::orc::LLJIT::lookupLinkerMangled(llvm::orc::JITDylib&, llvm::orc::SymbolStringPtr) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| | | ->09.43% (19,514,165B) 0x61820393: cling::IncrementalJIT::getSymbolAddress(llvm::StringRef, bool) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| | | ->09.43% (19,514,165B) 0x6181A761: cling::IncrementalExecutor::getAddressOfGlobal(llvm::StringRef, bool*) const (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| | | ->09.43% (19,514,165B) 0x617A65E2: cling::Interpreter::getAddressOfGlobal(clang::GlobalDecl const&, bool*) const (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| | | ->09.43% (19,514,165B) 0x6173F9C7: TClingDataMemberInfo::Offset() (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCling.so)
| | | ->09.43% (19,514,165B) 0x5516C979: TDataMember::GetOffsetCint() const (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libCore.so)
| | | ->09.43% (19,514,165B) 0x54EA55AD: Cppyy::GetDatamemberOffset(unsigned long, unsigned long) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libcppyy_backend3_9.so)
| | | ->09.43% (19,514,165B) 0x54E21C12: CPyCppyy::CPPDataMember::Set(unsigned long, unsigned long) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libcppyy3_9.so)
| | | ->09.43% (19,514,165B) 0x54E4A92F: CPyCppyy::BuildScopeProxyDict(unsigned long, _object*) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libcppyy3_9.so)
| | | ->09.43% (19,514,165B) 0x54E4D2E5: CPyCppyy::CreateScopeProxy(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, _object*) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libcppyy3_9.so)
| | | ->09.43% (19,514,165B) 0x54E4DED9: CPyCppyy::CreateScopeProxy(unsigned long) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libcppyy3_9.so)
| | | ->09.43% (19,514,165B) 0x54E4DF43: CPyCppyy::BindCppObjectNoCast(void*, unsigned long, unsigned int) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libcppyy3_9.so)
| | | ->09.43% (19,514,165B) 0x54E259A5: CPyCppyy::CPPMethod::ExecuteFast(void*, long, CPyCppyy::CallContext*) [clone .isra.0] (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libcppyy3_9.so)
| | | ->09.43% (19,514,165B) 0x54E28367: CPyCppyy::CPPMethod::ExecuteProtected(void*, long, CPyCppyy::CallContext*) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libcppyy3_9.so)
| | | ->09.43% (19,514,165B) 0x54E26439: CPyCppyy::CPPMethod::Execute(void*, long, CPyCppyy::CallContext*) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libcppyy3_9.so)
| | | ->09.43% (19,514,165B) 0x54E270AB: CPyCppyy::CPPMethod::Call(CPyCppyy::CPPInstance*&, _object*, _object*, CPyCppyy::CallContext*) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libcppyy3_9.so)
| | | ->09.43% (19,514,165B) 0x54E2B7B6: CPyCppyy::(anonymous namespace)::mp_call(CPyCppyy::CPPOverload*, _object*, _object*) (in /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.28.00-a8528/x86_64-centos7-gcc11-opt/lib/libcppyy3_9.so)
| | | ->09.43% (19,514,165B) 0x4ADC5A7: _PyObject_MakeTpCall (call.c:191)
| | | ->09.43% (19,514,165B) 0x4A92208: _PyObject_VectorcallTstate (abstract.h:116)
| | | ->09.43% (19,514,165B) 0x4A92208: _PyObject_VectorcallTstate (abstract.h:103)
| | | ->09.43% (19,514,165B) 0x4A92208: PyObject_Vectorcall (abstract.h:127)
| | | ->09.43% (19,514,165B) 0x4A92208: call_function (ceval.c:5077)
| | | ->09.43% (19,514,165B) 0x4A92208: _PyEval_EvalFrameDefault (ceval.c:3489)
| | | ->09.43% (19,514,165B) 0x4BBF78D: _PyEval_EvalFrame (pycore_ceval.h:40)
| | | ->09.43% (19,514,165B) 0x4BBF78D: _PyEval_EvalCode (ceval.c:4329)
| | | ->09.43% (19,514,165B) 0x4BBFACD: _PyEval_EvalCodeWithName (ceval.c:4361)
| | | ->09.43% (19,514,165B) 0x4BBFB1A: PyEval_EvalCodeEx (ceval.c:4377)
| | |
| | ->00.54% (1,111,060B) in 1+ places, all below ms_print's threshold (01.00%)
| |
with another 20% similar to the stacks above, and then small contributions from Clang AST, llvm and PyObjectMalloc making up the remaining 30% of the memory.
To me this suggests that the flat part is coming from C++ ROOT (starting with PyEval I am guessing from invoking os.subprocess). I am not complaining about the flatness, however I am quite surprised by it. Is this exepcted? Is it some result of caching? I would appreciate your insight here too (also see attached ms_print
output for this setup)
ms_print_subproc_10trees_23syst_skip_nonrgxd_chunks_root628_massif.out.513.txt (1.9 MB)