Optimize memory usage while using RDataFrame

Hello @AntoninoFulci ,

and welcome to the ROOT forum! Thank you for the clear explanation of the problem and for checking the forum for similar threads. Indeed, the first thing to do in these cases is to check exactly which part of the code is allocating so much memory, and valgrind --tool=massif works well for this.

At the beginning of the report printed by ms_print there should be a summary that tells you which of the snapshots corresponds to the maximum memory usage during the lifetime of the program. I don’t see it in your output, so rather than checking every snapshot I assumed that the last detailed snapshot is the most interesting one. This would be snapshot 81:

#-----------
snapshot=81
#-----------
time=350767162856
mem_heap_B=2315119115
mem_heap_extra_B=1271957
mem_stacks_B=0
heap_tree=detailed
n3: 2315119115 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
 n1: 2147483648 0x9460E05: void std::vector<unsigned int, std::allocator<unsigned int> >::_M_realloc_insert<unsigned int&>(__gnu_cxx::__normal_iterator<unsigned int*, std::vector<unsigned int, std::allocator<unsigned int> > >, unsigned int&) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libROOTDataFrame.so)
  n1: 2147483648 0x4DCA12: void ROOT::Internal::RDF::RAction<ROOT::Internal::RDF::TakeHelper<unsigned int, unsigned int, std::vector<unsigned int, std::allocator<unsigned int> > >, ROOT::Detail::RDF::RLoopManager, ROOT::TypeTraits::TypeList<unsigned int> >::CallExec<unsigned int, 0ul>(unsigned int, long long, ROOT::TypeTraits::TypeList<unsigned int>, std::integer_sequence<unsigned long, 0ul>) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
   n1: 2147483648 0x4D8FE3: ROOT::Internal::RDF::RAction<ROOT::Internal::RDF::TakeHelper<unsigned int, unsigned int, std::vector<unsigned int, std::allocator<unsigned int> > >, ROOT::Detail::RDF::RLoopManager, ROOT::TypeTraits::TypeList<unsigned int> >::Run(unsigned int, long long) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
    n1: 2147483648 0x949B641: ROOT::Detail::RDF::RLoopManager::RunAndCheckFilters(unsigned int, long long) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libROOTDataFrame.so)
     n1: 2147483648 0x94A756A: ROOT::Detail::RDF::RLoopManager::RunTreeReader() (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libROOTDataFrame.so)
      n1: 2147483648 0x94A8114: ROOT::Detail::RDF::RLoopManager::Run(bool) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libROOTDataFrame.so)
       n1: 2147483648 0x4C1DEF: ROOT::RDF::RResultPtr<std::vector<unsigned int, std::allocator<unsigned int> > >::TriggerRun() (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
        n1: 2147483648 0x4BF904: ROOT::RDF::RResultPtr<std::vector<unsigned int, std::allocator<unsigned int> > >::Get() (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
         n1: 2147483648 0x4BC721: ROOT::RDF::RResultPtr<std::vector<unsigned int, std::allocator<unsigned int> > >::operator->() (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
          n0: 2147483648 0x4B3927: main (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
 n0: 92483235 in 3769 places, all below massif's threshold (1.00%)
 n1: 75152232 0x5FA3C56: TFileCacheRead::SetEnablePrefetchingImpl(bool) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libRIO.so)
  n1: 75152232 0x5FA4149: TFileCacheRead::TFileCacheRead(TFile*, int, TObject*) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libRIO.so)
   n1: 75152232 0x7A2D833: TTreeCache::TTreeCache(TTree*, int) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libTree.so)
    n1: 75152232 0x7A43A72: TTree::SetCacheSizeAux(bool, long long) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libTree.so)
     n1: 75152232 0x7A451FB: TTree::LoadTree(long long) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libTree.so)
      n1: 75152232 0x79FE744: TChain::LoadTree(long long) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libTree.so)
       n1: 75152232 0x79FC5E9: TChain::GetListOfBranches() (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libTree.so)
        n1: 75152232 0x94A59FD: (anonymous namespace)::GetBranchNamesImpl(TTree&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::set<TTree*, std::less<TTree*>, std::allocator<TTree*> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, bool) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libROOTDataFrame.so)
         n1: 75152232 0x94A6871: ROOT::Internal::RDF::GetBranchNames[abi:cxx11](TTree&, bool) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libROOTDataFrame.so)
          n2: 75152232 0x94A6970: ROOT::Detail::RDF::RLoopManager::GetBranchNames[abi:cxx11]() (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libROOTDataFrame.so)
           n1: 45152232 0x4B4CCC: std::enable_if<std::is_default_constructible<double>::value, ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void> >::type ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void>::DefineImpl<main::{lambda(double)#2}, ROOT::Detail::RDF::ExtraArgsForDefine::None, double>(std::basic_string_view<char, std::char_traits<char> >, main::{lambda(double)#2}&&, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator<char> >, std::allocator<std::allocator<char> > > const&, std::allocator<char> const&) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
            n1: 45152232 0x4B4943: ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void>::Define<main::{lambda(double)#2}, 0>(std::basic_string_view<char, std::char_traits<char> >, main::{lambda(double)#2}, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator<char> >, std::allocator<std::allocator<char> > > const&) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
             n0: 45152232 0x4B35EB: main (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
           n1: 30000000 0x947D51E: ROOT::Internal::RDF::GetValidatedColumnNames(ROOT::Detail::RDF::RLoopManager&, unsigned int, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, ROOT::Internal::RDF::RColumnRegister const&, ROOT::RDF::RDataSource*) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libROOTDataFrame.so)
            n1: 30000000 0x4BA58D: ROOT::RDF::RInterfaceBase::GetValidatedColumnNames(unsigned int, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
             n1: 30000000 0x4C10E4: void ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void>::ForeachSlot<std::function<void (unsigned int, unsigned int)> >(std::function<void (unsigned int, unsigned int)>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
              n1: 30000000 0x4B489A: void ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void>::Foreach<main::{lambda(unsigned int)#1}>(main::{lambda(unsigned int)#1}, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::allocator<char> > > const&) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
               n0: 30000000 0x4B345B: main (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
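
As a side note, the peak does not have to be guessed: the massif output has one snapshot= line and one mem_heap_B= line per snapshot, as in the excerpt above, so a few lines of code can pick out the largest one. Here is a minimal sketch of that idea. It only compares mem_heap_B (it ignores mem_heap_extra_B and mem_stacks_B), and the names SnapshotInfo and FindPeakSnapshot are made up for this example:

```cpp
#include <cstdint>
#include <string_view>

// Summary of one massif snapshot (hypothetical helper type, not part of valgrind).
struct SnapshotInfo {
    std::int64_t index = -1;      // value of the "snapshot=" line, -1 if none was found
    std::uint64_t heapBytes = 0;  // value of the matching "mem_heap_B=" line
};

constexpr std::string_view kSnapshotKey = "snapshot=";
constexpr std::string_view kHeapKey = "mem_heap_B=";

// Parse the unsigned decimal number at the start of `s`, stopping at the first non-digit.
constexpr std::uint64_t ParseUnsigned(std::string_view s) {
    std::uint64_t value = 0;
    for (char c : s) {
        if (c < '0' || c > '9') break;
        value = value * 10 + static_cast<std::uint64_t>(c - '0');
    }
    return value;
}

// True if `line` begins with `prefix`.
constexpr bool StartsWith(std::string_view line, std::string_view prefix) {
    return line.size() >= prefix.size() && line.substr(0, prefix.size()) == prefix;
}

// Scan the text of a massif output file line by line and return the snapshot
// with the largest mem_heap_B value.
constexpr SnapshotInfo FindPeakSnapshot(std::string_view text) {
    SnapshotInfo peak{};
    std::int64_t current = -1;  // index of the snapshot currently being read
    while (!text.empty()) {
        const auto eol = text.find('\n');
        const std::string_view line = text.substr(0, eol);
        text = (eol == std::string_view::npos) ? std::string_view{} : text.substr(eol + 1);
        if (StartsWith(line, kSnapshotKey)) {
            current = static_cast<std::int64_t>(ParseUnsigned(line.substr(kSnapshotKey.size())));
        } else if (StartsWith(line, kHeapKey) && current >= 0) {
            const std::uint64_t bytes = ParseUnsigned(line.substr(kHeapKey.size()));
            if (peak.index < 0 || bytes > peak.heapBytes) {
                peak = SnapshotInfo{current, bytes};
            }
        }
    }
    return peak;
}
```

To use it, read the massif output file into a std::string, pass it to FindPeakSnapshot, and then look at the detailed snapshot closest to the returned index.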

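In snapshot 81, mem_heap_B says that at this point of the program execution the application was using roughly 2.3 GB of heap memory. The stacks reported below it show that almost all of it, 2147483648 bytes (exactly 2 GiB), was allocated by std::vector<unsigned int> during a reallocation (_M_realloc_insert) inside a RAction<TakeHelper>, i.e. a Take RDataFrame action. The remaining ~170 MB comes from the TTreeCache (about 75 MB) and from many small allocations below massif's threshold (about 92 MB).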
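
To put the Take number in perspective, here is a back-of-the-envelope sketch in C++. It rests on assumptions that are not confirmed by the report: that the 2147483648 bytes are one vector buffer, that unsigned int is 4 bytes (true on x86-64 Linux), and that the vector doubles its capacity when it reallocates, as libstdc++ does for push_back-style growth. The constant and function names are made up for illustration:

```cpp
#include <cstdint>

// Size of the std::vector<unsigned int> allocation reported in snapshot 81.
constexpr std::uint64_t kTakeBufferBytes = 2147483648ULL;

// Assumption: unsigned int occupies 4 bytes, as on x86-64 Linux.
constexpr std::uint64_t kBytesPerEntry = sizeof(std::uint32_t);

// Number of entries a buffer of the given size can hold (the vector's capacity).
constexpr std::uint64_t CapacityInEntries(std::uint64_t bufferBytes) {
    return bufferBytes / kBytesPerEntry;
}

// Assumption: capacity doubles on reallocation. While the vector grows into a
// new buffer, the old buffer (half the size) is still alive until the elements
// have been copied over, so both count towards the memory in use.
constexpr std::uint64_t BytesDuringGrowth(std::uint64_t newBufferBytes) {
    return newBufferBytes + newBufferBytes / 2;
}

// Capacity of the buffer seen in the snapshot.
constexpr std::uint64_t kCapacity = CapacityInEntries(kTakeBufferBytes);

// Memory briefly needed by the next reallocation (new buffer of twice the size).
constexpr std::uint64_t kNextGrowthBytes = BytesDuringGrowth(2 * kTakeBufferBytes);
```

Under these assumptions the buffer has room for 536870912 entries, which means the Take result already holds more than 268435456 values, and the next reallocation would briefly need about 6 GiB (6442450944 bytes) for this one vector alone.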

Given that you say the program runs out of memory before reaching your point 3, I guess the suspicious Take is auto SurfaceIDs = Events.Take<unsigned int>("SurfaceID");. How many events do you expect it to store? How much memory would you expect it to use? Or maybe you call that line N times and each call requires hundreds of MB?

Cheers,
Enrico