Optimize memory usage while using RDataFrame


ROOT Version: 6.28/00
Platform: “CentOS Linux 7 (Core)”
Compiler: g++ (GCC) 12.2.1 20221030


Dear all,
I wrote code that analyzes simulations stored in ROOT files using RDataFrame. The code works fine on my PC (MacBook Pro M1), but I can no longer download the simulations and analyze them locally because they are getting too big (>100 GB). So now I need to run the same code on a farm. However, when I do that, the code stops working because it reaches the memory limit (8 GB). I tried to optimize the code as much as I could, following what is written in the ROOT::RDataFrame Class Reference, but I am clearly missing something. Can you help me optimize/correct it?

In the code, I first initialize the DataFrames using a TChain made of all the simulations:

    //Initialize the TTrees
    string files = Directory + "*.root";
    cout << "Analyzing root files in folder: " << files << endl;

    TChain ch1 ("Events");
    TChain ch2 ("RunSummary");

    ch1.Add(files.c_str());       //root_files/*.root
    ch2.Add(files.c_str());

    //Initialize the RDataFrames
    RDataFrame Events(ch1);
    RDataFrame RunSummary(ch2);

Then I do some operations like:

  1. Counting the number of simulated primaries and creating a new column with the correct weight
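    //Count the simulated electrons
    cout << "Beginning to count primaries simulated..." << endl;
    //Sum accumulates per thread and merges the partial results, so it is safe with ImplicitMT
    //(incrementing a shared variable inside Foreach is a data race when several threads run)
    unsigned int nEOT = *RunSummary.Sum<unsigned int>("TotEvents");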
    
    //Lambdas used to define new columns in the dataframe
    double n_m = 0.939565378;   //neutron mass in GeV
    auto Ekin_calc = [n_m](double Etot){ return Etot - n_m; };
    auto PesoEOT_calc = [nEOT, NormFactor](double weight){ return weight/nEOT * NormFactor; };

    //Define two new dataframes to be used for the histograms (each new dataframe also inherits all the columns of its parent)
    cout << "Creating new dataframes with correct weight..." << endl;
    auto Ekin = Events.Define("Ekin", Ekin_calc, {"ETot"}).Define("Peso", PesoEOT_calc, {"Weight1"});           //Create the Ekin DF, with an Ekin column and a Peso (weight) column
    auto DF_WeightsEOT = Events.Define("Peso", PesoEOT_calc, {"Weight1"});                                      //Create the DF with the weights
  1. Finding the number of detectors placed
    cout << "Counting surfaces..." << endl;
    auto SurfaceIDs = Events.Take<unsigned int>("SurfaceID");

    sort(SurfaceIDs->begin(), SurfaceIDs->end());
    vector<unsigned int>::iterator it;
    it = unique(SurfaceIDs->begin(), SurfaceIDs->end());  

    SurfaceIDs->resize(distance(SurfaceIDs->begin(),it));
    SurfaceIDs->erase(std::remove(SurfaceIDs->begin(), SurfaceIDs->end(), 0), SurfaceIDs->end());
  1. Calculating some statistics of the simulation
    //Total number of simulated electrons
    cout << "Total number of Electrons simulated: "<< nEOT << endl;

    //Compute the simulation times
    auto AvgTime = RunSummary.Histo1D<double>({"AvgTime", "AvgTime", 250, 0, 0}, "AvgTime");
    auto TotTime = RunSummary.Histo1D<double>({"TotTime", "TotTime", 250, 0, 0}, "TotTime");

    Double_t x, q; q = 0.5; // 0.5 for "median"

    AvgTime->ComputeIntegral(); // just a precaution
    AvgTime->GetQuantiles(1, &x, &q);

    cout<<"Mean time to follow a primary: "<<AvgTime->GetMean()<<endl;
    cout<<"Median time to follow a primary: "<<x<<endl;

    TotTime->ComputeIntegral(); // just a precaution
    TotTime->GetQuantiles(1, &x, &q);

    cout<<"Mean time to complete a job: "<<TotTime->GetMean()<<endl;
    cout<<"Median time to complete a job: "<<x<<endl;

    //Surfaces found
    cout << "Total Surfaces found: " << SurfaceIDs->size() << " -> ";
    for(auto i : SurfaceIDs){
        cout << i << " ";
    }
    cout << endl;
  1. Building some histograms with this function to get information about several surfaces (here I pass the new dataframe with the weight column and the surface ID):
    void count_on_surf(ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void> Ekin, int ID){
        string filter = "SurfaceID == " + to_string(ID);
        auto onSurface = Ekin.Filter(filter);   //build the filtered node once and reuse it for both actions
        auto nN1 = onSurface.Count();
        double Error;
        double Integral = onSurface.Histo1D<double, double>({"energy", "Energy Det 300; E (GeV); Particles/EOT", 200, 0, 0}, "Ekin", "Weight1")->IntegralAndError(1,200,Error);
        cout << "Total entries on surface "<< ID << ": \t" << *nN1 << " \t Integral (NO - EOT): \t"<< Integral << " +/- "<< Error << " (" << Error/Integral*100 << "%)" << endl;
    }

  1. Lastly, I build several histograms in a loop using multiple filters (here is an example of what I’m doing):
    for(std::size_t i = 0; i < SurfaceIDs->size(); i++){
        string filter = "SurfaceID == " + std::to_string(SurfaceIDs->at(i));
        auto energy               = Ekin.Filter(filter).Histo1D<double, double>({Form("%s_Energy_Det%d", particella.c_str(), SurfaceIDs->at(i)), Form("%s energy Det %d; E (GeV); Particles/EOT", particella.c_str(), SurfaceIDs->at(i)), 200, 0, 0},"Ekin", "Peso");
        //another 13 histograms made in this way but with different ranges and variables
        energy_ranges_vec.push_back(energy); //put all the histograms in a vector

        ROOT::RDF::RunGraphs({PFiltered, PxFiltered, PyFiltered, PzFiltered, XFiltered, YFiltered, ZFiltered, energy, energy0_10KeV, energy10_100KeV, energy100KeV_10MeV, energy10_20MeV, energy20_100MeV, energy100MeV_11GeV});

        //Energy ranges histograms saving
        c_energies = new TCanvas(Form("c_energies%d",i), "c_energies", 600*3, 500*3);
        //Some conditions on where to print
        energy_ranges_vec[x]->Draw("histe");

        c_energies->SaveAs(Form("Graphs/Energy/Energy_Surface_%03d.png",SurfaceIDs->at(i)));
        energy_ranges_vec.clear();
    }

However, the code actually stops before even reaching the loop at the beginning of point 3.
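For reference, the medians in the statistics step come from `GetQuantiles` on 250-bin histograms, so they are only as precise as the binning. As far as I know, `TH1::GetQuantiles` finds the bin where the cumulative integral crosses the requested fraction and interpolates linearly inside it. The stand-alone sketch below (no ROOT needed; `quantileFromBins` is a hypothetical helper, not a ROOT function) reproduces that idea for equal-width bins with non-negative contents:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Approximate the q-quantile (q = 0.5 gives the median) of a histogram with
// equal-width bins on [lo, hi): find the bin where the cumulative fraction of
// entries crosses q, then interpolate linearly inside that bin.
// Assumes non-negative contents; under/overflow are not included in `contents`.
double quantileFromBins(const std::vector<double> &contents, double lo, double hi, double q) {
    assert(!contents.empty() && hi > lo && q >= 0.0 && q <= 1.0);
    double total = 0.0;
    for (double c : contents) total += c;
    assert(total > 0.0);

    const double width = (hi - lo) / static_cast<double>(contents.size());
    double below = 0.0;  // fraction of entries in the bins before bin i
    for (std::size_t i = 0; i < contents.size(); ++i) {
        const double upTo = below + contents[i] / total;  // fraction up to the end of bin i
        if (contents[i] > 0.0 && upTo >= q) {
            const double frac = (q - below) / (upTo - below);  // position inside bin i
            return lo + width * (static_cast<double>(i) + frac);
        }
        below = upTo;
    }
    return hi;  // only reached through rounding when q is (close to) 1
}
```

The interpolation is exact only if the entries are spread uniformly inside the bin where the crossing happens, so with a coarse automatic range the printed median is an approximation.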

I will attach the whole script in case you want to see what I’m doing.

Thanks in advance,
Antonino

As suggested in another post, I compiled my macro with:

g++ analysis.C -o analysis.out `root-config --cflags --glibs`

Then I ran the program with the following command (the number is a parameter that I pass to the program to normalize the dataframe):

valgrind --tool=massif analysis.out 0.0876664

I’m not really sure how to interpret the output file (I used ms_print, but I don’t know how to read its output).
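When ms_print's summary does not make the peak obvious (it normally lists it as `N (peak)` among the detailed snapshots), the raw `massif.out.<pid>` file can also be scanned directly. This is a minimal sketch under that assumption: `findPeakSnapshot` is a hypothetical helper that reads the `snapshot=`, `mem_heap_B=`, `mem_heap_extra_B=` and `mem_stacks_B=` lines and reports the snapshot with the largest sum, which is the same quantity ms_print prints in its `total(B)` column.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

struct PeakSnapshot {
    long id = -1;                  // snapshot number, -1 if nothing was found
    std::uint64_t totalBytes = 0;  // mem_heap_B + mem_heap_extra_B + mem_stacks_B
};

static bool startsWith(const std::string &line, const std::string &prefix) {
    return line.compare(0, prefix.size(), prefix) == 0;
}

// Scan a raw massif output file (not the ms_print report) and return the
// snapshot with the largest total memory.
PeakSnapshot findPeakSnapshot(std::istream &in) {
    PeakSnapshot peak;
    long current = -1;        // snapshot currently being read
    std::uint64_t total = 0;  // running total for that snapshot
    auto closeSnapshot = [&] {
        if (current >= 0 && total > peak.totalBytes) {
            peak.id = current;
            peak.totalBytes = total;
        }
    };
    std::string line;
    auto valueAfter = [&line](const std::string &prefix) {
        return std::stoull(line.substr(prefix.size()));
    };
    while (std::getline(in, line)) {
        if (startsWith(line, "snapshot=")) {
            closeSnapshot();  // finish the previous snapshot first
            current = static_cast<long>(valueAfter("snapshot="));
            total = 0;
        } else if (startsWith(line, "mem_heap_B=")) {
            total += valueAfter("mem_heap_B=");
        } else if (startsWith(line, "mem_heap_extra_B=")) {
            total += valueAfter("mem_heap_extra_B=");
        } else if (startsWith(line, "mem_stacks_B=")) {
            total += valueAfter("mem_stacks_B=");
        }
    }
    closeSnapshot();
    return peak;
}

// Convenience overload for text that is already in memory.
PeakSnapshot findPeakSnapshot(const std::string &text) {
    std::istringstream in(text);
    return findPeakSnapshot(in);
}
```

With a file on disk one would open it with `std::ifstream` (from `<fstream>`) and pass the stream; the file name is whatever massif printed for the run.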


Hello @AntoninoFulci ,

and welcome to the ROOT forum! Thank you for the clear explanation of the problem and for checking the forum for similar threads. Indeed, the first thing to do in these cases is check what part of the code exactly is allocating so much memory, and valgrind --tool=massif works well for this.

At the beginning of the report printed by ms_print there should be a summary that tells you which of the following snapshots is the one corresponding to the max memory usage during the lifetime of the program. I don’t see it in your output, so rather than checking every snapshot I just went and assumed that the last detailed snapshot is the most interesting one. This would be snapshot 81:

#-----------
snapshot=81
#-----------
time=350767162856
mem_heap_B=2315119115
mem_heap_extra_B=1271957
mem_stacks_B=0
heap_tree=detailed
n3: 2315119115 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
 n1: 2147483648 0x9460E05: void std::vector<unsigned int, std::allocator<unsigned int> >::_M_realloc_insert<unsigned int&>(__gnu_cxx::__normal_iterator<unsigned int*, std::vector<unsigned int, std::allocator<unsigned int> > >, unsigned int&) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libROOTDataFrame.so)
  n1: 2147483648 0x4DCA12: void ROOT::Internal::RDF::RAction<ROOT::Internal::RDF::TakeHelper<unsigned int, unsigned int, std::vector<unsigned int, std::allocator<unsigned int> > >, ROOT::Detail::RDF::RLoopManager, ROOT::TypeTraits::TypeList<unsigned int> >::CallExec<unsigned int, 0ul>(unsigned int, long long, ROOT::TypeTraits::TypeList<unsigned int>, std::integer_sequence<unsigned long, 0ul>) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
   n1: 2147483648 0x4D8FE3: ROOT::Internal::RDF::RAction<ROOT::Internal::RDF::TakeHelper<unsigned int, unsigned int, std::vector<unsigned int, std::allocator<unsigned int> > >, ROOT::Detail::RDF::RLoopManager, ROOT::TypeTraits::TypeList<unsigned int> >::Run(unsigned int, long long) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
    n1: 2147483648 0x949B641: ROOT::Detail::RDF::RLoopManager::RunAndCheckFilters(unsigned int, long long) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libROOTDataFrame.so)
     n1: 2147483648 0x94A756A: ROOT::Detail::RDF::RLoopManager::RunTreeReader() (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libROOTDataFrame.so)
      n1: 2147483648 0x94A8114: ROOT::Detail::RDF::RLoopManager::Run(bool) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libROOTDataFrame.so)
       n1: 2147483648 0x4C1DEF: ROOT::RDF::RResultPtr<std::vector<unsigned int, std::allocator<unsigned int> > >::TriggerRun() (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
        n1: 2147483648 0x4BF904: ROOT::RDF::RResultPtr<std::vector<unsigned int, std::allocator<unsigned int> > >::Get() (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
         n1: 2147483648 0x4BC721: ROOT::RDF::RResultPtr<std::vector<unsigned int, std::allocator<unsigned int> > >::operator->() (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
          n0: 2147483648 0x4B3927: main (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
 n0: 92483235 in 3769 places, all below massif's threshold (1.00%)
 n1: 75152232 0x5FA3C56: TFileCacheRead::SetEnablePrefetchingImpl(bool) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libRIO.so)
  n1: 75152232 0x5FA4149: TFileCacheRead::TFileCacheRead(TFile*, int, TObject*) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libRIO.so)
   n1: 75152232 0x7A2D833: TTreeCache::TTreeCache(TTree*, int) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libTree.so)
    n1: 75152232 0x7A43A72: TTree::SetCacheSizeAux(bool, long long) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libTree.so)
     n1: 75152232 0x7A451FB: TTree::LoadTree(long long) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libTree.so)
      n1: 75152232 0x79FE744: TChain::LoadTree(long long) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libTree.so)
       n1: 75152232 0x79FC5E9: TChain::GetListOfBranches() (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libTree.so)
        n1: 75152232 0x94A59FD: (anonymous namespace)::GetBranchNamesImpl(TTree&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::set<TTree*, std::less<TTree*>, std::allocator<TTree*> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, bool) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libROOTDataFrame.so)
         n1: 75152232 0x94A6871: ROOT::Internal::RDF::GetBranchNames[abi:cxx11](TTree&, bool) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libROOTDataFrame.so)
          n2: 75152232 0x94A6970: ROOT::Detail::RDF::RLoopManager::GetBranchNames[abi:cxx11]() (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libROOTDataFrame.so)
           n1: 45152232 0x4B4CCC: std::enable_if<std::is_default_constructible<double>::value, ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void> >::type ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void>::DefineImpl<main::{lambda(double)#2}, ROOT::Detail::RDF::ExtraArgsForDefine::None, double>(std::basic_string_view<char, std::char_traits<char> >, main::{lambda(double)#2}&&, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator<char> >, std::allocator<std::allocator<char> > > const&, std::allocator<char> const&) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
            n1: 45152232 0x4B4943: ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void>::Define<main::{lambda(double)#2}, 0>(std::basic_string_view<char, std::char_traits<char> >, main::{lambda(double)#2}, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator<char> >, std::allocator<std::allocator<char> > > const&) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
             n0: 45152232 0x4B35EB: main (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
           n1: 30000000 0x947D51E: ROOT::Internal::RDF::GetValidatedColumnNames(ROOT::Detail::RDF::RLoopManager&, unsigned int, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, ROOT::Internal::RDF::RColumnRegister const&, ROOT::RDF::RDataSource*) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libROOTDataFrame.so)
            n1: 30000000 0x4BA58D: ROOT::RDF::RInterfaceBase::GetValidatedColumnNames(unsigned int, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
             n1: 30000000 0x4C10E4: void ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void>::ForeachSlot<std::function<void (unsigned int, unsigned int)> >(std::function<void (unsigned int, unsigned int)>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
              n1: 30000000 0x4B489A: void ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void>::Foreach<main::{lambda(unsigned int)#1}>(main::{lambda(unsigned int)#1}, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::allocator<char> > > const&) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
               n0: 30000000 0x4B345B: main (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)

mem_heap_B says that at this point of the program execution the application was using roughly 2GB of RAM, and the stacks reported below say that most of these allocations are due to a malloc called by std::vector during reallocation inside a RAction<TakeHelper> (i.e. a Take RDataFrame action).

Given that you say the program runs out of memory before your point 3., I guess the suspicious Take is auto SurfaceIDs = Events.Take<unsigned int>("SurfaceID");. How many events do you expect it to store? How much memory would you expect it to use? Or maybe you call that line N times and each one requires hundreds of MBs?

Cheers,
Enrico

Hi Enrico,
thank you for explaining how to read the massif output.
As you hinted, it was the std::vector<unsigned int> that was allocating a lot of memory. Even though I was filling it only once, it received more than 1 billion unsigned ints, which is why it later reached 4 GB of data.
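These numbers are consistent with how a growing `std::vector` behaves. One billion `unsigned int`s is 4 × 10^9 bytes of payload, and if the vector grows by doubling (libstdc++'s policy; this is an assumption, since the standard only requires geometric growth), its buffer always holds a power-of-two number of elements. During each reallocation the old and new buffers coexist, so the transient peak is about 1.5 times the final buffer. The 2,147,483,648-byte block in snapshot 81 is exactly 2^29 `unsigned int`s. A small arithmetic sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Capacity (in elements) of a std::vector after n >= 1 push_backs from empty,
// ASSUMING it starts at capacity 1 and doubles on every reallocation.
std::uint64_t capacityAfterPushes(std::uint64_t n) {
    assert(n > 0);
    std::uint64_t cap = 1;
    while (cap < n) cap *= 2;
    return cap;
}

// Bytes held by the buffer after n push_backs of elements of size elemSize.
std::uint64_t bufferBytes(std::uint64_t n, std::uint64_t elemSize) {
    return capacityAfterPushes(n) * elemSize;
}

// Peak bytes during the last reallocation: the old buffer (half the size)
// is still alive while its elements are moved into the new one.
std::uint64_t peakBytesDuringLastGrowth(std::uint64_t n, std::uint64_t elemSize) {
    assert(elemSize > 0);
    const std::uint64_t cap = capacityAfterPushes(n);
    return (cap + cap / 2) * elemSize;
}
```

Under these assumptions, 10^9 entries end up in a 4 GiB buffer with a transient peak of about 6 GiB, well above the 8 GB farm limit once everything else is added.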

I have now changed that part of the code to fill a std::unordered_set<unsigned int> through a .Foreach() call on my dataset, which in the end does what I need. I also had to make sure the operation is thread-safe.

The code is below:

    std::unordered_set<unsigned int> SurfIDSet;             //Set that will hold the surface IDs; an unordered_set automatically keeps only unique entries
    std::mutex SurfIDSet_mutex;                             //Used with the lock below to make sure multiple threads do not access the set above at the same time

    Events.Foreach([&](unsigned int i){ std::lock_guard<std::mutex> lock(SurfIDSet_mutex); SurfIDSet.insert(i);}, {"SurfaceID"});       //Loop over all entries and try to insert each one into the set

    std::vector<unsigned int> SurfaceIDs(SurfIDSet.begin(), SurfIDSet.end());                   //Convert the set into a vector
    sort(SurfaceIDs.begin(), SurfaceIDs.end());                                                 //Sort the vector
    SurfaceIDs.erase(std::remove(SurfaceIDs.begin(), SurfaceIDs.end(), 0), SurfaceIDs.end());   //Remove surface 0 if present; it should not exist

That part now works, but the program is killed when it reaches the part where I build some histograms.
Below is the output of massif.
massif.txt (421.9 KB)

Along the lines of what you said before, it looks like about 18 GB of RAM were in use at the peak, and the top entry of the allocation tree (about 6.4 GB, 35% of the total) is ROOT::Internal::RDF::BufferedFillHelper::Exec (I hope I’m reading it correctly).
I should also mention that the program used 96 threads.

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 48 4,136,702,414,253   18,211,885,176   18,207,308,479     4,576,697            0
99.97% (18,207,308,479B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->35.19% (6,408,896,512B) 0x945227C: ROOT::Internal::RDF::BufferedFillHelper::Exec(unsigned int, double, double) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libROOTDataFrame.so)
| ->35.19% (6,408,896,512B) 0x4D41E5: void ROOT::Internal::RDF::RAction<ROOT::Internal::RDF::BufferedFillHelper, ROOT::Detail::RDF::RJittedFilter, ROOT::TypeTraits::TypeList<double, double> >::CallExec<double, double, 0ul, 1ul>(unsigned int, long long, ROOT::TypeTraits::TypeList<double, double>, std::integer_sequence<unsigned long, 0ul, 1ul>) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
|   ->35.19% (6,408,896,512B) 0x4D0A61: ROOT::Internal::RDF::RAction<ROOT::Internal::RDF::BufferedFillHelper, ROOT::Detail::RDF::RJittedFilter, ROOT::TypeTraits::TypeList<double, double> >::Run(unsigned int, long long) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Simulations/HallA_2023-03-10/analysis.out)
|     ->35.19% (6,408,896,512B) 0x949B641: ROOT::Detail::RDF::RLoopManager::RunAndCheckFilters(unsigned int, long long) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libROOTDataFrame.so)
|       ->35.19% (6,408,896,512B) 0x94A206B: ROOT::Detail::RDF::RLoopManager::RunTreeProcessorMT()::{lambda(TTreeReader&)
|         ->35.19% (6,408,896,512B) 0x7DD4AA0: std::_Function_handler<void (unsigned int), void ROOT::TThreadExecutor::Foreach<ROOT::TTreeProcessorMT::Process(std::function<void (TTreeReader&)>)::{lambda(unsigned long)
|           ->35.19% (6,408,896,512B) 0x5C67362: tbb::interface9::internal::start_for<tbb::blocked_range<unsigned int>, tbb::internal::parallel_for_body<std::function<void (unsigned int)>, unsigned int>, tbb::auto_partitioner const>::execute() (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libImt.so)
|             ->35.19% (6,408,896,512B) 0xB0F7694: tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::process_bypass_loop(tbb::internal::context_guard_helper<false>&, tbb::task*, long) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libtbb.so.2)
|               ->35.19% (6,408,896,512B) 0xB0F79C2: tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libtbb.so.2)
|                 ->35.19% (6,408,896,512B) 0xB0F535F: tbb::internal::generic_scheduler::local_spawn_root_and_wait(tbb::task*, tbb::task*&) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libtbb.so.2)
|                   ->35.19% (6,408,896,512B) 0x5C65AD1: void tbb::strict_ppl::parallel_for_impl<unsigned int, std::function<void (unsigned int)>, tbb::auto_partitioner const>(unsigned int, unsigned int, unsigned int, std::function<void (unsigned int)> const&, tbb::auto_partitioner const&) [clone .part.0] [clone .isra.0] (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libImt.so)
|                     ->35.19% (6,408,896,512B) 0xB0F1C24: tbb::interface7::internal::isolate_within_arena(tbb::interface7::internal::delegate_base&, long) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libtbb.so.2)
|                       ->35.19% (6,408,896,512B) 0x5C65672: tbb::interface7::internal::delegated_function<ROOT::TThreadExecutor::ParallelFor(unsigned int, unsigned int, unsigned int, std::function<void (unsigned int)> const&)::{lambda()
|                         ->35.19% (6,408,896,512B) 0xB0F2DBE: tbb::interface7::internal::task_arena_base::internal_execute(tbb::interface7::internal::delegate_base&) const (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libtbb.so.2)
|                           ->35.19% (6,408,896,512B) 0x5C66D1A: ROOT::TThreadExecutor::ParallelFor(unsigned int, unsigned int, unsigned int, std::function<void (unsigned int)> const&) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libImt.so)
|                             ->35.19% (6,408,896,512B) 0x7DD7C32: std::_Function_handler<void (unsigned int), void ROOT::TThreadExecutor::Foreach<ROOT::TTreeProcessorMT::Process(std::function<void (TTreeReader&)>)::{lambda(unsigned long)
|                               ->27.82% (5,066,719,232B) 0x5C67673: tbb::interface9::internal::start_for<tbb::blocked_range<unsigned int>, tbb::internal::parallel_for_body<std::function<void (unsigned int)>, unsigned int>, tbb::auto_partitioner const>::execute() (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libImt.so)
|                               | ->27.82% (5,066,719,232B) 0xB0F7694: tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::process_bypass_loop(tbb::internal::context_guard_helper<false>&, tbb::task*, long) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libtbb.so.2)
|                               |   ->27.82% (5,066,719,232B) 0xB0F79C2: tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libtbb.so.2)
|                               |     ->27.45% (4,999,610,368B) 0xB0F1916: tbb::internal::arena::process(tbb::internal::generic_scheduler&) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libtbb.so.2)
|                               |     | ->27.45% (4,999,610,368B) 0xB0F02BF: tbb::internal::market::process(rml::job&) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libtbb.so.2)
|                               |     |   ->27.45% (4,999,610,368B) 0xB0ECC9D: tbb::internal::rml::private_worker::run() (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libtbb.so.2)
|                               |     |     ->27.45% (4,999,610,368B) 0xB0ECEF8: tbb::internal::rml::private_worker::thread_routine(void*) (in /mnt/project_mnt/jlab12/fiber7_fs/afulci/Programs/root-6.28.00/lib/libtbb.so.2)
|                               |     |       ->27.45% (4,999,610,368B) 0xA241EA4: start_thread (in /usr/lib64/libpthread-2.17.so)
|                               |     |         ->27.45% (4,999,610,368B) 0xA554B0C: clone (in /usr/lib64/libc-2.17.so)

By the way, I downloaded the 100 GB of simulations to my PC (10 threads), and the same code ran without any memory errors.

Hi @AntoninoFulci ,

filling histograms that do not have a fixed binning requires much more memory per thread than filling histograms that have a specified binning. The former have to use a BufferedFillHelper that buffers the entries in order to then come up with an appropriate binning, the latter uses a normal FillHelper that directly fills the histogram.

In your snippets above all Histo1D calls seem to have a model specified, so I’m not sure where the BufferedFillHelper comes from, but there must be some Histo calls somewhere without a model.
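To make the difference concrete, here is a deliberately simplified model of the two strategies (it is not ROOT's `BufferedFillHelper`/`FillHelper` code, and the class names are made up for illustration). A histogram with a fixed axis stores one number per bin no matter how many entries it sees, while a histogram whose range is decided from the data has to keep every `(x, w)` pair, i.e. two doubles or 16 bytes per entry, which matches the `Exec(unsigned int, double, double)` signature in the trace above:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Fixed binning: storage is nbins + 2 numbers (including under/overflow),
// independent of how many entries are filled.
class FixedHist {
public:
    FixedHist(std::size_t nbins, double lo, double hi)
        : nbins_(nbins), lo_(lo), hi_(hi), counts_(nbins + 2, 0.0) {
        assert(nbins > 0 && hi > lo);
    }
    void Fill(double x, double w) {
        std::size_t bin;
        if (x < lo_) {
            bin = 0;                                  // underflow
        } else if (x >= hi_) {
            bin = nbins_ + 1;                         // overflow: upper edge is exclusive
        } else {
            bin = 1 + static_cast<std::size_t>((x - lo_) / (hi_ - lo_) * nbins_);
            bin = std::min(bin, nbins_);              // guard against rounding at the edge
        }
        counts_[bin] += w;
    }
    std::size_t StoredNumbers() const { return counts_.size(); }
    const std::vector<double> &Counts() const { return counts_; }

private:
    std::size_t nbins_;
    double lo_, hi_;
    std::vector<double> counts_;
};

// Buffered filling: every (x, w) pair is kept until the range is decided,
// so storage grows linearly with the number of entries.
class BufferedHist {
public:
    explicit BufferedHist(std::size_t nbins) : nbins_(nbins) {}
    void Fill(double x, double w) { buffer_.emplace_back(x, w); }
    std::size_t StoredNumbers() const { return 2 * buffer_.size(); }

    // Pick the range from the data, then replay the buffer into a fixed histogram.
    FixedHist Finalize() const {
        assert(!buffer_.empty());
        auto byX = [](const std::pair<double, double> &a, const std::pair<double, double> &b) {
            return a.first < b.first;
        };
        const auto [minIt, maxIt] = std::minmax_element(buffer_.begin(), buffer_.end(), byX);
        const double lo = minIt->first;
        const double span = maxIt->first - lo;
        const double hi = maxIt->first + (span > 0 ? 1e-6 * span : 1.0);  // keep the maximum out of overflow
        FixedHist h(nbins_, lo, hi);
        for (const auto &[x, w] : buffer_) h.Fill(x, w);
        return h;
    }

private:
    std::size_t nbins_;
    std::vector<std::pair<double, double>> buffer_;
};
```

In RDataFrame terms, the way out is to give explicit limits in the model, e.g. `{"name", "title", 200, 0., 12.}` instead of `200, 0, 0` (the limits here are placeholders). As an illustration with a made-up count, 10^8 entries passing a filter would need about 1.6 GB of buffer in the second model, against 202 doubles in the first.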

Cheers,
Enrico

P.S.

As an aside, if I were you I would put a timer around this Foreach to make sure the lock is not completely killing the runtime of your application: 96 threads all hammering the same lock might be much slower than running that Foreach with a single thread. Another option of course is to use ForeachSlot, fill one unordered_set per slot, and do a merge of all the thread-local sets at the end.
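The per-slot idea can be illustrated without ROOT. In this sketch, `std::thread` workers stand in for RDataFrame slots and each one processes a contiguous chunk of a hypothetical in-memory `column` (RDataFrame distributes entry ranges itself, so this is only an analogy). Each worker writes only to its own `std::unordered_set`, so no lock is taken while filling, and the sets are merged once at the end with C++17 `merge`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <unordered_set>
#include <vector>

// Return the sorted distinct non-zero IDs found in `column`, using one set per
// worker ("slot") so that no mutex is needed while filling.
std::vector<unsigned int> uniqueIdsPerSlot(const std::vector<unsigned int> &column,
                                           unsigned int nSlots) {
    assert(nSlots > 0);
    std::vector<std::unordered_set<unsigned int>> perSlot(nSlots);
    const std::size_t chunk = (column.size() + nSlots - 1) / nSlots;

    std::vector<std::thread> workers;
    for (unsigned int s = 0; s < nSlots; ++s) {
        workers.emplace_back([&, s] {
            const std::size_t begin = std::min(column.size(), s * chunk);
            const std::size_t end = std::min(column.size(), begin + chunk);
            for (std::size_t i = begin; i < end; ++i)
                perSlot[s].insert(column[i]);  // each thread only touches its own set
        });
    }
    for (auto &w : workers) w.join();

    std::unordered_set<unsigned int> merged;
    for (auto &slotSet : perSlot) merged.merge(slotSet);  // C++17: moves nodes instead of copying

    std::vector<unsigned int> ids(merged.begin(), merged.end());
    std::sort(ids.begin(), ids.end());
    ids.erase(std::remove(ids.begin(), ids.end(), 0u), ids.end());  // surface 0 should not exist
    return ids;
}
```

The final merge touches at most one small set per slot (only a handful of distinct surface IDs each), not every entry, which is why removing the lock from the per-entry work matters far more than the cost of merging.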

I had a timer around that call, and indeed what you suggested is A LOT faster (about 10 times faster) on my PC. On the farm it seems to take longer; I don’t know why.

I’m leaving the code here in case somebody runs into the same kind of problem (my problem was finding the unique values in a column):

    TStopwatch stopwatch;
    if(debug) cout << "Counting surfaces..." << endl;
    if(debug) stopwatch.Start();

    //One set per processing slot; num_threads must be at least the number of slots (e.g. Events.GetNSlots())
    std::vector<std::unordered_set<unsigned int>> v_SurfIDSet(num_threads, std::unordered_set<unsigned int>{});

    std::unordered_set<unsigned int> SurfIDSet;             //Final set of surface IDs; an unordered_set automatically keeps only unique entries

    Events.ForeachSlot([&](unsigned int s, unsigned int i){ v_SurfIDSet[s].insert(i);}, {"SurfaceID"});       //Loop over all entries; each slot inserts into its own set, so no lock is needed
    for(const auto &us : v_SurfIDSet){          //iterate by reference to avoid copying each per-slot set
        SurfIDSet.insert(us.begin(), us.end()); //SurfIDSet.merge(...) (C++17) would also work with a non-const reference
    }

    std::vector<unsigned int> SurfaceIDs(SurfIDSet.begin(), SurfIDSet.end());                   //Convert the set into a vector
    sort(SurfaceIDs.begin(), SurfaceIDs.end());                                                 //Sort the vector
    SurfaceIDs.erase(std::remove(SurfaceIDs.begin(), SurfaceIDs.end(), 0), SurfaceIDs.end());   //Remove surface 0 if present; it should not exist

    if(debug){stopwatch.Stop();  stopwatch.Print();}

As for the first problem, I think BufferedFillHelper was using a lot of memory because I had several histograms where I just put 0, 0 as the axis limits, since I don’t know the range beforehand. I have now replaced them with reasonable ranges, and I’m left with only 3 histograms with 0, 0 limits.
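When the limits are genuinely unknown, one option is a cheap first pass that only computes the range, followed by the histograms with a fixed axis. In RDataFrame that would mean booking `Min` and `Max` actions on the relevant columns and reading their results before booking the histograms, at the cost of one extra event loop over the data. The plain C++ sketch below (hypothetical `Range` helpers) shows the two details that matter: per-slot partial ranges are merged once at the end, and the upper edge is padded because a value equal to the upper limit falls into the overflow bin:

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

struct Range {
    double lo;
    double hi;
};

// Min/max of one slot's values; an empty slot yields an "empty" range (lo > hi).
Range rangeOf(const std::vector<double> &values) {
    Range r{std::numeric_limits<double>::infinity(), -std::numeric_limits<double>::infinity()};
    for (double v : values) {
        r.lo = std::min(r.lo, v);
        r.hi = std::max(r.hi, v);
    }
    return r;
}

// Merge the per-slot partial ranges once at the end (no locking needed).
Range mergeRanges(const std::vector<Range> &perSlot) {
    Range r{std::numeric_limits<double>::infinity(), -std::numeric_limits<double>::infinity()};
    for (const Range &s : perSlot) {
        r.lo = std::min(r.lo, s.lo);
        r.hi = std::max(r.hi, s.hi);
    }
    return r;
}

// Histogram axes treat the upper edge as exclusive, so a value equal to hi
// would land in the overflow bin: widen the upper edge by a small fraction.
Range paddedForBinning(Range r, double fraction = 1e-3) {
    assert(r.lo <= r.hi);  // fails for an empty range (no values seen)
    const double width = (r.hi > r.lo) ? r.hi - r.lo : 1.0;  // all values equal: use unit width
    return {r.lo, r.hi + fraction * width};
}
```

Whether the extra pass over ~100 GB is worth it depends on the job; for quantities with known physical bounds (like the energy ranges above), fixed limits plus a check of the overflow bin are cheaper.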

The analysis is now faster on my PC. However, the farm is currently offline because of some problems; I’ll post an update when I can run more tests.
