std::vector<ROOT::RDF::RResultPtr<TH1D>> th1v; slow writing many histograms


Please read tips for efficient and successful posting and posting code

ROOT Version: 6.16
Platform: Not Provided
Compiler: Not Provided


Hi everyone, I’m trying to use RDataframe for my analysis, I’m able to store many histograms to a std::vector<ROOT::RDF::RResultPtr> hvtns, this action is pretty fast! but then when I try to save them in a root file, that action takes long time and if I increase the number of histograms it will take even longer (please see the code here, it takes ~30 seconds to compile for 300 histograms). Is this something expected, or I’m doing something wrong?, thanks!.

David.

void DDTActivityAna(){
  using floats = ROOT::VecOps::RVec<float> &;
  using s_ints = ROOT::VecOps::RVec<unsigned short> &;
  using ints = ROOT::VecOps::RVec<int> &;

  ROOT::EnableImplicitMT();
  TFile * infile = new TFile("/pnfs/nova/persistent/users/dduenas/TestBeam/TestBeamData/ddtactivity/spillrawdata.root","read");
  TDirectory *dirPtr = (TDirectory*)infile->Get("spillrawdata");
  ROOT::RDataFrame df("spillhits",dirPtr);
  std::vector<double> tns_range;
  std::vector<ROOT::RDF::RResultPtr<TH1D>> hvtns;

  //get the events
  auto eventa = df.Take<Int_t>("event");
  std::vector<int> eventv;
  for(int e = 0; e < eventa->size() ; e++) eventv.push_back(eventa->at(e));
  for(int t = 0 ; t<101; t++ ){tns_range.push_back(t*1000-50000);}

  for(int e = 0; e < 2/*eventv->size()*/ ; e++){
  for(int t = 0 ; t < tns_range.size()-1; t++){

    // 1000ns window
    auto tnscut = [tns_range, t, eventv, e](floats tns, s_ints cellx, s_ints celly,int event, ints hits){ //cut for tns

        return tns[(tns >= tns_range.at(t) && tns < tns_range.at(t+1) && event==eventv.at(e))] ;
    };

    std::stringstream tt;

    tt << "tns" <<t<<"event"<<eventv.at(e);
    const auto namet = tt.str();

    //Define new colums in the dataframe
    auto dtns_c = df.Define(namet.c_str(), tnscut,{"tns","cellx","celly","event","nhits"});
    auto htns_c = dtns_c.Histo1D({Form("htns_%0.fto%0.fevent%i",tns_range.at(t), tns_range.at(t+1),eventv.at(e)),"",100, tns_range.at(t), tns_range.at(t+1)},namet.c_str());

    hvtns.push_back(htns_c);

    }
  }
  auto tf1 = std::chrono::high_resolution_clock::now();
  TFile *f = TFile::Open("events.root","recreate");
  for(int i = 0; i < hvtns.size() ; i++ ){
     hvtns.at(i)->Write();
  }
  f->Close();
  auto tf2 = std::chrono::high_resolution_clock::now();
  auto durationf = std::chrono::duration_cast<std::chrono::microseconds>( tf2 - tf1 ).count();
  std::cout<<"duration in mus "<<durationf<<std::endl;
}

Hi David,
note that RDataFrame only runs the event loop upon first access to one of the results, so, if I read the code correctly, the first time hvtns.at(i)->Write(); is called.

In other words, what you are measuring is not just the writing of the histograms, but also the runtime of the event loop that produces them.

I can’t tell whether 30 seconds runtime is ok or not for producing 300 histograms on that dataset (I’d need to check out the dataset :smile:). Something easy you can try to speed things up is compile the program ahead of time, with optimizations, rather than executing it within the ROOT interpreter.

It’s enough to add a main function at the end:

int main() {
  DDTActivityAna();
  return 0;
}

and then compile the program like a normal C++ program, passing ROOT’s compilation flags:

g++ -o DDTActivityAna DDTActivityAna.C $(root-config --libs --cflags)

I hope this helps! If not, I suggest trying to produce a perf flamegraph to visualize where time is spent. VTune is another useful profiling tool. Possibly, also moving to a more recent ROOT version might help.

Feel free to report any problem here, or share your data so we can take a look ourselves.
Cheers,
Enrico

HI Enrico,

Thank you for your prompt reply, I tried everything you suggested and unfortunately that did not work. I have been trying to get a flame graph but for some reason I have problems with Java(I’ll keep trying this!). In the meantime I’m happy to share the data here. One of my major concerns is that the time to process this is too long, ~30 s is just only for one event and I need to analyze near ~8000. Thank you for taking a look at this.

Best,

David.

Data:

Hi,
thanks for the file.

Compiling without optimizations indeed gives me ~30 seconds runtimes.
However, compiling with -O2 makes runtime drop to less than 4 seconds.

On top of that, changing the just-in-time compiled

dtns_c.Histo1D>(...)

with the compiled

dtns_c.Histo1D<ROOT::RVec<float>>(...)

gets me a runtime of 2.5 seconds.

It would be interesting to know why optimizations matter so much for this particular program (I’m investigating), but anyway, if you want performance, always turn on those compiler flags!

Here is the program I’m compiling: DDTActivityAna.C (2.0 KB)

Compiler invocation:

g++ -O2 -g -Wall -Wextra -Wpedantic -o "DDTActivityAna" "DDTActivityAna.C" $(root-config --libs --cflags)

Cheers,
Enrico

P.S.
flamegraph of the program attached above for -O0 (you can look at it properly with right-clicking and selecting “view image” or analogous):

DDTActivityAna_O0

and for -O2:

DDTActivityAna_O2

The trick to have understandable flamegraphs with RDataFrame is to avoid just-in-time compilation, so e.g. Histo1D<type>() instead of Histo1D(...), Filter(cpp_function) instead of Filter(string), etc.

It seems the “problem” is the amount of RVec operations: it’s highly templated code and you need compiler optimizations to elide all unnecessary boilerplate.