ROOT Version: 6.16
Platform: Not Provided
Compiler: Not Provided
Hi everyone, I’m trying to use RDataFrame for my analysis. I’m able to store many histograms in a std::vector<ROOT::RDF::RResultPtr<TH1D>> hvtns, and that step is pretty fast. But when I then try to save them to a ROOT file, that step takes a long time, and the more histograms I add, the longer it takes (please see the code below; it takes ~30 seconds for 300 histograms). Is this expected, or am I doing something wrong? Thanks!
David.
// headers needed when compiling this macro as a standalone program
#include <chrono>
#include <iostream>
#include <sstream>
#include <vector>

#include "ROOT/RDataFrame.hxx"
#include "ROOT/RVec.hxx"
#include "TDirectory.h"
#include "TFile.h"
#include "TH1D.h"
#include "TString.h"

void DDTActivityAna() {
   using floats = ROOT::VecOps::RVec<float> &;
   using s_ints = ROOT::VecOps::RVec<unsigned short> &;
   using ints   = ROOT::VecOps::RVec<int> &;

   ROOT::EnableImplicitMT();

   TFile *infile = new TFile("/pnfs/nova/persistent/users/dduenas/TestBeam/TestBeamData/ddtactivity/spillrawdata.root", "read");
   TDirectory *dirPtr = (TDirectory *)infile->Get("spillrawdata");
   ROOT::RDataFrame df("spillhits", dirPtr);

   std::vector<double> tns_range;
   std::vector<ROOT::RDF::RResultPtr<TH1D>> hvtns;

   // get the events
   auto eventa = df.Take<Int_t>("event");
   std::vector<int> eventv;
   for (size_t e = 0; e < eventa->size(); e++) eventv.push_back(eventa->at(e));

   // 1000 ns wide windows covering -50000 ns to 50000 ns
   for (int t = 0; t < 101; t++) { tns_range.push_back(t * 1000 - 50000); }

   for (size_t e = 0; e < 2 /*eventv.size()*/; e++) {
      for (size_t t = 0; t < tns_range.size() - 1; t++) {
         // cut for tns: keep hit times inside the current 1000 ns window of the current event
         // (cellx, celly and hits are unused but must match the column list passed to Define)
         auto tnscut = [tns_range, t, eventv, e](floats tns, s_ints cellx, s_ints celly, int event, ints hits) {
            return tns[(tns >= tns_range.at(t) && tns < tns_range.at(t + 1) && event == eventv.at(e))];
         };
         std::stringstream tt;
         tt << "tns" << t << "event" << eventv.at(e);
         const auto namet = tt.str();
         // define the new column in the dataframe and book a histogram of it
         auto dtns_c = df.Define(namet.c_str(), tnscut, {"tns", "cellx", "celly", "event", "nhits"});
         auto htns_c = dtns_c.Histo1D({Form("htns_%0.fto%0.fevent%i", tns_range.at(t), tns_range.at(t + 1), eventv.at(e)), "",
                                       100, tns_range.at(t), tns_range.at(t + 1)},
                                      namet.c_str());
         hvtns.push_back(htns_c);
      }
   }

   auto tf1 = std::chrono::high_resolution_clock::now();
   TFile *f = TFile::Open("events.root", "recreate");
   for (size_t i = 0; i < hvtns.size(); i++) {
      hvtns.at(i)->Write();
   }
   f->Close();
   auto tf2 = std::chrono::high_resolution_clock::now();
   auto durationf = std::chrono::duration_cast<std::chrono::microseconds>(tf2 - tf1).count();
   std::cout << "duration in mus " << durationf << std::endl;
}
Hi David,
note that RDataFrame only runs the event loop upon first access to one of the results, so, if I read the code correctly, the first time hvtns.at(i)->Write() is called.
In other words, what you are measuring is not just the writing of the histograms, but also the runtime of the event loop that produces them.
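As a rough illustration (a sketch, not your exact macro: the timing variables and the GetEntries() call are only there to show the point), you can trigger the event loop explicitly before the write loop, so that the two costs are measured separately:

   // accessing any one result triggers the single event loop that fills all booked histograms
   auto t0 = std::chrono::high_resolution_clock::now();
   hvtns.at(0)->GetEntries(); // event loop runs here
   auto t1 = std::chrono::high_resolution_clock::now();

   // the results are now filled, so this part measures only the I/O
   TFile *f = TFile::Open("events.root", "recreate");
   for (auto &h : hvtns) h->Write();
   f->Close();
   auto t2 = std::chrono::high_resolution_clock::now();

   std::cout << "event loop: " << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
             << " ms, writing: " << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count()
             << " ms" << std::endl;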
I can’t tell whether a 30-second runtime is reasonable for producing 300 histograms on that dataset (I’d need to check out the dataset). Something easy you can try to speed things up is to compile the program ahead of time, with optimizations, rather than executing it within the ROOT interpreter.
It’s enough to add a main function at the end:
int main() {
   DDTActivityAna();
   return 0;
}
and then compile the program like a normal C++ program, passing ROOT’s compilation flags:
g++ -o DDTActivityAna DDTActivityAna.C $(root-config --libs --cflags)
I hope this helps! If not, I suggest producing a perf flamegraph to visualize where the time is spent; VTune is another useful profiling tool. Moving to a more recent ROOT version might also help.
Feel free to report any problem here, or share your data so we can take a look ourselves.
Cheers,
Enrico
Hi Enrico,
Thank you for your prompt reply. I tried everything you suggested and unfortunately it did not help. I have been trying to get a flamegraph, but for some reason I have problems with Java (I’ll keep trying!). In the meantime I’m happy to share the data here. One of my major concerns is that the processing time is too long: ~30 s is for just one event, and I need to analyze nearly 8000. Thank you for taking a look at this.
Best,
David.
Data:
Hi,
thanks for the file.
Compiling without optimizations indeed gives me a runtime of ~30 seconds. However, compiling with -O2 makes the runtime drop to less than 4 seconds.
On top of that, replacing the just-in-time compiled dtns_c.Histo1D(...) with the fully compiled dtns_c.Histo1D<ROOT::RVec<float>>(...) gets me a runtime of 2.5 seconds.
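Concretely, in the inner loop of the macro above that change would look roughly like this (a sketch: the column defined by tnscut holds an RVec<float>, which is why that type goes in the template parameter):

   // just-in-time compiled: the column type is deduced and the call is compiled by cling at runtime
   auto htns_jit = dtns_c.Histo1D(
      {Form("htns_%0.fto%0.fevent%i", tns_range.at(t), tns_range.at(t + 1), eventv.at(e)), "",
       100, tns_range.at(t), tns_range.at(t + 1)},
      namet.c_str());

   // fully compiled: the column type is given as a template parameter, so no runtime compilation is needed
   auto htns_compiled = dtns_c.Histo1D<ROOT::RVec<float>>(
      {Form("htns_%0.fto%0.fevent%i", tns_range.at(t), tns_range.at(t + 1), eventv.at(e)), "",
       100, tns_range.at(t), tns_range.at(t + 1)},
      namet.c_str());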
It would be interesting to know why optimizations matter so much for this particular program (I’m investigating), but anyway, if you want performance, always turn on those compiler flags!
Here is the program I’m compiling: DDTActivityAna.C (2.0 KB)
Compiler invocation:
g++ -O2 -g -Wall -Wextra -Wpedantic -o "DDTActivityAna" "DDTActivityAna.C" $(root-config --libs --cflags)
Cheers,
Enrico
P.S.
flamegraph of the program attached above for -O0 (you can look at it properly by right-clicking and selecting “view image” or analogous):
[flamegraph, -O0]
and for -O2:
[flamegraph, -O2]
The trick to getting understandable flamegraphs with RDataFrame is to avoid just-in-time compilation, so e.g. Histo1D<type>(...) instead of Histo1D(...), Filter(cpp_function) instead of Filter(string), etc.
It seems the “problem” is the amount of RVec operations: it’s highly templated code and you need compiler optimizations to elide all unnecessary boilerplate.