Hi,
I have a performance problem with RDataFrame when filling many histograms. I’m trying to fill N histograms from an RDataFrame with a simple .Filter().Histo1D() operation per histogram, and then save these N histograms to a file. This works as intended: the histograms are only filled once I first do something with one of them, so the event loop does seem to run only once.
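For reference, the booking pattern looks roughly like the sketch below. This is an illustration only, not a copy of the attached script: the function name `book_histograms`, the column name `x`, the cut strings and the binning are all placeholders I made up here. It also assumes that PyROOT accepts a plain tuple as the histogram model.

```python
def book_histograms(df, n, column="x", nbins=100, lo=0.0, hi=1.0):
    """Book n filtered 1D histograms on an RDataFrame-like object.

    Nothing is filled here: Filter() and Histo1D() are lazy, so this only
    registers n actions that are expected to run in a single event loop later.
    """
    booked = []
    for i in range(n):
        # Hypothetical cut: a different threshold on `column` per histogram.
        cut = "{} > {}".format(column, lo + (hi - lo) * i / n)
        name = "h_{}".format(i)
        # Histogram model as (name, title, nbins, xlow, xup); assumed to be
        # converted to a TH1DModel by PyROOT.
        model = (name, name, nbins, lo, hi)
        booked.append(df.Filter(cut).Histo1D(model, column))
    return booked
```

With ROOT set up, `df` would be something like `ROOT.RDataFrame("tree", "file.root")` (tree and file names are placeholders). The histograms are then written by opening a `ROOT.TFile` and calling `h.GetValue().Write()` on each booked result, and the first `GetValue()` is what triggers the event loop.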
However, the run time depends very strongly on N. It seems fine for N ~ 10, but once I increase N to ~100 or more, the time these operations take increases drastically. That rather defeats the purpose of using an RDataFrame in the first place…
For instance, filling and saving 20 histograms takes 69.23 seconds, while doing the exact same thing for 200 histograms takes 1725.14 seconds.
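To put numbers on the scaling, here is a small calculation using only the two timings quoted above. The power-law exponent is just a rough characterisation from two data points, not a fit.

```python
import math

# Measured wall times from above: number of histograms -> seconds.
timings = {20: 69.23, 200: 1725.14}

n_small, n_large = sorted(timings)

# Average cost per histogram at each N.
per_hist_small = timings[n_small] / n_small
per_hist_large = timings[n_large] / n_large

# How much longer the large job takes, versus how much larger N is.
slowdown = timings[n_large] / timings[n_small]
growth = n_large / n_small

# Effective exponent k if time scaled like N**k between the two points.
exponent = math.log(slowdown) / math.log(growth)

print("per histogram: {:.2f} s (N={}) vs {:.2f} s (N={})".format(
    per_hist_small, n_small, per_hist_large, n_large))
print("N grows {:.0f}x, time grows {:.1f}x, effective exponent {:.2f}".format(
    growth, slowdown, exponent))
```

So for 10 times as many histograms the job takes about 25 times longer. The cost per histogram goes from roughly 3.5 s to 8.6 s instead of staying constant.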
Enabling multithreading also doesn’t help.
I attach a script that runs on lxplus7, with a file that can be downloaded by following this link:
https://cernbox.cern.ch/index.php/s/GGDVFWMkwoGEzft
The ROOT version I source is
. /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.16.00/x86_64-centos7-gcc48-opt/bin/thisroot.sh
I also tried ROOT 6.18.00, but I get the same (slow) behavior when increasing N.
Any help would be appreciated!
Best,
-marc
standalone.py (1.6 KB)
ROOT Version: 6.16.00
Platform: lxplus
Compiler: gcc48