RDataFrame is very slow for many histograms

eguiraud · February 9, 2020, 10:48pm

Hi,
Starting from @swunsch’s C++ rewrite, I investigated a bit. I wanted to figure out where the extra time goes in the Python version compared to the C++ version. The obvious difference is that in the C++ version the code is compiled ahead of time by the compiler (as usual) and optimized, while in the Python version the code is compiled just-in-time by the ROOT interpreter, which does not apply compiler optimizations, at least by default.

First, using some timers, I verified that the time is not spent in the just-in-time compilation itself but in actually running the event loop.
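A minimal sketch of the kind of timer that can be used for such a check, in plain C++ with `std::chrono`. `TimePhase` is a hypothetical helper (not part of ROOT), and which RDataFrame calls trigger the just-in-time compilation versus the event loop in a given program is an assumption not shown here.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not part of ROOT): runs `phase` once and returns its
// wall-clock duration in seconds. Wrapping each phase of interest (e.g. the
// just-in-time compilation and the event loop) in its own call gives a rough
// breakdown; the RDataFrame calls that trigger each phase are not shown here.
double TimePhase(const std::function<void()> &phase)
{
   const auto start = std::chrono::steady_clock::now();
   phase();
   const auto stop = std::chrono::steady_clock::now();
   const double seconds = std::chrono::duration<double>(stop - start).count();
   assert(seconds >= 0.0); // steady_clock never goes backwards
   return seconds;
}
```

Calling `TimePhase` once per phase and comparing each result with the total runtime shows which phase dominates.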

Here are measurements for an increasing number of bins (i.e. increasing numbers of Filters and Histo1Ds), comparing fully compiled code with just-in-time compiled filters, with and without compiler optimizations. All runs were single-threaded.

Runtimes in seconds:

filters | all compiled, O3 | all compiled, O0 | filters jitted, program O3 | filters jitted, program O0
--------|------------------|------------------|----------------------------|---------------------------
18      | 9.76             | 35.09            | 35.5791                    | 37.7406
162     | 119.86           | 331.538          | 629.376                    | 647.737
392     | 303.188          | 802.401          | 1511.88                    | 1560.87

So compiler optimization of the program definitely plays a role. However, for larger numbers of histograms there is also a factor of about 2 between just-in-time compiled filters and filters compiled with O0, and I’m not sure where it comes from. It would be interesting to investigate, although I’m not quite sure how yet.
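One way to quantify that factor is to divide, row by row, the runtimes with just-in-time compiled filters by the fully compiled ones at the same program optimization level (O0). The sketch below does that with the measured values; the variable and function names are purely illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Measured runtimes in seconds (single thread), program compiled with O0.
// Index 0, 1, 2 correspond to 18, 162 and 392 filters respectively.
const std::vector<double> kCompiledO0 = {35.09, 331.538, 802.401};
const std::vector<double> kJittedO0 = {37.7406, 647.737, 1560.87};

// Element-wise slowdown factor: slow[i] / fast[i].
std::vector<double> SlowdownFactors(const std::vector<double> &slow, const std::vector<double> &fast)
{
   assert(slow.size() == fast.size());
   std::vector<double> factors;
   factors.reserve(slow.size());
   for (std::size_t i = 0; i < slow.size(); ++i) {
      assert(fast[i] > 0.0); // avoid dividing by a zero runtime
      factors.push_back(slow[i] / fast[i]);
   }
   return factors;
}
```

With these numbers, `SlowdownFactors(kJittedO0, kCompiledO0)` gives roughly 1.08, 1.95 and 1.95 for 18, 162 and 392 filters, so the gap only appears as the number of filters grows.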

Cheers,
Enrico

Attachments: all_compiled.cpp (2.0 KB), jitting_filters.cpp (2.0 KB)