Hi,
Starting from @swunsch’s C++ rewrite, I investigated a little bit. I wanted to figure out where the extra time is spent in the Python version compared to the C++ version. The obvious difference is that in the C++ version the code is compiled ahead of time (as usual, by the compiler) and optimized, while in the Python version the code is compiled just-in-time by the ROOT interpreter, which does not apply compiler optimizations, at least by default.
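To make the comparison concrete, here is a minimal sketch (not the attached benchmark code; the dataset and the column `x` are made up) of the two flavors of filters: one passed as a C++ lambda, which the compiler optimizes ahead of time, and the same cut passed as a string, which cling compiles just-in-time at runtime, exactly as it must for expressions coming from Python:

```cpp
#include <ROOT/RDataFrame.hxx>
#include <TRandom.h>
#include <iostream>

int main()
{
   ROOT::RDataFrame df(1000000);
   auto dd = df.Define("x", [] { return gRandom->Uniform(); });

   // Ahead-of-time compiled filter: the lambda is optimized by the
   // compiler together with the rest of the program.
   auto hCompiled = dd.Filter([](double x) { return x > 0.5; }, {"x"}).Histo1D("x");

   // Just-in-time compiled filter: the string is compiled at runtime by
   // the ROOT interpreter, without compiler optimizations by default.
   auto hJitted = dd.Filter("x > 0.5").Histo1D("x");

   // Accessing a result triggers the (single) event loop.
   std::cout << hCompiled->GetMean() << " " << hJitted->GetMean() << std::endl;
}
```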
First of all, with some timers I could verify that the time is not spent in the just-in-time compilation itself, but in actually running the event loop.
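My timers were internal, but as a sketch of how one can get at the same split externally: since jitting of the booked strings happens when the event loop is first triggered, one can compare a run over very few events (where the runtime is dominated by jitting) with a run over the full dataset. The event counts and the toy column below are placeholders:

```cpp
#include <ROOT/RDataFrame.hxx>
#include <TH1D.h>
#include <TRandom.h>
#include <chrono>
#include <iostream>
#include <string>
#include <vector>

// Book nFilters jitted Filter+Histo1D pairs on nEvents events and time
// the call that triggers jitting plus the event loop.
double TimeTrigger(int nFilters, unsigned long long nEvents)
{
   ROOT::RDataFrame df(nEvents);
   auto dd = df.Define("x", [] { return gRandom->Uniform(100.); });

   std::vector<ROOT::RDF::RResultPtr<TH1D>> histos;
   for (int i = 0; i < nFilters; ++i)
      histos.emplace_back(dd.Filter("x > " + std::to_string(i)).Histo1D("x"));

   const auto t0 = std::chrono::steady_clock::now();
   histos[0]->GetEntries(); // jitting + event loop happen here
   const auto t1 = std::chrono::steady_clock::now();
   return std::chrono::duration<double>(t1 - t0).count();
}

int main()
{
   // Few events: the loop is negligible, this is mostly jitting time.
   std::cout << "jit-dominated: " << TimeTrigger(392, 1000) << " s\n";
   // Many events: jitting time plus the full event loop.
   std::cout << "full run:      " << TimeTrigger(392, 100000000) << " s\n";
}
```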
Here are measurements for an increasing number of bins (i.e. increasing numbers of Filters and Histo1Ds), compiled code vs just-in-time compiled, with and without compiler optimizations. All runs were single-threaded; a sketch of the benchmark structure follows the tables.
All compiled, O3:

| filters | runtime (s) |
| --- | --- |
| 18 | 9.76 |
| 162 | 119.86 |
| 392 | 303.188 |

All compiled, O0:

| filters | runtime (s) |
| --- | --- |
| 18 | 35.09 |
| 162 | 331.538 |
| 392 | 802.401 |

Filters just-in-time compiled, program compiled with O3:

| filters | runtime (s) |
| --- | --- |
| 18 | 35.5791 |
| 162 | 629.376 |
| 392 | 1511.88 |

Filters just-in-time compiled, program compiled with O0:

| filters | runtime (s) |
| --- | --- |
| 18 | 37.7406 |
| 162 | 647.737 |
| 392 | 1560.87 |
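For reference, the benchmark structure looks roughly like this (a sketch, not the attached code; the dataset size and bin edges are made up). There is one Filter+Histo1D pair per bin, either as a compiled capturing lambda or as an equivalent jitted string:

```cpp
#include <ROOT/RDataFrame.hxx>
#include <TH1D.h>
#include <TRandom.h>
#include <string>
#include <vector>

int main()
{
   ROOT::RDataFrame df(100000);
   auto dd = df.Define("x", [] { return gRandom->Uniform(100.); });

   const int nBins = 18; // 18, 162 or 392 in the measurements above
   std::vector<ROOT::RDF::RResultPtr<TH1D>> histos;
   for (int b = 0; b < nBins; ++b) {
      const double lo = b * 100. / nBins, hi = (b + 1) * 100. / nBins;
      // Compiled variant: a capturing lambda, optimized ahead of time.
      histos.emplace_back(
         dd.Filter([=](double x) { return lo < x && x <= hi; }, {"x"}).Histo1D("x"));
      // Jitted variant (what Python has to do): an equivalent string,
      // compiled at runtime by cling, e.g.
      // dd.Filter(std::to_string(lo) + " < x && x <= " + std::to_string(hi)).Histo1D("x");
   }
   histos[0]->GetEntries(); // one event loop fills all histograms
}
```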
So program optimization definitely plays a role, but for larger numbers of histograms there is still a factor of 2 between just-in-time compiled filters and filters compiled with O0. I'm not sure where that factor comes from, and it would be interesting to investigate (although I'm not quite sure how yet).
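One knob that could help narrow it down: if I remember correctly, the optimization level cling uses for jitted code can be changed through the `EXTRA_CLING_ARGS` environment variable (e.g. `EXTRA_CLING_ARGS='-O2'`), so comparing runs with and without it should tell us how much of the remaining gap is due to optimization passes rather than to the jitted code itself.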
Cheers,
Enrico
Attachments: all_compiled.cpp (2.0 KB), jitting_filters.cpp (2.0 KB)