Dear all
I used RooKeysPdf in a slightly unusual situation, where the large
spread of event weights necessitated the use of adaptive kernel
estimation (AKE) rather than a plain histogram, even though we have
10^5 to 10^6 events.
Unfortunately, the CPU time of RooKeysPdf scales as O(N^2), where N is
the number of events.
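To make the quadratic scaling concrete, here is a minimal Python sketch (not RooKeysPdf's actual code) of the straightforward way to compute per-event adaptive bandwidths. A fixed-bandwidth pilot density is evaluated at every event, and each evaluation sums over every event, so N events cost N^2 kernel evaluations. The rule-of-thumb global bandwidth (4/3)^(1/5) * sigma * N^(-1/5), Abramson's square-root law for the adaptive step, and all function names are assumptions for illustration; RooKeysPdf's exact formulas may differ.

```python
import math


def gauss(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def weighted_sigma(xs, ws):
    """Weighted standard deviation of the sample."""
    wsum = sum(ws)
    mean = sum(w * x for x, w in zip(xs, ws)) / wsum
    return math.sqrt(sum(w * (x - mean) ** 2 for x, w in zip(xs, ws)) / wsum)


def pilot_density(x, xs, ws, h0):
    """Fixed-bandwidth weighted kernel density at x: O(N) per call."""
    wsum = sum(ws)
    return sum(w * gauss((x - xj) / h0) for xj, w in zip(xs, ws)) / (wsum * h0)


def adaptive_bandwidths_exact(xs, ws):
    """Per-event adaptive bandwidths, O(N^2) overall (hypothetical sketch)."""
    n = len(xs)
    # Rule-of-thumb global bandwidth; uses the raw event count rather than
    # an effective count for weighted data (an assumption of this sketch).
    h0 = (4.0 / 3.0) ** 0.2 * weighted_sigma(xs, ws) * n ** -0.2
    # N calls to an O(N) function: this is the quadratic step.
    f = [pilot_density(x, xs, ws, h0) for x in xs]
    # Abramson's square-root law, normalized by the geometric mean of f.
    g = math.exp(sum(math.log(fi) for fi in f) / n)
    return [h0 * math.sqrt(g / fi) for fi in f]
```

Doubling the number of events quadruples the number of kernel evaluations, which is why 10^5 to 10^6 events become so expensive.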
However, RooKeysPdf is already hardcoded to target a resolution of
1000 points on the x-axis. I modified it to use a similar trick when
calculating the per-event bandwidth parameters (_weights), which
dramatically improved the execution time in our use case, from over
an hour to about 3 minutes. For large N the CPU time now scales as
O(N).
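A minimal sketch of the grid idea, again in Python and not the code in AKE.c: the pilot density is tabulated once on a fixed grid of n_grid points (default 1000, matching the 1000-point resolution mentioned above), and each event's bandwidth uses a linearly interpolated value from that table instead of a fresh sum over all events. The bandwidth formulas (rule-of-thumb h0 and Abramson's square-root law), the linear interpolation, and the names adaptive_bandwidths_grid and n_grid are assumptions for illustration.

```python
import math


def gauss(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def adaptive_bandwidths_grid(xs, ws, n_grid=1000):
    """Adaptive bandwidths with the pilot density tabulated on a grid.

    Hypothetical sketch: O(N * n_grid) to fill the table plus O(N) for
    the lookups, i.e. linear in N for a fixed n_grid.
    """
    n = len(xs)
    lo, hi = min(xs), max(xs)
    if n_grid < 2 or hi == lo:
        raise ValueError("need n_grid >= 2 and at least two distinct x values")
    wsum = sum(ws)
    mean = sum(w * x for x, w in zip(xs, ws)) / wsum
    sigma = math.sqrt(sum(w * (x - mean) ** 2 for x, w in zip(xs, ws)) / wsum)
    h0 = (4.0 / 3.0) ** 0.2 * sigma * n ** -0.2

    # Tabulate the fixed-bandwidth pilot density on n_grid points spanning the data.
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + k * step for k in range(n_grid)]
    table = [sum(w * gauss((x - xj) / h0) for xj, w in zip(xs, ws)) / (wsum * h0)
             for x in grid]

    def lookup(x):
        # Linear interpolation between the two nearest grid points.
        t = (x - lo) / step
        k = min(int(t), n_grid - 2)
        frac = t - k
        return table[k] * (1.0 - frac) + table[k + 1] * frac

    # O(N): one table lookup per event.
    f = [lookup(x) for x in xs]
    g = math.exp(sum(math.log(fi) for fi in f) / n)
    return [h0 * math.sqrt(g / fi) for fi in f]
```

The table costs N × n_grid kernel evaluations and the lookups cost N, so for a fixed n_grid the total is linear in N. The price is an interpolation error, which becomes negligible once the grid step is small compared with the bandwidth.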
The code is available at:
www-d0.fnal.gov/~aharel/AKE.h
www-d0.fnal.gov/~aharel/AKE.c
I hope the maintainers will consider including a patch along these
lines in ROOT.
Caveats:
- I did not optimize the switch between the two algorithms.
- I still need to check the interplay with mirroring: should the
  switch be based on _nEvents or on data.numEntries()?
cheers,
Amnon Harel,
University of Rochester