Howdy, I’m wondering if there is an option/solution for [fillAnyObject](https://root.cern/doc/master/df005__fillAnyObject_8C.html) in PyROOT yet? I’m trying to fill [RooUnfoldResponse](https://statisticalmethods.web.cern.ch/StatisticalMethods/unfolding/RooUnfold_01-Methods_PY/) matrices for unfolding.
Currently the only way I can see is to make all the 1D and 2D histograms and then evaluate each of them with:
for hist in hist_definitions:
    histA = rdf.Histo1D(...)  # usual procedure
    histB = rdf.Histo1D(...)  # usual procedure
    histC = rdf.Histo2D(...)  # usual procedure
    response = ROOT.RooUnfoldResponse(histA.GetValue(), histB.GetValue(), histC.GetValue())
which seems horribly inefficient, as it will have to evaluate the whole framework for every response I want to make (there will be a couple). Any tips? If there is a way to do that, can I also wrap the Miss() function (for distributions that have x but not y)?
There’s a lot of code involved currently, so moving everything to C isn’t an option, but a wrapper function might be OK, no?
Yes, I’m afraid that runs an event loop for every hist in hist_definitions, because of the GetValue calls.
I’d suggest to make two loops (and push them to C via list comprehensions if they are large):
histos = [(rdf.Histo1D(...), rdf.Histo1D(...), rdf.Histo2D(...)) for hist in hist_definitions]
responses = [ROOT.RooUnfoldResponse(*map(lambda h: h.GetValue(), hs)) for hs in histos]
Unrelated: if you are running large RDF computation graphs from PyROOT, switch to 6.22 as soon as it’s out (conda-forge already has it), there are major speed improvements in RDF just-in-time compilation.
but then I need to call the loop like a bajillion times.
Note that my two-liner above produces all histograms in one event loop (by delaying the calls to GetValue until after all Histo1D and Histo2D calls have been made).
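A minimal sketch of the same two-phase pattern, keyed by name in case dictionaries are handier downstream than lists. The helper name `build_responses` and the layout of `hist_definitions` (name mapped to a dict with `reco`, `truth` and `bins` entries) are assumptions made for illustration, not something taken from this thread; `make_response` is passed in so that `ROOT.RooUnfoldResponse` can be supplied at the call site.

```python
def build_responses(rdf, hist_definitions, make_response):
    """Book every histogram first, then read the results.

    hist_definitions uses a hypothetical layout:
        {name: {"reco": column, "truth": column, "bins": (nbins, low, high)}}
    make_response is e.g. ROOT.RooUnfoldResponse(measured, truth, response).
    """
    booked = {}
    for name, spec in hist_definitions.items():
        nbins, low, high = spec["bins"]
        reco, truth = spec["reco"], spec["truth"]
        # Phase 1: booking only. Histo1D/Histo2D return lazy result handles,
        # so no event loop runs here.
        booked[name] = (
            rdf.Histo1D((f"{name}_reco", "", nbins, low, high), reco),
            rdf.Histo1D((f"{name}_truth", "", nbins, low, high), truth),
            rdf.Histo2D(
                (f"{name}_resp", "", nbins, low, high, nbins, low, high),
                reco,
                truth,
            ),
        )
    # Phase 2: the first GetValue() triggers the event loop for everything
    # booked on this rdf; the remaining calls just return filled histograms.
    return {
        name: make_response(*(handle.GetValue() for handle in handles))
        for name, handles in booked.items()
    }
```

With ROOT and RooUnfold loaded, this would be called as `responses = build_responses(rdf, hist_definitions, ROOT.RooUnfoldResponse)`. Each separate rdf (for example one per tree) still runs its own event loop.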
If RooUnfoldResponse does not inherit from TH1, you can work around the bug that requires it by defining a little helper object that wraps a RooUnfoldResponse and does inherit from TH1, like in this post.
The first GetValue called triggers the event loop that fills all histograms, see e.g. the “Executing multiple actions in the same event loop” section of the docs.
Subsequent GetValue calls will just return the already filled histogram; they will not trigger another event loop.
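To make the mechanism concrete, here is a toy model in plain Python of "book lazily, run once". It is not how RDataFrame is implemented, and `ToyFrame`, `ToyResult` and their methods are invented for this illustration; it only mimics the behaviour described above, where the first `GetValue` fills every booked result in a single pass.

```python
class ToyFrame:
    """Toy stand-in for a lazy data frame (NOT ROOT's implementation)."""

    def __init__(self, data):
        self.data = data
        self.pending = []   # results that are booked but not yet computed
        self.loops_run = 0  # how many passes over the data were made

    def Sum(self):
        return self._book(lambda acc, x: acc + x, 0)

    def Count(self):
        return self._book(lambda acc, x: acc + 1, 0)

    def _book(self, update, start):
        result = ToyResult(self, update, start)
        self.pending.append(result)
        return result

    def run(self):
        # One pass over the data updates every booked result.
        self.loops_run += 1
        for x in self.data:
            for result in self.pending:
                result.value = result.update(result.value, x)
        for result in self.pending:
            result.ready = True
        self.pending = []


class ToyResult:
    def __init__(self, frame, update, start):
        self.frame = frame
        self.update = update
        self.value = start
        self.ready = False

    def GetValue(self):
        # Only the first call on a not-yet-filled result triggers the loop.
        if not self.ready:
            self.frame.run()
        return self.value


frame = ToyFrame([1, 2, 3, 4])
total, count = frame.Sum(), frame.Count()       # booking: nothing runs yet
print(total.GetValue(), count.GetValue(), frame.loops_run)  # prints "10 4 1"
```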
Can you tell where time is spent in your application exactly? And in case you have many such histograms and just-in-time compilation of the event loop code is a bottleneck, can you try v6.22 to see if there is an improvement?
Gotcha, OK. The slowdown is likely because I’m reading about a hundred trees into rdfs and applying the same computational graph to them, whereas before it was only calling the graph on the few trees I was plotting.
Ultimately this step comes before building a large RooFit model from them all so it’s good to check that this is only executing everything once.
ROOT 6.22 does indeed seem to be faster (but the docker images need updating, as they’re still on ROOT 6.20, so I had to do a manual update, which took a long time).
Is there an example of what you mean by “push them to C via list comprehensions if they are large”?
True! v6.22 was officially announced yesterday, we’ll update the docker images in the coming days.
It’s a python thing: when building lists of things, list comprehensions are faster than for loops:
In [1]: l = []
In [2]: %timeit for i in range(100000): l.append(i)
7.58 ms ± 64.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [3]: %timeit l = [ i for i in range(100000) ]
3.41 ms ± 68.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
It’s not actually C, but it’s less Python bytecode.
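For anyone without IPython at hand, the same comparison can be sketched with the standard-library `timeit` module. The function names `with_loop` and `with_comprehension` are just illustrative, and the timings will vary by machine and Python version, so none are quoted here.

```python
import timeit


def with_loop(n):
    # Explicit loop: an attribute lookup and a method call per element.
    out = []
    for i in range(n):
        out.append(i)
    return out


def with_comprehension(n):
    # Comprehension: the append happens in a dedicated bytecode instruction.
    return [i for i in range(n)]


n = 100_000
runs = 20
# Take the best of three repeats to reduce noise from other processes.
loop_time = min(timeit.repeat(lambda: with_loop(n), number=runs, repeat=3))
comp_time = min(timeit.repeat(lambda: with_comprehension(n), number=runs, repeat=3))

print(f"for loop:      {loop_time / runs * 1e3:.2f} ms per call")
print(f"comprehension: {comp_time / runs * 1e3:.2f} ms per call")
```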
Now that I know your use-case a little bit better though (a hundred medium-sized separate computation graphs) I am pretty sure that’s not a bottleneck.
Yeah, I’ve been trying that with the code all morning, playing with lambdas and filters and the like. Problem is that I need them as dictionaries, so I either start merging rdfs, python filters, numpy named arrays, or I just spend those extra few milliseconds on enjoying life. I’ve got it down to three lines and only one for loop and that will have to do.
Yeah, real non-problems here. My computation of several hundred variables from a few GB of data from a hundred different trees takes me two minutes on my laptop! I want it faster, damn it! And I don’t want to run it on a bigger machine either.