I need to compare the results of an RDataFrame analysis performed on different inputs in a single job.
I wonder if it’s possible to change the source of an RDataFrame instance (or whatever the exact type is) after it has run, and then re-run it on the new source?
Hopefully it is clearer with pseudo-code:
df = ROOT.RDataFrame(tree1)
# here many
# df = df.Define(...)
# df = df.Filter(...)
# occur from various places hard to track.
h1 = df.Histo1D(...).GetValue()
df.ChangeInputTree(tree2) # ← possible ??
h2 = df.Histo1D(...).GetValue()
# then do smart things with h1 and h2.
Of course the real setup is much more complex and it’s not possible to save intermediate results & split the operations in several jobs.
Or maybe there is a way to transfer all the Define() and Filter() operations onto a fresh RDataFrame, like this:
# replace :
# df.ChangeInputTree(tree2) # ← possible ??
# with
df2 = ROOT.RDataFrame(tree2)
df2.CopyDefAndFilter(df) # possible ??
h2 = df2.Histo1D(...).GetValue()
For technical reasons that are long to explain and that I am only partially aware of (if you’re interested, @vpadulan can explain them in detail), this capability is not exposed publicly at the moment. There is an intention to provide this feature in the future, but the problem is more complex than it looks on the surface: as far as I know, it is related to the fact that RDF has no guarantees about the specific schema of the second input.
With the current API I’m afraid your best bet is to wrap the building of the RDF into a function that you can call with the various inputs at different times, if possible.
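To make the suggestion concrete, here is a minimal sketch of such a function. The column names (`pt`, `eta`), the cuts, the histogram binning, and the helper names `build_analysis` and `compare_inputs` are all hypothetical placeholders, not taken from your setup. The idea is only that the whole Define/Filter chain lives in one place and is applied to a fresh RDataFrame for each input.

```python
def build_analysis(df, hist_name):
    """Apply the shared Define/Filter chain to a dataframe node and book one histogram.

    `df` is expected to behave like a ROOT.RDataFrame node. The columns and cuts
    below are hypothetical placeholders for the real analysis.
    """
    df = df.Define("pt_gev", "pt / 1000.0")
    df = df.Filter("pt_gev > 20.0", "pt cut")
    df = df.Filter("abs(eta) < 2.5", "eta cut")
    # Histo1D is lazy: it returns a result handle, the event loop has not run yet.
    return df.Histo1D((hist_name, "pt [GeV]", 50, 0.0, 200.0), "pt_gev")


def compare_inputs(tree1, tree2):
    """Run the same analysis on two inputs and return both histograms."""
    import ROOT  # imported here so build_analysis stays usable without ROOT loaded

    r1 = build_analysis(ROOT.RDataFrame(tree1), "h1")
    r2 = build_analysis(ROOT.RDataFrame(tree2), "h2")
    # Accessing the values triggers one event loop per dataframe.
    # Clone so the histograms do not depend on the lifetime of the result handles.
    h1 = r1.GetValue().Clone()
    h2 = r2.GetValue().Clone()
    return h1, h2
```

If the Define and Filter calls really are scattered across many places, one option is to have each of those places expose a function that takes a dataframe node and returns the extended node, and to call them in sequence from something like `build_analysis`.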
If this solution is too limited for your use case please get in touch with @vpadulan (or just write in this thread) so a better solution may be worked out.
Thank you for your fast answer. I suspected that what I’m after is not trivial, but it’s very nice to read that there are plans to provide a solution!
I’m happy to test internal or experimental calls if some are already available in recent releases (I’m already using ROOT.Internal.RDF.ChangeBeginAndEndEntries …). Otherwise, I can try to work out a dataframe setup function as you suggest.