Friend support within RDataFrame class

It would be nice to be able to add an RDataFrame as a friend to another RDataFrame instead of having to use TTree first to handle friends. Usage example follows.


Suppose I apply a selection fitSel, do a fit, and extract some weights, which I save as a TTree. Further suppose I want to compare the unweighted dataset before the selection to the weighted one after the selection.

Right now, I have to do:

rdf_initial = ROOT.RDataFrame('baseTree', 'base.root')
rdf_filtered = rdf_initial.Filter(fitSel)
rdf_filtered.Snapshot('temp', 'temp.root')
f1 = ROOT.TFile.Open('temp.root')
t1 = f1.Get('temp')
f2 = ROOT.TFile.Open('weights.root')
t2 = f2.Get('weightTree')
t1.AddFriend(t2)
rdf_filtered = ROOT.RDataFrame(t1)
# do some analysis
os.remove('temp.root')

I would like to do:

rdf_initial = ROOT.RDataFrame('baseTree', 'base.root')
rdf_filtered = rdf_initial.Filter(fitSel)
rdf_weights = ROOT.RDataFrame('weightTree', 'weights.root')
rdf_filtered.AddFriend(rdf_weights)
# do some analysis

10 lines become 4 lines, and I don’t have to create a temporary file on disk.
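
For context on why the workaround goes through a snapshot: without an index, TTree friends are paired entry by entry, so the weight tree has to be attached to a tree that contains only the selected events. The toy below illustrates that pairing in plain Python, with no ROOT involved. It is only a sketch: `add_friend`, the sample rows, and the cut are hypothetical stand-ins, and it assumes the weight tree holds exactly one entry per selected event, in the same order.

```python
def add_friend(main_rows, friend_rows, alias):
    """Toy friend join: pair entry i of main with entry i of friend.

    Friend columns are exposed as '<alias>.<column>', similar to how
    friend branches can be addressed in a TTree. Hypothetical helper,
    not a ROOT API.
    """
    if len(main_rows) != len(friend_rows):
        # The toy refuses mismatched lengths to make misalignment visible.
        raise ValueError("main and friend must have the same number of entries")
    return [
        {**main, **{f"{alias}.{name}": value for name, value in friend.items()}}
        for main, friend in zip(main_rows, friend_rows)
    ]


# Stand-in for baseTree: four events with one column.
base = [{"x": 1.0}, {"x": 5.0}, {"x": 2.0}, {"x": 7.0}]

# Stand-in for fitSel (assumed cut, for illustration only).
def fit_sel(row):
    return row["x"] > 1.5

filtered = [row for row in base if fit_sel(row)]

# Stand-in for weightTree: one weight per *selected* event.
weights = [{"w": 0.9}, {"w": 0.4}, {"w": 1.1}]

joined = add_friend(filtered, weights, "weightTree")
for row in joined:
    print(row)
```

Attaching `weights` directly to `base` would pair the first weight with an event that failed the selection, which is why the filtered events are written out first in the current workaround.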


ROOT Version: 6.22/00
Platform: Not Provided
Compiler: Not Provided


Hi @mwilkins,
I totally agree it would be nice to have this. Please open a Jira ticket with the feature request.
Note, however, that there is a reason this is not already there: it can't just be syntactic sugar for TTree friends. It also needs to work for custom RDataSources, for RDFs with no dataset, and in multi-thread runs, and it requires adding a notion of a generic "RDF name" so you can refer to the columns of the friends :smiley:

All doable, just needs some thinking and subsequent coding.
Cheers,
Enrico

Done. Thanks.
