Understand how Define actually works

Hi, this is more of a question to understand the underlying logic.

Let’s say I have a bunch of weights I want to profile against some observable.

What I am doing now is:

import ROOT as r

profModel = r.ROOT.RDF.TProfile1DModel("var", "var", 100, 0, 10)
all_profiles = []
for myweight in weights:  # each myweight is a string expression
    all_profiles.append(df.Filter(selection)
                          .Define("weight", myweight)
                          .Profile1D(profModel, "myVAR", "weight"))

Everything works smoothly, but I assumed it would be more efficient to do

myFilter = df.Filter(selection)
and then call only Define("weight", myweight) inside the loop, though I thought this would define a column with the same name several times.
How come df.Filter(selection).Define("weight", ...) always works, while myFilter.Define("weight", ...) doesn’t?

The only explanation I can think of is that each .Filter call creates a new node, and each node can then Define a new column freely, even reusing a name already defined after another filter.
Is that the case?
Thanks in advance

If I understand correctly, the question is why

for myweight in weights:
    all_profiles.append(df.Filter(selection)
                          .Define("weight", myweight)
                          .Profile1D(profModel, "myVAR", "weight"))

works while

df_filter = df.Filter(selection)
for myweight in weights:
    all_profiles.append(df_filter.Define("weight", myweight)
                                 .Profile1D(profModel, "myVAR", "weight"))


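The thing is, I think both should work, and the second formulation should be more efficient if evaluating selection takes a noticeable fraction of the runtime: in the first one every iteration creates a new Filter node, so selection is evaluated once per weight for each entry, while in the second one all the Define nodes hang off the same Filter node, which evaluates selection only once per entry.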
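A toy model of this efficiency argument in plain Python, without ROOT. It assumes, as a simplification and not as ROOT's actual code, that a Filter node runs its predicate at most once per entry and caches the result for every branch hanging off it. `ToyFilter`, `selection` and `run_event_loop` are hypothetical names; the counts show how often the selection runs in each form.

```python
class ToyFilter:
    """Hypothetical stand-in for an RDataFrame Filter node (not ROOT code)."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.calls = 0          # how many times the predicate actually ran
        self.last_entry = None  # entry whose result is cached
        self.last_result = False

    def passes(self, entry):
        # Run the predicate at most once per entry; later branches reuse the cached result
        if entry != self.last_entry:
            self.last_entry = entry
            self.last_result = self.predicate(entry)
            self.calls += 1
        return self.last_result


def selection(entry):
    # Hypothetical cut standing in for the real selection string
    return entry % 2 == 0


def run_event_loop(n_entries, branches):
    # Single pass over the data: every branch (one per weight) sees every entry
    for entry in range(n_entries):
        for filt in branches:
            filt.passes(entry)


n_entries, n_weights = 1000, 5

# First form: df.Filter(selection) inside the loop -> one Filter node per weight
separate = [ToyFilter(selection) for _ in range(n_weights)]
run_event_loop(n_entries, separate)
calls_separate = sum(f.calls for f in separate)

# Second form: df_filter = df.Filter(selection) once -> all weights share one node
shared = ToyFilter(selection)
run_event_loop(n_entries, [shared] * n_weights)
calls_shared = shared.calls

print(calls_separate)  # 5000
print(calls_shared)    # 1000
```

With 5 weights the toy runs the selection 5 times per entry in the first form and once per entry in the second, which is where the saving comes from when the selection is expensive.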

What’s the error you are getting, and what’s your ROOT version?


My question was just about understanding things.
I have ROOT 6.18; I thought that one cannot Define a new column with the same name several times on a given node.
Is this possible?
Thanks for interpreting my message :slight_smile:

Barring bugs, in recent versions you absolutely can (I’m not sure whether the feature was added in v6.16 or v6.18), as long as the columns are defined in different branches of the computation graph; otherwise the name would be ambiguous:

root [0] auto df = ROOT::RDataFrame(10)
(ROOT::RDataFrame &) An empty data frame that will create 10 entries

root [1] auto h42 = df.Define("x", "42").Histo1D("x")
root [2] auto h84 = df.Define("x", "84").Histo1D("x")
root [3] h42->Draw()
root [5] h84->Draw()


You are right: I assumed things were not working and phrased my question wrongly.
My question, then, is: how is it that columns with the same name, defined on the same node, cause no ambiguities when the event loop runs?

The trick is that columns are not defined “on a node”: each Define call produces a new node that contains the definition. Each branch of the computation graph only ever has one node defining a given column name, but parallel branches can have nodes that define columns with the same name, since they never talk to each other during the event loop anyway.
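A toy sketch of this idea in plain Python, assuming a much-simplified graph that is not ROOT's implementation: hypothetical `ToyNode` objects whose `Define` returns a new child node holding the definition, with column lookup walking from a node back towards the root. The method names only mimic RDataFrame's `Define` and `Take`. Two sibling branches can both define `x` without clashing, while the toy rejects redefining `x` further down the same branch, since that would be ambiguous.

```python
class ToyNode:
    """Hypothetical computation-graph node; not ROOT's actual class."""

    def __init__(self, parent=None, name=None, expr=None):
        self.parent = parent
        self.name = name  # column defined by this node (None for the root)
        self.expr = expr  # callable computing the column from the entry number

    def find(self, name):
        # Walk back towards the root: only nodes on this branch are visible
        node = self
        while node is not None:
            if node.name == name:
                return node
            node = node.parent
        return None

    def Define(self, name, expr):
        if self.find(name) is not None:
            raise ValueError(f"column '{name}' already defined in this branch")
        # The definition lives in a new node; self is left untouched
        return ToyNode(parent=self, name=name, expr=expr)

    def Take(self, name, n_entries):
        node = self.find(name)
        if node is None:
            raise KeyError(name)
        return [node.expr(entry) for entry in range(n_entries)]


df = ToyNode()  # plays the role of the RDataFrame root
branch42 = df.Define("x", lambda entry: 42)
branch84 = df.Define("x", lambda entry: 84)  # sibling branch, same name: fine

print(branch42.Take("x", 3))  # [42, 42, 42]
print(branch84.Take("x", 3))  # [84, 84, 84]

try:
    branch42.Define("x", lambda entry: 0)  # same branch: ambiguous, rejected
except ValueError as err:
    print(err)  # column 'x' already defined in this branch
```

Because `df` itself never stores a column, both `Define("x", ...)` calls succeed, mirroring the `h42`/`h84` session above.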

