How to get treename from RDataFrame

rooter_03 · July 14, 2021, 12:32pm

Hi,

When making an RDataFrame instance from a tree:

df = ROOT.RDataFrame(tree)

Is there a way to retrieve the name of the tree from the dataframe afterwards? I cannot find anything in the documentation. I would expect something like:

df.GetTreeName()

Cheers.

eguiraud · July 14, 2021, 12:40pm

Hi @rooter_03 ,
I am afraid RDataFrame does not provide a way to do that.

On one hand, there is the issue that RDF does not always read a TTree: sometimes it’s a TChain, and each tree in the chain can have a different name; sometimes it’s a CSV file; sometimes there is no input at all (just a number of initially empty entries).

On the other hand, the tree name must have been available when the RDF was constructed, so you can extract it at that point, e.g. from the tree itself with tree.GetName().

In the next release we’ll have a generic df.DescribeDataset() that will provide a human-readable description of the dataset and will cover all cases. The feature is currently available in our nightly builds.

Cheers,
Enrico

rooter_03 · July 14, 2021, 12:43pm

Hi,

I (and I am probably not the only one) need the treename beyond the initial point. If we cannot retrieve it from the dataframe itself, we will have to carry the treename around (e.g. as function arguments), making the code less readable and more prone to bugs.

Cheers.

eguiraud · July 14, 2021, 12:45pm

Depending on how your code is structured there might be simple ways to store the tree name somewhere.
For example, in Python, you can attach it to the RDF object itself: df.treename = tree.GetName().

rooter_03 · July 14, 2021, 12:47pm

Right, in Python that would work. I forgot that the treename can simply be attached to the object. However, that does not work in C++.

rooter_03 · July 14, 2021, 1:21pm

@eguiraud

There is another problem. If we use Filter for instance in:

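import ROOT

df = ROOT.RDataFrame('tree', 'file.root')
df.treename = 'tree'  # attribute set on this Python object only

df_1 = df.Filter('x<0')  # Filter returns a new object

print(df.treename)
print(df_1.treename)  # fails: df_1 has no attribute treename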

the second print statement would not work, because the filtered dataframe does not have treename as an attribute. We would therefore have to follow each Filter call with a statement that attaches the treename to the new dataframe. Am I right? Wouldn’t it be better if the Filter function could somehow copy those attributes to the output object?

Cheers

eguiraud · July 14, 2021, 1:42pm

Yes, attributes set like that are per object in Python.

How you decide to keep the tree name around, and what’s the best way to do it, is strictly a function of how your code is structured (and/or you might be able to restructure the code so that it’s very easy to do).
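
One possible way to structure this in Python is a thin wrapper that holds the tree name and re-wraps whatever the transformation methods return. The following is only a minimal sketch under stated assumptions: NamedFrame, NODE_METHODS and FakeNode are hypothetical names, not part of ROOT. FakeNode is a stand-in so that the snippet runs without ROOT, and the wrapper is untested against a real RDataFrame.

```python
# Methods assumed to return a new dataframe node that should keep the name.
NODE_METHODS = ("Filter", "Define", "Range", "Alias")


class NamedFrame:
    """Pairs a dataframe-like node with the name of the tree it came from."""

    def __init__(self, node, treename):
        self._node = node
        self.treename = treename

    def __getattr__(self, attr):
        # Only reached for attributes that NamedFrame itself does not have.
        if attr.startswith("_"):
            raise AttributeError(attr)
        target = getattr(self._node, attr)
        if attr not in NODE_METHODS:
            return target  # actions such as Count are forwarded unchanged

        def forward(*args, **kwargs):
            # Re-wrap the new node so the tree name travels with it.
            return NamedFrame(target(*args, **kwargs), self.treename)

        return forward


class FakeNode:
    """Stand-in for an RDataFrame node (illustration only)."""

    def __init__(self, cuts=()):
        self.cuts = tuple(cuts)

    def Filter(self, expression):
        return FakeNode(self.cuts + (expression,))

    def Count(self):
        return len(self.cuts)


df = NamedFrame(FakeNode(), "tree")
df_1 = df.Filter("x<0")

print(df.treename)
print(df_1.treename)
```

With a real dataframe the idea would be to pass ROOT.RDataFrame('tree', 'file.root') in place of FakeNode(). Code that needs the underlying object (for example to hand it to a C++ helper) would have to unwrap it first, which is the main cost of this design.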

The upcoming DescribeDataset will provide similar functionality in RDF itself, but because of the caveats mentioned above it might not be a perfect match for what you need.

Best,
Enrico
