How to get treename from RDataFrame

Hi,

When making an RDataFrame instance from a tree:

df = ROOT.RDataFrame(tree)

Is there a way to retrieve the name of the tree from the dataframe afterwards? I cannot find anything in the documentation. I would expect something like:

df.GetTreeName()

Cheers.




Hi @rooter_03 ,
I am afraid RDataFrame does not provide a way to do that.

On one hand, RDF does not always read a single TTree: sometimes the input is a TChain in which each tree has a different name, sometimes it is a CSV file, and sometimes there is no input at all (just a number of initially empty entries).

On the other hand, the tree name must have been available when the RDF was constructed, so you can extract it at that point, e.g. from the tree itself.

In the next release we’ll have a generic df.DescribeDataset() that will provide a human-readable description of the dataset and will cover all cases. The feature is currently available in our nightly builds.

Cheers,
Enrico

Hi,

I (and I am probably not the only one) need the treename beyond the initial point. If we cannot retrieve it from the dataframe itself, we will have to carry the treename around (e.g. as function arguments), making the code less readable and more prone to bugs.

Cheers.

Depending on how your code is structured there might be simple ways to store the tree name somewhere.
For example, in Python, you can attach it to the RDF object itself: df.treename = tree.GetName().
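
As a minimal sketch of that suggestion, the name can be read once at construction time and stored on the dataframe object. This assumes the input really is a single TTree; attach_treename is a hypothetical helper name, not part of ROOT.

```python
def attach_treename(df, tree):
    """Hypothetical helper: store the tree's name on the dataframe object.

    Assumes `tree` exposes GetName(), as a TTree does.
    """
    df.treename = tree.GetName()
    return df


# Intended usage with ROOT (assumes `tree` is an existing TTree):
#   df = attach_treename(ROOT.RDataFrame(tree), tree)
#   print(df.treename)
```

The attribute lives only on this particular Python object, so it is not carried over to objects derived from it.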

Right, in Python that would work. I forgot that the treename can just be added to the object. However, that does not work in C++.

@eguiraud

There is another problem. Consider what happens when we use Filter, for instance:

import ROOT

df = ROOT.RDataFrame('tree', 'file.root')
df.treename = 'tree'

df_1 = df.Filter('x<0')

print(df.treename)
print(df_1.treename)

the second print statement would not work, because the filtered dataframe does not have treename as an attribute. We would therefore have to follow each Filter call with a statement that attaches the treename to the new dataframe. Am I right? Wouldn’t it be better if the Filter function could somehow copy those attributes to the output object?

Cheers

Yes, attributes set like that are per object in Python.
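
One possible way around this, sketched here under assumptions rather than as a ROOT feature, is a thin wrapper that forwards calls to the underlying dataframe and re-wraps the result of transformations so the tree name travels along. NamedFrame and its _TRANSFORMATIONS set are hypothetical names chosen for illustration, the set of re-wrapped methods is deliberately incomplete, and the sketch has not been validated against every RDF method.

```python
class NamedFrame:
    """Hypothetical wrapper pairing a dataframe-like object with a tree name."""

    # Methods assumed to return a new dataframe node (illustrative subset).
    _TRANSFORMATIONS = {"Filter", "Define", "Range", "Alias"}

    def __init__(self, df, treename):
        self._df = df
        self.treename = treename

    def __getattr__(self, attr):
        # __getattr__ runs only when normal lookup fails, so everything
        # not defined on NamedFrame is forwarded to the wrapped object.
        if attr.startswith("_"):
            raise AttributeError(attr)
        target = getattr(self._df, attr)
        if not callable(target):
            return target

        def forward(*args, **kwargs):
            result = target(*args, **kwargs)
            if attr in self._TRANSFORMATIONS:
                # Re-wrap the new node so it keeps the same tree name.
                return NamedFrame(result, self.treename)
            return result

        return forward


# Intended usage with ROOT (not executed here):
#   df = NamedFrame(ROOT.RDataFrame('tree', 'file.root'), 'tree')
#   df_1 = df.Filter('x<0')
#   print(df_1.treename)
```

The trade-off is that code receiving such an object sees the wrapper rather than the RDF node itself, so anything that needs the real node has to reach for the wrapped object explicitly.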

How you decide to keep the tree name around, and what’s the best way to do it, is strictly a function of how your code is structured (and/or you might be able to restructure the code so that it’s very easy to do).

The upcoming DescribeDataset will provide similar functionality in RDF itself, but because of the caveats mentioned above it might not be a perfect match for what you need.

Best,
Enrico
