How to specify RDF.Snapshot() column types in PyROOT?

Hi,

I can see that in C++ ROOT one can specify the type each column should have when writing it to disk with Snapshot. What is the way to do this in PyROOT?

e.g. in C++

// specifying template parameters ("x" is `int`, "y" is `float`)
df.Snapshot<int, float>("outputTree", "outputFile.root", {"x", "y"});

Thanks,
Tom

Hi,
@etejedor can comment with more authority on the subject, but I don’t think that current PyROOT supports calling C++ template methods. So there is no straightforward way to do it.

Depending on why you need to specify the template parameters, it might or might not be worth it to create a helper function that does what you need.

Something like (haven’t tested this code):

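import ROOT

def create_snapshot_call(*types: str):
  # build a C++ wrapper so the template arguments can be chosen from Python
  call_str = "void CallSnapshot(ROOT::RDF::RNode df) { df.Snapshot<"
  # add template arguments to call_str, separated by commas
  call_str += ",".join(types)
  # close the Snapshot call and the function body
  call_str += '>("outputTree", "outputFile.root", {"x", "y"}); }'
  # JIT-compile the wrapper; it becomes available as ROOT.CallSnapshot
  ROOT.gInterpreter.Declare(call_str)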

to be used as

>>> create_snapshot_call("int", "float") # this creates ROOT.CallSnapshot
>>> ROOT.CallSnapshot(df)
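A slightly more general variant of the same idea, as an untested-against-ROOT sketch: it only builds the C++ source string, so the tree name, file name and column list are not hard-coded. The function name `build_snapshot_wrapper` and its parameters are made up for illustration; the resulting string would still have to be passed to `ROOT.gInterpreter.Declare`.

```python
def build_snapshot_wrapper(types, columns, tree_name="outputTree",
                           file_name="outputFile.root",
                           func_name="CallSnapshot"):
    """Return C++ source for a function that calls df.Snapshot<types...>.

    Hypothetical helper: it only assembles a string and does not touch ROOT.
    """
    types = list(types)
    columns = list(columns)
    # Snapshot expects one template argument per column
    if len(types) != len(columns):
        raise ValueError("need exactly one type per column")

    template_args = ", ".join(types)
    column_list = ", ".join(f'"{c}"' for c in columns)
    return (
        f"void {func_name}(ROOT::RDF::RNode df) {{ "
        f'df.Snapshot<{template_args}>("{tree_name}", "{file_name}", '
        f"{{{column_list}}}); "
        "}"
    )


source = build_snapshot_wrapper(["int", "float"], ["x", "y"])
print(source)

# With ROOT available, the wrapper could then be declared and called:
#   import ROOT
#   ROOT.gInterpreter.Declare(source)
#   ROOT.CallSnapshot(df)
```

Note that declaring the same function name twice in one session would be a C++ redefinition, so a different `func_name` is needed for each distinct wrapper.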

Cheers,
Enrico

Hi @eguiraud

Thanks for the response.

This certainly looks like a candidate solution to what I desire.

I wonder if perhaps there is an easier way given my use case… all I intend to do is change the type of a column from an int to a float.

I can’t define a new column that simply casts the data from the old column, because I need the column to keep the old column’s name, and I believe using a name that already exists in a TTree is not allowed.

Nevertheless if there is no simpler way I will try your solution.

Thanks,
Thomas

Hi Thomas,

Ah, this is embarrassing, RDataFrame is not able to do that (yet) :sweat_smile:
The Snapshot template parameters are for the type of the column in input, not for the type of the column in output (which is simply assumed to be the same as the one in input).

I don’t think there is a way to change the type (or the value) of a column without renaming it.
If you require this feature, please open a feature request at https://sft.its.cern.ch/jira/projects/ROOT.
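For the case where a renamed column is acceptable, a minimal sketch of the cast-and-rename route could look like the following. The helper names `cast_expression` and `snapshot_with_cast` are made up for illustration, `df` is assumed to be an existing RDataFrame, and the code has not been run against ROOT here.

```python
def cast_expression(column: str, cpp_type: str) -> str:
    """Build a C++ expression that casts `column` to `cpp_type`."""
    return f"static_cast<{cpp_type}>({column})"


def snapshot_with_cast(df, column, cpp_type, new_name, tree_name, file_name):
    """Define `new_name` as a cast copy of `column` and write it to disk.

    `new_name` must differ from every existing column name, since
    redefining an existing column is what is not possible here.
    """
    # Define takes the new column name and a C++ expression as a string
    casted = df.Define(new_name, cast_expression(column, cpp_type))
    # The third argument is passed as a string, which Snapshot interprets
    # as a column-name regular expression; a plain name selects that column.
    return casted.Snapshot(tree_name, file_name, new_name)


# Example call, assuming `df` has an int column "x":
#   snapshot_with_cast(df, "x", "float", "x_float",
#                      "outputTree", "outputFile.root")
```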

Cheers,
Enrico

Hi Enrico,

The embarrassment is not yours; the issue is rather that the TMVA Reader is not happy with integers, at least in this case.

I am not sure this is really a necessary feature of RDataFrame. It seems to me that perhaps one should just use the correct type in the first place, or cast whatever is being read at the point where it is needed (this is what I did in the end).

If it turns out I find a use for this feature in the future I’ll open a feature request. In terms of this thread I think your previous answer does answer the question in terms of how to take that C++ snippet and do something similar in Python (despite my misunderstanding of what the code was doing in the first place.)

Thanks,
Thomas
