I can see that in C++ ROOT one can specify the types that different columns should have when writing them to disk with Snapshot. What is the way to do this in PyROOT?
e.g. in C++:
```cpp
// specifying template parameters ("x" is `int`, "y" is `float`)
df.Snapshot<int, float>("outputTree", "outputFile.root", {"x", "y"});
```
Hi, @etejedor can comment with more authority on the subject, but I don’t think that current PyROOT supports calling C++ template methods, so there is no straightforward way to do it.
Depending on why you need to specify the template parameters, it might or might not be worth it to create a helper function that does what you need.
Something like (haven’t tested this code):
```python
import ROOT

def create_snapshot_call(*types: str):
    # C++ source of a helper that calls Snapshot with explicit template arguments
    call_str = "void CallSnapshot(ROOT::RDF::RNode df) { df.Snapshot<"
    call_str += ",".join(types)  # add the template arguments, comma-separated
    call_str += '>("outputTree", "outputFile.root", {"x", "y"}); }'
    # JIT-compile the helper with Cling; it becomes available as ROOT.CallSnapshot
    ROOT.gInterpreter.Declare(call_str)
```
To be used as:
```python
>>> create_snapshot_call("int", "float")  # this creates ROOT.CallSnapshot
>>> ROOT.CallSnapshot(df)
```
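If the tree name, file name or column list also need to vary, a slightly more general sketch could separate building the C++ source (plain Python, easy to print and inspect) from declaring it with Cling. The names `build_snapshot_helper` and `declare_snapshot_helper` are hypothetical, and like the snippet above this has not been tested against a real ROOT installation.

```python
def build_snapshot_helper(func_name, types, tree_name, file_name, columns):
    """Return C++ source for a helper that calls Snapshot<types...> on an RNode."""
    if len(types) != len(columns):
        raise ValueError("need exactly one template type per column")
    template_args = ", ".join(types)
    # C++ initializer list of quoted column names, e.g. {"x", "y"}
    column_list = "{" + ", ".join(f'"{c}"' for c in columns) + "}"
    return (
        f"void {func_name}(ROOT::RDF::RNode df) {{\n"
        f'    df.Snapshot<{template_args}>("{tree_name}", "{file_name}", {column_list});\n'
        f"}}\n"
    )


def declare_snapshot_helper(func_name, types, tree_name, file_name, columns):
    """JIT-compile the helper with Cling and return it as a Python callable."""
    import ROOT  # imported here so build_snapshot_helper works without ROOT

    code = build_snapshot_helper(func_name, types, tree_name, file_name, columns)
    # TInterpreter::Declare returns false if Cling could not compile the code
    if not ROOT.gInterpreter.Declare(code):
        raise RuntimeError(f"Cling failed to compile:\n{code}")
    return getattr(ROOT, func_name)
```

Usage would be something like `helper = declare_snapshot_helper("CallSnapshot", ["int", "float"], "outputTree", "outputFile.root", ["x", "y"])` followed by `helper(df)`. Depending on the ROOT version, `df` may first need converting with `ROOT.RDF.AsRNode(df)`. Declaring the same `func_name` twice in one session will most likely be rejected as a C++ redefinition, so each combination of types needs its own name.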
This certainly looks like a candidate solution for what I want.
I wonder if there is an easier way, given my use case: all I want to do is change the type of a column from int to float.
I can’t define a new column that simply casts the data from the old one, because I need the column to keep the old column's name, and I believe reusing a name that already exists in the TTree is not allowed.
Nevertheless, if there is no simpler way, I will try your solution.
Ah, this is embarrassing: RDataFrame is not able to do that (yet).
The Snapshot template parameters are the types of the input columns, not of the output columns (which are simply assumed to have the same types as the input).
I don’t think there is a way to change the type (or the value) of a column without renaming it.
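Until then, the workaround is to write the cast column under a new name. A minimal sketch, assuming `df` is a PyROOT `RDataFrame` (or any RDataFrame node); the helper `snapshot_with_cast` and its parameters are hypothetical:

```python
def snapshot_with_cast(df, column, new_name, cpp_type, tree_name, file_name, other_columns=()):
    """Write `column` cast to `cpp_type` under `new_name`, plus any other columns.

    Define refuses to reuse an existing column name, hence `new_name`.
    """
    # Define accepts a C++ expression string that is JIT-compiled by ROOT
    expression = f"static_cast<{cpp_type}>({column})"
    out_columns = [new_name, *other_columns]
    return df.Define(new_name, expression).Snapshot(tree_name, file_name, out_columns)
```

For example `snapshot_with_cast(ROOT.RDataFrame("inputTree", "inputFile.root"), "x", "x_float", "float", "outputTree", "outputFile.root", ["y"])`, where the input tree and file names are placeholders. If I remember correctly, later ROOT releases (6.26 onwards) add `Redefine`, which may remove the need for renaming; check the RDataFrame documentation for your version.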
If you require this feature, please open a feature request.
The embarrassment is not yours; the real problem is that the TMVA Reader is not happy with integers, at least in this case.
I am not sure this is really a necessary feature of RDataFrame. It seems to me that one should just use the correct type in the first place, or cast whatever is being read when you need it (this is what I did in the end).
If I find a use for this feature in the future, I’ll open a feature request. As far as this thread is concerned, I think your previous answer does answer the question of how to take that C++ snippet and do something similar in Python (despite my misunderstanding of what the code was doing in the first place).