I need to post-process several ntuples, and I am using RDataFrame for that. In the post-processing I want to create new branches, remove others, and merge several files. This is applied to hundreds of ntuples, some of which are empty.
When the input tree is empty, the Snapshot method prints a warning but does not write the structure of the dataframe (a tree with zero entries) to the file. This is inconvenient for the next step, because I want to chain all of these output files.
Is there a way to force the creation of the TTree in the file when calling the Snapshot method?
The same thing happens when creating an RDataFrame, applying a filter that leaves zero events, and then trying to save it. Below is a short script showing this.
import ROOT
# Create a simple RDF with 100 entries
n = 100
df = ROOT.RDataFrame(n)
# Define some new columns
df = df.Define("x", "rdfentry_") # just entry index
df = df.Define("y", "x * x") # square of entry index
# Filter that rejects every entry (x only runs from 0 to 99)
df3 = df.Filter("x>100")
# Save the dataframe to a ROOT file
df.Snapshot("output", "myFile.root", ["x", "y"])
df3.Snapshot("output", "myFile3.root", ["x", "y"])
Is there some sort of trick to save the empty RDataFrame?
I don’t know if it’s possible to save empty dataframes, but if it isn’t, I would suggest a workaround: check whether the dataframe is empty and, if so, add a single entry filled with values that are clearly impossible in the ‘real’ dataset (always the same value, at least for a given variable), so that you can easily filter these events out later. For example, if x and y are always positive, fill the entry with -9. Alternatively, you could add another column as a flag, a boolean for instance, signalling that the event should be ignored. In that case you would have to add this column to every entry in all the other dataframes as well, to mark them as ‘usable’.
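A minimal sketch of the sentinel workaround described above, not a tested recipe. The helper names `placeholder_definitions` and `snapshot_or_placeholder` are hypothetical, and the sketch assumes every column can be stored as a double. That is a real limitation: in the reproducer, x comes from rdfentry_ and is an unsigned integer, so the placeholder file would have different branch types and may not chain cleanly with the others unless the types are aligned.

```python
def placeholder_definitions(columns, sentinel=-9.0):
    """Map each column name to a constant C++ expression holding the sentinel.

    Assumption: all columns can be represented as double.
    """
    return {name: f"double({sentinel!r})" for name in columns}


def snapshot_or_placeholder(df, tree_name, file_name, columns, sentinel=-9.0):
    """Snapshot df, or write a single sentinel entry if df has no entries.

    Returns True when the placeholder entry was written instead of real data.
    """
    import ROOT  # imported here so the pure-Python helper above works without ROOT

    # Count() runs one event loop; Snapshot() below runs a second one.
    if df.Count().GetValue() > 0:
        df.Snapshot(tree_name, file_name, columns)
        return False

    # Build a one-entry dataframe whose columns all hold the sentinel value.
    placeholder = ROOT.RDataFrame(1)
    for name, expression in placeholder_definitions(columns, sentinel).items():
        placeholder = placeholder.Define(name, expression)
    placeholder.Snapshot(tree_name, file_name, columns)
    return True
```

With the reproducer above, this would be called as `snapshot_or_placeholder(df3, "output", "myFile3.root", ["x", "y"])`, and the sentinel entry can later be removed with a filter such as `x >= 0`.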
I have tried your reproducer with the current master branch of ROOT. Both files are saved, and I can open them again and see the columns you saved. In myFile3.root, as expected, the tree is empty, but I can see that the columns “x” and “y” are there.
I suggest that you first update your ROOT version to 6.36.02 (latest stable) and check if the issue is solved. Otherwise, maybe I don’t fully understand your problem.
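To compare results between ROOT versions, something like the following could be run after the reproducer. It is only a sketch: `parse_root_version` and `inspect_snapshot` are hypothetical helper names, and it assumes the snapshot was written as a TTree named "output" as in the script above.

```python
import re


def parse_root_version(version):
    """Turn '6.36.02' (or the older '6.28/04' style) into a tuple of ints."""
    return tuple(int(part) for part in re.split(r"[./]", version.strip()))


def inspect_snapshot(file_name, tree_name):
    """Return (entries, branch names) for a tree, or None if it is missing."""
    import ROOT  # imported here so the version helper works without ROOT

    root_file = ROOT.TFile.Open(file_name)
    if not root_file or root_file.IsZombie():
        return None
    tree = root_file.Get(tree_name)
    if not tree:
        root_file.Close()
        return None
    info = (tree.GetEntries(), [b.GetName() for b in tree.GetListOfBranches()])
    root_file.Close()
    return info


if __name__ == "__main__":
    import ROOT

    print("ROOT version:", parse_root_version(ROOT.gROOT.GetVersion()))
    for name in ("myFile.root", "myFile3.root"):
        print(name, inspect_snapshot(name, "output"))
```

If `inspect_snapshot` returns None for myFile3.root, the empty tree was not written by that ROOT version. If it returns zero entries together with the branch names, the file can be chained with the others.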