Automatically add missing column to RDataFrame

Is there a way to automatically add/define any missing columns in RDataFrame? We’re analyzing data with different detectors/columns in the input tree for different runs and would like to be able to use one single helper to analyze all of it instead of having to write multiple variants of it for the different combinations of columns present in the input tree.

In TProof we’ve been using something like

if(tree->SetBranchAddress("TMyClass", &myClass) == TTree::kMissingBranch) {
   // branch not in this tree: fall back to an empty, default-constructed detector
   myClass = new TMyClass;
}

for every detector/column used in the selector. That way if a detector was not present in the input tree, a new, empty detector would be used (which means no histograms involving this detector were filled).

I’m not sure how to do this for helpers used in RDataFrame. I can book the helper and catch the exception to determine which columns are missing, but I don’t know how to create a new column holding an empty detector object. Using TClass::GetClass("TMyClass")->New() in a lambda to define the new column does not work, because it returns a void pointer (which can’t be dereferenced) rather than an object. Nor can I use something like []() { return *static_cast<TMyBase*>(TClass::GetClass("TMyClass")->New()); } to define the column, because the helper then uses the column as TMyClass while the Define call advertises it as TMyBase.

Is there a way to do this in RDataFrame?

Hi @vaubee ,

would what is described in the GitHub issue "[DF] Add support for 'missing' columns" (root-project/root, issue #8704; a feature request still to be implemented) work for you?

At the moment the workaround is to build different RDF objects for the different variants of the dataset and, for RDFs where a certain column is missing, Define that column.
You must have a reasonable default value for the missing column of course – e.g. your TMyClass must have a default constructor.
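A minimal sketch of that workaround, written as a helper that defines a column only when it is absent. The name DefineIfMissing is hypothetical and not part of ROOT. The sketch assumes the node is passed as a ROOT::RDF::RNode (so that both branches return the same type), that the column type is known at compile time, and that it is default-constructible. It relies only on GetColumnNames() and Define(), so the template itself compiles without any ROOT headers.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not part of ROOT): returns `df` unchanged if it already
// has a column called `name`; otherwise returns a node on which `name` is
// defined as a default-constructed T.
// Node is assumed to be ROOT::RDF::RNode in a real analysis; any type with
// compatible GetColumnNames() and Define() members works as well.
template <typename T, typename Node>
Node DefineIfMissing(Node df, const std::string &name)
{
   const auto columns = df.GetColumnNames();
   if (std::find(columns.begin(), columns.end(), name) != columns.end())
      return df; // column is present in the input, nothing to do

   // Column is missing: provide an empty object so that downstream helpers
   // can still be booked with the same column list.
   return df.Define(name, [] { return T{}; });
}
```

With ROOT available, usage might look like ROOT::RDF::RNode node = ROOT::RDataFrame("tree", "file.root"); followed by node = DefineIfMissing<TMyClass>(node, "TMyClass"); once per detector (tree and file names here are placeholders). The type still has to be spelled out for each column, so this shortens the per-dataset variants rather than removing them.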

Alternatively you can pre-process the different datasets in order to normalize their schema to something uniform.

Cheers,
Enrico

Hi Enrico,

Yes, that feature would work for us. I guess we’ll have to wait until it is implemented. In the meantime I’ll see if I can come up with a way to automatically add the missing columns that doesn’t involve a huge if-else switch.
