Conditionally override an existing TDataFrame

I have written some code with the new TDataFrame and ran into a problem. Some of my filter steps only need to be applied when running on Monte Carlo data (namely, the truth-matching checks). I would like to be able to write this:

ROOT::Experimental::TDataFrame df(treeName, f);
auto selected = df.Filter(...);
if (mc) {
   selected = selected.Filter("Z0_TRUEID == 23 && muplus_TRUEID == -13 && muminus_TRUEID == 13");
}
selected.Report();

where selected is the TDataFrame with the normal cuts. However, this does not compile:

In file included from input_line_11:9:
././reduce_df.cxx:64:16: error: no viable overloaded '='
      selected = selected.Filter("Z0_TRUEID == 23 && muplus_TRUEID == -13 && muminus_TRUEID == 13");
      ~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/cern/root-6.10.00/include/ROOT/TDFInterface.hxx:93:7: note: candidate function (the implicit move assignment operator) not viable: no known conversion from
      'TInterface<TFilterBase>' to 'TInterface<ROOT::Detail::TDF::TCustomColumnBase>' for 1st argument
class TInterface {
      ^
/cern/root-6.10.00/include/ROOT/TDFInterface.hxx:93:7: note: candidate function (the implicit copy assignment operator) not viable: no known conversion from
      'TInterface<TFilterBase>' to 'const TInterface<ROOT::Detail::TDF::TCustomColumnBase>' for 1st argument
Error in <ACLiC>: Dictionary generation failed!

Is there another way to add some filter stages only in some cases and still keep the same variable name afterwards? Everything before and after this step is identical for data and MC.

Hi Graipher,
thank you for stress-testing TDataFrame :smiley:

Before I attempt a solution, are the "*_TRUEID" branches only present if mc is true?

Regarding your original code, the problem is that the types returned by Filter, Define and the other transformations are not assignable to one another. They are not even assignable to an object of their own type. I don’t see this changing anytime soon: I would not be able to give a clear semantic meaning to the assignment of one node to another in the general case.

We should probably explicitly disallow copy-construction and copy-assignment of TDataFrames, so that users would get nicer compiler diagnostics when they try something like this – thanks for raising the issue :sweat_smile:
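To make the type mismatch in the compiler error concrete, here is a minimal sketch that does not need ROOT. `Interface`, `FilterBase`, `CustomColumnBase` and `CanAssign` are hypothetical stand-ins invented for this example, not the real TDF classes. It only shows the general C++ rule that two different instantiations of a class template are unrelated types unless a conversion is written explicitly.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the node types named in the compiler error.
// These are NOT ROOT's real classes.
struct FilterBase {};
struct CustomColumnBase {};

// Stand-in for TInterface: a class template with no converting
// constructor and no converting assignment operator.
template <typename Proxied>
class Interface {};

// Returns true when a value of type From can be assigned to an lvalue of type To.
template <typename To, typename From>
constexpr bool CanAssign()
{
   return std::is_assignable<To &, From>::value;
}

// Interface<CustomColumnBase> and Interface<FilterBase> are distinct, unrelated
// types, so "selected = selected.Filter(...)" has no viable operator=.
static_assert(!CanAssign<Interface<CustomColumnBase>, Interface<FilterBase>>(),
              "different instantiations are not assignable to one another");
```

The `static_assert` is checked at compile time, so the snippet compiling at all is the demonstration.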

Hey eguiraud,
Sure thing, it will hopefully make ROOT a bit easier to use :slight_smile:.

Yes, those branches are only present if mc is true.

This was actually supposed to be a work-around for something I reported before here, that missing branches can’t be skipped/ignored but generate a compiler error (which is a good thing, if there is a way to conditionally add Filters…).

So yes, some nicer compiler output would go a long way here, I think. Thanks for the extra detail on why that is, though :slight_smile: .

Alright, so here is the simplest pattern I can think of to selectively build parts of a TDF computational graph at runtime. Someone else might come up with something better, who knows. We are still gathering experience :slight_smile:

#include "ROOT/TDataFrame.hxx"
using ROOT::Experimental::TDataFrame;

int main() {
   TDataFrame d("tree", "file.root");
  
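   // Choose at runtime whether to add the truth-matching cut.
   // Both branches must return the same type for the return type to be deduced.
   auto MCFilterOrNot = [&d](const bool mc) {
      if (mc)
         return d.Filter("Z0_TRUEID == 23 && muplus_TRUEID == -13 && muminus_TRUEID == 13");
      else
         return d.Filter([] { return true; }); // pass-through filter: keeps every entry
   };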

   auto selected = MCFilterOrNot(true /*or false */);
   auto c = selected.Count();

   return 0;
}

I used a lambda to take advantage of automatic return type deduction. If you have access to a C++14-compliant compiler you can instead use a free function, possibly a free function template that also takes a generic TDF object as parameter.
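As a rough sketch of the free function template variant (C++14 or later), the snippet below uses a hypothetical `Node` class in place of a real TDF node so that it compiles without ROOT. `Node`, its `Filter` and `Count` members, and the integer "events" are assumptions made up for illustration; the only thing carried over from the lambda version is that every `return` statement must yield the same type.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a TDF node; this is NOT ROOT's API.
// Filter() always returns the same type (Node), whatever predicate is passed.
class Node {
public:
   explicit Node(std::vector<int> values) : fValues(std::move(values)) {}

   // Return a new Node holding only the values for which pred is true.
   Node Filter(const std::function<bool(int)> &pred) const
   {
      std::vector<int> kept;
      for (int v : fValues)
         if (pred(v))
            kept.push_back(v);
      return Node(std::move(kept));
   }

   std::size_t Count() const { return fValues.size(); }

private:
   std::vector<int> fValues;
};

// C++14 free function template with a deduced return type. It accepts any
// node-like object; both return statements must produce the same type.
template <typename NodeT>
auto MCFilterOrNot(NodeT &node, bool mc)
{
   if (mc)
      return node.Filter([](int v) { return v == 23; }); // stand-in for the truth-matching cut
   return node.Filter([](int) { return true; });         // pass-through filter
}
```

With this stand-in, `MCFilterOrNot(node, mc)` can be called on any object whose `Filter` returns one common type, and the result can be stored with `auto` exactly as in the lambda version.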

Let me know if this solves the issue.
