Disabling branches in RDataFrame?

Dear experts,
I have a question/follow-up on a topic similar to

Is it possible to disable unneeded branches when processing events with RDataFrame, similar to `SetBranchStatus` in MakeClass/MakeSelector code?
As an example, I have small ntuples, O(100 GB), and a second set of huge ntuples with many systematics, O(2.5 TB). The same RDataFrame code (which itself does not need the systematics) runs in ~10 min on the nominal sample but takes ~10 h on the large sample.

My use case here is not to use the large ntuples as friend trees, but rather to enable/disable branches on the fly, depending on my job configuration.

Thanks for any ideas and comments!

Hi @zhubacek,
RDataFrame automatically disables unused branches. Moreover, for each entry it only reads the data that is strictly needed: if, e.g., a strict `Filter` rejects the entry, the rest of the event processing is skipped and RDF only reads the column(s) needed by that `Filter`.

So the runtime difference must come from something else. It would be useful to profile a few tens of seconds of your program's execution, e.g. with `perf`, to see where the time is spent.
I could also take a look in case you can share a reproducer.

Cheers,
Enrico

Hi @eguiraud,
Thanks for the suggestion! I think I see my problem now. Profiling with `perf record -F 99 ...` gives

```
 40.44%  rdf_dijet_reade  libz.so.1.2.11  [.] inflate_fast
 11.89%  rdf_dijet_reade  libz.so.1.2.11  [.] adler32_z
  5.72%  rdf_dijet_reade  libRIO.so       [.] TStreamerInfoActions::GenericLooper::ReadBasicType<double>
  5.34%  rdf_dijet_reade  libCore.so      [.] TExMap::GetValue
```

Most of the time goes into decompressing and reading data (zlib's `inflate_fast` and `adler32_z`, then `ReadBasicType<double>`), so the code reads branches that I thought it doesn't need. The culprit is my filter, which always declares the systematics branch as an input column:

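```cpp
// SystBranch is declared as an input, so it is read for every entry even when doSyst is false
newDF = DF.Filter(MyFilter(doSyst), {"NominalBranch", "SystBranch"});
```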

I think I will need to modify it to something like:

```cpp
// Declare SystBranch as an input only when the systematics are actually needed
if (doSyst) newDF = DF.Filter(MyFilter2, {"NominalBranch", "SystBranch"});
else        newDF = DF.Filter(MyFilter1, {"NominalBranch"});
```

I hope this is OK, given that the choice is only made at run time?

Good!

Yes, you can declare `newDF` as `ROOT::RDF::RNode newDF(DF);` and then reassign to it whatever you need. Alternatively, a helper function that takes an RDF node and returns one achieves the same effect.

Cheers,
Enrico
