Disabling branches in RDataFrame?

zhubacek · June 21, 2021, 7:36am

Dear experts,
I have a question/follow up on a similar topic as

Is it possible to disable unneeded branches when processing events with RDataFrame - similar to MakeClass/MakeSelector’s SetBranchStatus?
As an example, I have small ntuples, O(100GB), and a second, huge set of ntuples with many systematics, O(2.5TB). The same RDataFrame code (which itself does not need the systematics) runs in ~10min on the nominal sample but takes ~10h on the large sample.

My use case here is not to use the large ntuples as friend trees, but rather to enable/disable branches on the fly, depending on my job configuration.

Thanks for ideas and comments

eguiraud · June 21, 2021, 7:43am

Hi @zhubacek ,
RDataFrame automatically disables unused branches (and also, for each entry, only reads data that’s strictly needed, so e.g. if after a strict Filter the rest of the event processing is skipped, RDF only reads the column needed by the `Filter`).

So the runtime difference comes from something else. It would be useful to profile a few tens of seconds of execution of your program e.g. with perf to see where time is spent.
I could also take a look in case you can share a reproducer.

Cheers,
Enrico

zhubacek · June 21, 2021, 8:13am

Hi @eguiraud ,
Thanks for the suggestion! I think I see my problem now. With perf record -F 99 ... the profile looks like this:

 40.44%  rdf_dijet_reade  libz.so.1.2.11       [.] inflate_fast
 11.89%  rdf_dijet_reade  libz.so.1.2.11       [.] adler32_z
  5.72%  rdf_dijet_reade  libRIO.so            [.] TStreamerInfoActions::GenericLooper::ReadBasicType<double>
  5.34%  rdf_dijet_reade  libCore.so           [.] TExMap::GetValue

So the code does read (and decompress) the systematics branches, although I thought it didn’t need them, because my Filter always requests them:

newDF = DF.Filter(MyFilter(doSyst),{"NominalBranch","SystBranch"});

I think I will need to modify it to something like

if (doSyst) newDF = DF.Filter(MyFilter2,{"NominalBranch","SystBranch"});
else newDF = DF.Filter(MyFilter1,{"NominalBranch"});

I hope this is ok when it will be evaluated at run time?

eguiraud · June 21, 2021, 8:32am

Good!

Yes, you can define newDF as a `ROOT::RDF::RNode newDF(DF)` and then re-assign to it whatever you need. Alternatively, a helper function that takes an RDF node and returns an RDF node achieves the same effect.

Cheers,
Enrico
