Dear experts,
I have a follow-up question on a topic similar to
Is it possible to disable unneeded branches when processing events with RDataFrame, similar to `SetBranchStatus` in MakeClass/MakeSelector code?
For example, I have small ntuples, O(100 GB), and a second set of huge ntuples with many systematics, O(2.5 TB). The same RDataFrame code (which itself does not need the systematics) runs in ~10 min on the nominal sample but takes ~10 h on the large one.
My use case is not to attach the large ntuples as friend trees, but rather to enable/disable branches on the fly, depending on my job configuration.
Hi @zhubacek,
RDataFrame automatically disables unused branches. It also reads, for each entry, only the data that is strictly needed: e.g. if a strict `Filter` rejects an entry, the rest of the event processing is skipped and RDF only reads the columns needed by that `Filter`.
So the runtime difference must come from something else. It would be useful to profile a few tens of seconds of your program's execution, e.g. with `perf`, to see where the time is spent.
I could also take a look in case you can share a reproducer.
Yes: you can declare `ROOT::RDF::RNode newDF(DF);` and then reassign to it whatever you need. Alternatively, you can use a helper function that takes an RDF node and returns an RDF node, to the same effect.