RDataFrame and deserialization of branches

Dear ROOT experts,

I wanted to check how RDF handles deserialization in a few different cases. The ROOT files in question are CMS NanoAOD, with default compression (i.e. LZMA 9). When the dataframe is instantiated from a TChain to which the files have been added (usually via PyROOT, with a mix of non-pre-compiled C++ lambda/function and Python-JIT nodes), does RDF deserialize only the branches that are used in the computation graph? Does (this) compression, or the usage of a TChain, interfere with this, or does it only work when a list of branches is explicitly passed to the constructor? I’m trying to understand all the constraints. I haven’t necessarily run into an issue, but I don’t want to make decisions based on bad assumptions.

Cheers,
Nick


ROOT Version: 6.20.yy
Platform: CentOS 7
Compiler: gcc700


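Yes, of course: RDF only deserializes the branches that are actually used in the computation graph.

No, neither the compression nor the use of a TChain interferes with this, and it does not depend on a list of branches being passed to the constructor. RDF is designed to only deserialize what’s needed.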
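
To make this concrete, here is a minimal PyROOT sketch of a dataframe built on a TChain. It is only an illustration under stated assumptions: the helper name `build_chain_dataframe` is made up, and the tree name `Events` and the branches `nMuon` and `Muon_pt` are assumed NanoAOD-style names. Only the branches mentioned in the booked nodes are part of the graph, so those are the only ones RDF should need to read.

```python
TREE_NAME = "Events"  # assumed NanoAOD-style tree name


def build_chain_dataframe(file_names, tree_name=TREE_NAME):
    """Book a small lazy computation graph on a TChain (no events are read here)."""
    import ROOT  # imported lazily so this sketch can be loaded without ROOT

    # Add every input file to the chain.
    chain = ROOT.TChain(tree_name)
    for name in file_names:
        chain.Add(name)

    # No list of branches is passed to the constructor.
    df = ROOT.RDataFrame(chain)

    # Only nMuon and Muon_pt (assumed branch names) appear in the graph.
    hist = (
        df.Filter("nMuon >= 1", "at least one muon")
          .Define("lead_muon_pt", "Muon_pt[0]")
          .Histo1D("lead_muon_pt")
    )

    # The chain must outlive the dataframe, so it is returned with the results.
    return chain, df, hist
```

Calling `chain, df, hist = build_chain_dataframe([...])` with your own file list only books the graph; the event loop runs when the result is first accessed, e.g. with `hist.GetValue()`.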

Cheers,
Enrico

Great, thanks for the fast reply, Enrico!

What are the advantages of giving RDF branches to include, exclude? It would seem there are more advantages on the front end in that case, prior to and during the buildup of the computation graph, rather than anything (significant) during event processing.

Cheers,
Nick

What are the advantages of giving RDF branches to include, exclude?

I assume you are talking about the list of “default columns” that you can pass to RDF’s constructor. They don’t do anything special and they don’t cause any particular pre-processing. Their only function is what’s described in the docs, i.e. if you don’t specify column names for a transformation or action, RDF takes them from that list instead.
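
As a rough sketch of that fallback from PyROOT: the helper name `mean_of_default_column` is hypothetical, `MET_pt` is an assumed scalar branch name, and I am assuming the no-argument `Mean()` call resolves from Python the same way it does in C++.

```python
def mean_of_default_column(file_name, tree_name="Events", default_column="MET_pt"):
    """Compare an action that relies on the default column with an explicit one."""
    import ROOT  # imported lazily so this sketch can be loaded without ROOT

    # Build the default-column list explicitly as a std::vector<std::string>.
    defaults = ROOT.std.vector("string")()
    defaults.push_back(default_column)

    df = ROOT.RDataFrame(tree_name, file_name, defaults)

    # No column name given: RDF falls back to the first default column.
    implicit_mean = df.Mean()
    # Column name given explicitly: the default list is not consulted.
    explicit_mean = df.Mean(default_column)

    # Both results are filled in the same event loop, triggered on first access.
    return implicit_mean.GetValue(), explicit_mean.GetValue()
```

The two returned values should agree, since the default list only supplies a name that was left out. Nodes written as string expressions, such as `Filter("nMuon >= 1")`, take their column names from the expression itself, so the default list plays no role for them.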
