RDataFrame and deserialization of branches

nmangane · December 12, 2020, 2:02pm

Dear ROOT experts,

I wanted to check how RDF handles deserialization in a few different cases. The ROOT files in question are CMS NanoAOD with default compression (i.e. LZMA 9). When the dataframe is instantiated from a TChain to which the files have been added (usually via PyROOT, with a mix of non-pre-compiled C++ lambdas/functions and Python-JIT nodes), does RDF deserialize only the branches that are used in the computation graph? Does (this) compression, or the usage of a TChain, interfere with this, or does it only work when a list of branches is explicitly passed to the constructor? I’m trying to understand all the constraints. I haven’t necessarily run into an issue, but I don’t want to make decisions based on bad assumptions.

Cheers,
Nick

ROOT Version: 6.20.yy
Platform: CentOS 7
Compiler: gcc700

eguiraud · December 12, 2020, 7:03pm

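does RDF deserialize only the branches that are used in the computation graph?

Yes of course!

Does (this) compression, or the usage of a TChain, interfere with this, or does it only work when a list of branches is explicitly passed to the constructor?

Absolutely not, RDF is designed to only deserialize what’s needed.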
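
To make the setup under discussion concrete, here is a minimal PyROOT sketch (an illustration, not code from this thread). It assumes a toy file written on the spot rather than real NanoAOD; the tree name "Events" mirrors NanoAOD, while the branch names `x`, `y`, `unused` and both function names are made up for the example. The dataframe is built from a TChain without passing any column list, and the graph only ever mentions `x` and `y`, so per the answer above RDF has no reason to deserialize `unused`.

```python
def make_toy_file(path, n_entries=100):
    """Write a small tree with three branches (illustrative names only)."""
    import ROOT  # imported lazily: PyROOT start-up is slow

    (ROOT.RDataFrame(n_entries)
         .Define("x", "double(rdfentry_)")   # 0, 1, 2, ...
         .Define("y", "2.0 * rdfentry_")     # 0, 2, 4, ...
         .Define("unused", "-1.0")           # never touched by the analysis below
         .Snapshot("Events", path))


def selected_count_and_mean(paths):
    """Build an RDataFrame on a TChain and run a graph that uses only x and y."""
    import ROOT

    chain = ROOT.TChain("Events")
    for p in paths:
        chain.Add(p)

    # No list of columns is passed here; the branches that get read are
    # determined by what the booked nodes ask for.
    df = ROOT.RDataFrame(chain)

    selected = df.Filter("x >= 50", "x cut")   # needs branch x
    count = selected.Count()
    mean_y = selected.Mean("y")                # needs branch y

    # Both results are lazy; the first GetValue() triggers a single event loop.
    # The chain must stay alive until then, which it does inside this function.
    return int(count.GetValue()), float(mean_y.GetValue())
```

Calling `make_toy_file("toy.root")` followed by `selected_count_and_mean(["toy.root"])` runs the whole chain of steps; for real NanoAOD one would skip the toy file and add the actual files to the chain.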

Cheers,
Enrico

nmangane · December 13, 2020, 9:37am

Great, thanks for the fast reply, Enrico!

What are the advantages of giving RDF branches to include, exclude? It would seem there are more advantages on the front end in that case, prior to and during the buildup of the computation graph, rather than anything (significant) during event processing.

Cheers,
Nick

eguiraud · December 13, 2020, 10:35am

What are the advantages of giving RDF branches to include, exclude?

I assume you are talking about the list of “default columns” that you can pass to RDF’s constructor. They don’t do anything special and don’t cause any particular pre-processing. Their only function is what’s described in the docs, i.e. if you don’t specify column names when booking an operation, RDF takes them from that list instead.
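
A small PyROOT sketch of that fallback, for illustration only. It assumes a toy file with a single made-up branch `y` in a tree called "Events"; the function name `default_column_demo` is hypothetical as well. The call without a column name relies on the default list given to the constructor, and I have not checked it against every PyROOT version.

```python
def default_column_demo(path):
    """Compare a histogram booked via the default column with an explicit one."""
    import ROOT  # imported lazily: PyROOT start-up is slow

    # Toy input: one branch "y" with values 0, 2, 4, ..., 198.
    ROOT.RDataFrame(100).Define("y", "2.0 * rdfentry_").Snapshot("Events", path)

    # The optional third constructor argument is the list of default columns.
    defaults = ROOT.std.vector("string")()
    defaults.push_back("y")
    df = ROOT.RDataFrame("Events", path, defaults)

    model = ROOT.RDF.TH1DModel("h", "y", 100, 0.0, 200.0)
    h_default = df.Histo1D(model)         # no column name: RDF takes "y" from the defaults
    h_explicit = df.Histo1D(model, "y")   # same thing, spelled out

    # Both histograms are filled in the same event loop, triggered on first access.
    return h_default.GetMean(), h_explicit.GetMean()
```

Both returned means should agree, since the default list only supplies the column name that was left out and changes nothing else about how the events are processed.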
