RDataFrame functionality missing

I wanted to encourage someone to try RDataFrame but I found it dies like crazy on simple trees (ROOT 6.28/02).
These trees contain branches that are ordinary variables, simple arrays (fixed and variable size), and “split” and “unsplit” objects of some user’s classes.
Using TFile::MakeProject, I generate and load dictionaries for all “nonstandard” classes.
I then try to dump the available column names and types … that’s where RDataFrame breaks.

One tree:

what(): TTree leaf V.V has both a leaf count and a static length. This is not supported.

TTree::Print gives:

*Br    1 :N         : N/I
*Br    2 :V         : V[N][3]/F

TTree::MakeSelector returns (for all such branches):

Warning in <TTreeReaderGenerator::AddReader>: Ignored branch SOME_BRANCH because type is unsupported.

Another tree (for the “V” column, RDataFrame returns “TClonesArray” as the column type):

what(): TTree leaf V.X[4] has both a leaf count and a static length. This is not supported.

TTree::Print gives:

*Br    1 :V         : Int_t V_
*Br    2 :V.fUniqueID : UInt_t fUniqueID[V_]
*Br    3 :V.fBits   : UInt_t fBits[V_]
*Br    4 :V.X[4]    : Double_t X[V_]

Well, I think it would also die on another column (if it hadn’t already died on the previous one):

*Br    5 :V.Y[3][4] : Float_t Y[V_]

TTree::MakeSelector returns (for all such branches):

Error in <AnalyzeBranch>: Arrays inside collections are not supported yet (branch: SOME_BRANCH).

@eguiraud Can you take a look?


This is a limitation of TTreeReader: it does not support 2D arrays. As RDataFrame uses TTreeReader under the hood for I/O, it inherits this limitation, so RDataFrame does not support multi-dimensional arrays either.

See e.g. the thread “ROOT data frame - unsupported leaf value”. The relevant JIRA ticket is [ROOT-9509] “[DF] Add proper support for multidimensional arrays”. CC: @Axel.


Ah, yes. I’ve now even found my older thread (related to another set of trees):

So, the bad news is that RDataFrame is useless.
The good news is that it’s also useless in multi-threaded mode on a single machine and on Spark, Dask, and other distributed modes.

I really miss Rene here.

That’s quite a misrepresentation of reality :) but hey, I guess for the multi-dim case, until we (I?) finally fix this, people have to resort to using TTree directly.
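For the V[N][3]/F leaf from the first tree, going through TTree directly roughly means reading N and then treating the branch buffer as N rows of 3 floats. Here is a minimal sketch of just the indexing part, in plain C++ with no ROOT calls. The names `kInner` and `rowsFromFlat` are made up for illustration, and the row-major layout is an assumption based on how a C array `Float_t V[MAX][3]` sits in memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Static (inner) dimension of the leaf V[N][3]/F.
constexpr std::size_t kInner = 3;

// Hypothetical helper: copy a flat per-entry buffer into n rows of kInner floats.
// Assumes row-major layout, i.e. element (i, j) sits at flat[i * kInner + j].
std::vector<std::array<float, kInner>> rowsFromFlat(const float *flat, std::size_t n)
{
   std::vector<std::array<float, kInner>> rows(n);
   for (std::size_t i = 0; i < n; ++i)
      for (std::size_t j = 0; j < kInner; ++j)
         rows[i][j] = flat[i * kInner + j];
   return rows;
}
```

In a ROOT macro, `flat` would be the buffer handed to TTree::SetBranchAddress for "V" (sized for the largest N in the tree), and `n` would be the value of N after each GetEntry call.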

How so? We see accelerations of more than an order of magnitude in real-life analyses, and physicists are very happy that they don’t need to debug those void*& arguments of SetBranchAddress() anymore. Scaling across clusters is excellent, too. All of that assumes we’re talking about more than a minute of work.

Well, the “TTreeReader” was introduced something like 10 years ago. It still does NOT support simple ordinary columns (“branches”) holding fundamental types and arrays thereof (don’t tell me to find when the old ROOT 5 started to support them, probably something like 30 years ago).

Each time I tried the highly complicated RDataFrame, I could get nothing because I could not read some important data from the existing trees.

So, its performance is exactly 0 (zero) for me. Even if I multiply this performance by any number of available cores (multi-threaded) and by any number of available machines (distributed modes), I still get a very round 0 (zero).

If I need “multi-threading”, … the "=legacy" Selector plus the “PROOF Lite” (I can’t find the corresponding web page now) is a solution.
And for the “distributed modes”, I need to go with any available ordinary “batch cluster” system (plus “Ganga” if desired).

I fully understand that you have a case we don’t support (yet). I’d argue that extrapolation to all the physicists of the world isn’t a given.
