On-the-fly dictionary generation for vector<myclass> and use as a branch

Again, just to clarify: TTree will not be removed, and it will not become unsupported any time soon (probably ever, given the amount of code that relies on it). RNTuple aims to be in production in a few years. It is a modern TTree substitute that will provide important benefits w.r.t. TTree (mostly in terms of performance, storage use and type-safety), but it will not replace all of TTree's features. It also won't be backward-compatible, meaning TTree will (have to) stay in ROOT to serve all users who are reading files written before, say, Run 4, and who rely on its features.

I had looked into factoring out TTreeFormula to allow something like df.Filter(FromFormula("yourttreeformulaexpression")), but unfortunately the TTreeFormula parser is too entangled in TTree internals to be factored out that way. As a result, RDataFrame supports all of C++ but has no way to understand TTree::Draw expressions.

Bottom line: there are things in RDataFrame that require extra typing w.r.t. TTree::Draw. There are also several things that are possible in RDataFrame that are not possible at all with TTree::Draw, e.g. writing out new ROOT files and producing multiple histograms with a single (multi-threaded) event loop, or calling arbitrary C++ functions during the event loop.

If you find that certain useful features are completely missing in RDataFrame, please ask for them on the root-project/root issue tracker on GitHub. For some there might be alternative ways to get the same result in RDF (possibly with a bit more verbosity than in TTree::Draw, admittedly :) ), while others might be things that we do need to implement.

Cheers,
Enrico

@pcanal Is there any workaround for the bug in TTreeFormula? Also, would you generate a bug ticket? I could do it, but I am not sure I ever did…

Just to add to this thread, the optimal way to store multi-dimensional (jagged) arrays in RNTuple is via std::vector<MyClass>, where MyClass itself can contain (nested) std::vectors. As in TTree, a dictionary of MyClass is required for writing, although not necessarily for reading. In RNTuple, using std::vector adds no overhead to the serialization of collections, and the data gets fully split into a columnar layout at every nesting level.
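To make "fully split into a columnar layout at every nesting level" a bit more concrete, here is a small conceptual sketch in plain C++ (no ROOT needed). It is not RNTuple's actual on-disk format or API. MyClass, its members, Columns, Fill and Read are all hypothetical names made up for illustration. The idea shown is only that each collection level becomes a column of offsets and each leaf member becomes one flat column of values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical user class (illustration only): each entry stores a
// std::vector<MyClass>, and each MyClass has its own nested vector.
struct MyClass {
    int id;
    std::vector<float> hits;  // nested, variable-length member
};

// Conceptual columnar form of many entries of std::vector<MyClass>.
// Assumption: this only mimics the idea of offset columns plus flat
// value columns; it is not how RNTuple is implemented.
struct Columns {
    // Entry i owns objects [entryOffsets[i], entryOffsets[i + 1]).
    std::vector<std::size_t> entryOffsets{0};
    // One value per MyClass object, across all entries.
    std::vector<int> id;
    // Object j owns hits [hitOffsets[j], hitOffsets[j + 1]).
    std::vector<std::size_t> hitOffsets{0};
    // All hits of all objects of all entries, back to back.
    std::vector<float> hits;
};

// Append one entry: every member goes to its own flat column.
void Fill(Columns &c, const std::vector<MyClass> &entry) {
    for (const auto &obj : entry) {
        c.id.push_back(obj.id);
        c.hits.insert(c.hits.end(), obj.hits.begin(), obj.hits.end());
        c.hitOffsets.push_back(c.hits.size());
    }
    c.entryOffsets.push_back(c.id.size());
}

// Rebuild entry i from the columns using the two offset levels.
std::vector<MyClass> Read(const Columns &c, std::size_t i) {
    assert(i + 1 < c.entryOffsets.size());  // entry must exist
    std::vector<MyClass> entry;
    for (std::size_t j = c.entryOffsets[i]; j < c.entryOffsets[i + 1]; ++j) {
        MyClass obj{c.id[j], {}};
        const auto first = c.hits.begin() + static_cast<std::ptrdiff_t>(c.hitOffsets[j]);
        const auto last = c.hits.begin() + static_cast<std::ptrdiff_t>(c.hitOffsets[j + 1]);
        obj.hits.assign(first, last);
        entry.push_back(obj);
    }
    return entry;
}
```

In this sketch, reading only the id column never touches the hits column, which is the practical benefit of splitting every nesting level: a reader pays only for the members it actually uses.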

There are some tutorials available that show the basic RNTuple functionality. There is also RDataFrame support for reading (the pending PR #6700 will bring a big improvement). If you would like to try your sample with RNTuple, please don't hesitate to get in touch with me if you have any questions.
