RDataFrame accessing vector branches belonging to a split custom class

Dear ROOTers,

I am trying the RDataFrame with a complex TTree filled with custom objects (deriving from TObject), à la Event example class.

The objects have been split into elementary and array branches (or at least this is my hypothesis, since I did not produce this data file). This is an extract of tree->Print():

*............................................................................*
*Br   51 :noise_pulses_in[260] : Short_t                                     *
*Entries :        5 : Total  Size=       3260 bytes  File Size  =        213 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=  12.75     *
*............................................................................*
*Br   52 :peaks     : Int_t peaks_                                           *
*Entries :        5 : Total  Size=      29788 bytes  File Size  =        121 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br   53 :peaks.fUniqueID : UInt_t fUniqueID[peaks_]                         *
*Entries :        5 : Total  Size=       4524 bytes  File Size  =        146 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=  26.84     *
*............................................................................*
*Br   54 :peaks.fBits : UInt_t fBits[peaks_]                                 *
*Entries :        5 : Total  Size=       4504 bytes  File Size  =        148 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=  26.45     *
*............................................................................*
*Br   55 :peaks.area : Float_t area[peaks_]                                  *
*Entries :        5 : Total  Size=       4499 bytes  File Size  =       3914 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*

RDataFrame can make plots with Histo1D of the array variable peaks.area:

gSystem->Load("event_classes.so");                              // see comments later
ROOT::RDataFrame d5("tree5", "only5events.root");
auto h1 = d5.Histo1D("peaks.area"); h1->Draw();           // works

However using the Define and Foreach features does not work:

d5.Foreach( [](std::vector<Float_t> areas) { if (areas.size()) std::cout << areas[0] << std::endl; } , {"peaks.area"} )
Error in <TTreeReaderValueBase::GetBranchDataType()>: Must use TTreeReaderArray to access a member of an object that is stored in a collection.
Error in <TTreeReaderValueBase::CreateProxy()>: The branch peaks.area contains data of type {UNDETERMINED TYPE}, which does not have a dictionary.

Please note in the first snippet that I have also loaded the dictionary of the class (I got the piece of code defining the classes and I have compiled it with gSystem->CompileMacro), but I get the same error message as when I do not load the dictionary.

Am I doing something wrong?

Thanks,
Matteo


ROOT Version: 6.14/02
Platform: Debian 9
Compiler: gcc 6.3.0-18+deb9u1


Hi Matteo,

thanks for the report. I have a few ideas but I’d like to do some tests before answering: can you share with us the file?

Cheers,
Danilo

Dear Danilo,

Thanks for your quick reply.

I attach a 5-events-only version of the ROOT file and the piece of code with the class definitions. This tree was obtained with the Snapshot feature on the original tree. It looks correct when inspected in other ways, and it reproduces the problem with the Define and Foreach features of RDataFrame.

Please let me know if you need other information.

classes.cpp (4.5 KB)
only5events.root (563.3 KB)

I am writing here again to keep the topic from being closed… I hope that the reason for this issue can be found.

Hi,
sorry for the late reply, and thanks for providing a reproducer!
The error message is not helpful at all in this case: the actual issue is that you are reading "peaks.area" as std::vector<Float_t>, but that branch contains a C array (Float_t area[peaks_]), not a std::vector.
In general, with RDataFrame, you don’t have to care: you can read both with ROOT::VecOps::RVec<Float_t>:

df.Foreach([](ROOT::VecOps::RVec<Float_t> &areas) { if (areas.size()) std::cout << areas[0] << std::endl; } , {"peaks.area"})

prints

2.95798
2561.94
1589.36
4740.21
2551.3
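
As an aside, the difference between the two layouts can be illustrated outside of ROOT. The following is only a minimal sketch in plain C++: ArrayView and SumAreas are hypothetical names invented for this example and are not part of ROOT, and the sketch does not claim to show how RVec is implemented. It only shows the general idea that a C array branch is "pointer + element count" (the count playing the role of peaks_), and that a vector-like view can sit on top of either that or a std::vector.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical non-owning view over contiguous elements.
// Illustrative only: this is NOT ROOT's RVec, just the same general idea of
// a vector-like interface on top of "pointer + size" storage.
template <typename T>
class ArrayView {
public:
   // View over a C array with a separately stored element count.
   ArrayView(const T *data, std::size_t size) : fData(data), fSize(size) {}
   // View over a std::vector, without copying its elements.
   ArrayView(const std::vector<T> &v) : fData(v.data()), fSize(v.size()) {}

   std::size_t size() const { return fSize; }
   const T &operator[](std::size_t i) const { return fData[i]; }
   const T *begin() const { return fData; }
   const T *end() const { return fData + fSize; }

private:
   const T *fData;    // not owned by the view
   std::size_t fSize; // number of valid elements
};

// A single function that accepts both storage layouts through the view.
inline float SumAreas(ArrayView<float> areas)
{
   return std::accumulate(areas.begin(), areas.end(), 0.0f);
}
```

With this sketch, SumAreas(ArrayView<float>(buffer, n)) and SumAreas(someVector) both compile, whereas a function taking std::vector<float> could not be given the raw buffer without copying it first.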

With ROOT master (and soon ROOT v6.16) you can check the type of a branch with df.GetColumnType("peaks.area") (returns the string “ROOT::VecOps::RVec<Float_t>”).

Finally, in principle you should not need dictionaries for your custom class to read the split branches, as those are of fundamental types (e.g. array of floats) – in practice you might get some warnings complaining about missing dictionaries.

Let us know if you need further assistance!

Cheers,
Enrico

Thanks a lot.

It works!

I am reopening this thread to ask how I should access branches that are arrays of TString.

I noticed that the old TTree::Scan() also has problems in this case: it shows only the first string of the array, and the only old-style way to look at the content is TTree::Show().

If I try something like

df.Foreach( [](RVec<TString>& det) { auto x = det[0]; } , { "peaks.detector" } );

I get the runtime error:

Error in <TTreeReaderArrayBase::SetImpl()>: Cannot read branch peaks.detector: unhandled streamer element type TStreamerString

and then a segmentation fault.

Please let me know if I am doing something wrong, or if arrays of TString are not (or not yet) supported, and whether there is a possible workaround. I am currently using ROOT 6.14/02 and I can install a later version if that helps.

Hi,
please open a new thread to ask a separate question :smile:
Could you provide the input file you are using (even one event should do the job) so we can check what’s going wrong? It seems ROOT is having some trouble with that data type.

Cheers,
Enrico
