RDataFrame with convoluted data structure

Hi,

I have a ROOT file that has the following structure:

******************************************************************************
*Tree    : myTree   : myTree                                                *
*Entries :  3032087 : Total =      2592538810 bytes  File  Size =  186442961 *
*        :          : Tree compression factor =  13.91                       *
******************************************************************************
*Branch  :event                                                              *
*Entries :  3032087 : BranchElement (see below)                              *
*............................................................................*
*Br    0 :det1Array[5] : Detector1                                      *
*Entries :  3032087 : Total  Size=  723710086 bytes  File Size  =   56928690 *
*Baskets :     3418 : Basket Size=   25600000 bytes  Compression=  12.71     *
*............................................................................*
*Br    1 :tdiff : Double_t                                               *
*Entries :  3032087 : Total  Size=   24269792 bytes  File Size  =    1018422 *
*Baskets :      129 : Basket Size=    1557504 bytes  Compression=  23.83     *
*............................................................................*
*Br    2 :xavg  : Double_t                                               *
*Entries :  3032087 : Total  Size=   24269659 bytes  File Size  =    1186724 *
*Baskets :      129 : Basket Size=    1557504 bytes  Compression=  20.45     *
*............................................................................*
*Br    3 :det1Channel[5] : Double_t                                     *
*Entries :  3032087 : Total  Size=  133491282 bytes  File Size  =    9471640 *
*Baskets :      676 : Basket Size=   25600000 bytes  Compression=  14.09     *
*............................................................................*

which is the result of the following data structures:

struct DetectorHit 
{
  double Long=-1, Short=-1, Time=-1;
  int Ch=-1;
};

struct Detector1 
{
  std::vector<DetectorHit> rings;
  std::vector<DetectorHit> wedges;
};

struct ProcessedEvent 
{
  Detector1 det1Array[5];
  double tdiff;
  double xavg;
  double det1Channel[5];
};

Using TTree, I can access the branch element det1Channel as follows:

TFile *f = TFile::Open("myfile.root");
TTree *t = static_cast<TTree*>(f->Get("myTree"));
ProcessedEvent *pevent = new ProcessedEvent;
t->SetBranchAddress("event", &pevent);
t->GetEntry(100); // load the entry with index 100
int detectorID = 0;
double currentChannel = pevent->det1Channel[detectorID]; // det1Channel holds doubles

where detectorID can range from 0 to 4.

In RDataFrame, I try to achieve the same thing by doing the following:

ROOT::RDataFrame df("myTree", "myfile.root");
auto column_name = "det1Channel[5]";
auto first_rows = df.Take<double>(column_name);
for (std::size_t i = 0; i < first_rows->size(); ++i) {
	std::cout << "Row " << i << " of column " << column_name << ": " << first_rows->at(i) << std::endl;
}

This seems to be the only way I can extract that specific column. However, with this approach I have lost the ability to choose a specific value for detectorID. I have tried to use "det1Channel" instead of "det1Channel[5]" as the column name, but that does not work: the error states that "det1Channel" is not recognized. What can I do to fix this behavior?

Hi @Pete ,

df.GetColumnNames() should return a list of column names that RDF recognizes for the given input dataset. What do the contents of that vector look like?

In general to extract a single value of an array you would do:

df.Define("first", [](RVecD &vec) { return vec[0]; }, {"vec_column"})
  .Take<double>("first");

or

df.Define("first", "vec_column[0]").Take<double>("first");

As an aside, note that the preferred workflow with RDF is usually not to extract column values and then operate on them, but to operate on values using RDF itself (with Define, Filter, Histo1D, etc.), although of course there are legitimate use cases for Take.

Cheers,
Enrico

Actually, looking at the TTree::Print output more closely, I think the name of the branch is literally det1Channel[5], with the [5] included (you can verify that via tree->GetBranch(..)->GetName()).

If that’s true, that’s a weird way to write a tree, and that’s what’s confusing RDF.

You can probably still go through the event parent branch, but that will force RDF to deserialize all of ProcessedEvent for each entry rather than just det1Channel. It also requires dictionaries for ProcessedEvent to be available, otherwise ROOT I/O won’t know how to deserialize the custom type:

df.Define("my_value", [](ProcessedEvent &event) { return event.det1Channel[0]; }, {"event"})
  .Take<double>("my_value");

or

df.Define("my_value", "event.det1Channel[0]").Take<double>("my_value");
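
To keep the ability to choose detectorID at run time, the index can be captured by the lambda instead of being hard-coded as 0. Below is a minimal, ROOT-free sketch of that idea. The helpers makeChannelGetter and takeColumn are hypothetical names introduced here for illustration, and a std::vector<ProcessedEvent> stands in for the tree entries, so this only demonstrates the per-entry logic and not the RDF API itself.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Same layout as the structs shown in the original post.
struct DetectorHit {
  double Long = -1, Short = -1, Time = -1;
  int Ch = -1;
};

struct Detector1 {
  std::vector<DetectorHit> rings;
  std::vector<DetectorHit> wedges;
};

struct ProcessedEvent {
  Detector1 det1Array[5];
  double tdiff;
  double xavg;
  double det1Channel[5];
};

// Hypothetical helper: returns a callable that reads det1Channel[detectorID]
// from one event. The index is validated once, when the callable is built.
inline auto makeChannelGetter(std::size_t detectorID) {
  if (detectorID >= 5)
    throw std::out_of_range("detectorID must be in [0, 4]");
  return [detectorID](const ProcessedEvent &event) {
    return event.det1Channel[detectorID];
  };
}

// Hypothetical stand-in for Define(...).Take<double>(...): applies the
// callable to every event and collects the results into a vector.
template <typename Getter>
std::vector<double> takeColumn(const std::vector<ProcessedEvent> &events,
                               Getter getter) {
  std::vector<double> values;
  values.reserve(events.size());
  for (const auto &event : events)
    values.push_back(getter(event));
  return values;
}
```

With RDF, and assuming dictionaries for ProcessedEvent are available, the same callable could presumably be passed as df.Define("my_value", makeChannelGetter(detectorID), {"event"}) in place of the hard-coded lambda above.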
