Cannot read integral-valued branches from TTree with PyROOT (originally written as custom object data members)

Hello,

I’m getting a weird issue when trying to read out values from a TTree branch stored in a ROOT file. I originally persisted the following class to file (12x to the same TTree):

// `Results` has no data members that are written to file
class MyResults : public Results {
 public:
  std::vector<int> activity;
  std::vector<int> occupancy;
  int timesteps;
  double foo;
};

I write this object to file as follows:

TFile tfile("results.root", "UPDATE");

// If ROOT file existed, get the TTree object
TTree *tree = static_cast<TTree *>(tfile.Get("binding_cells"));

// If ROOT file didn't exist before, our TTree* is a nullptr
if (tree == nullptr) {
  // Create new TTree and pass the name and description to it
  tree = new TTree("binding_cells", "T-Cell_Activity_Study");
  // Write the results to file under the specified branch name
  // (I get the object in as a function argument)
  tree->Branch("binding_cells", obj);
} else {  // If ROOT file did exist, we should append to existing TTree
  // Make a local copy of the object pointer
  auto *obj_ptr = obj;
  // Append to branch
  tree->SetBranchAddress("binding_cells", &obj_ptr);
}

tree->Fill();
tfile.Write();

In PyROOT I do:

import ROOT

# Load the dictionary of MyResults and Results
ROOT.gSystem.Load("results_dict.so")

f = ROOT.TFile("results.root")
results = ROOT.gROOT.FindObject("binding_cells")
results.Print()
******************************************************************************
*Tree    :binding_cells: T-Cell_Activity_Study                                  *
*Entries :       12 : Total =           38757 bytes  File  Size =       7747 *
*        :          : Tree compression factor =   5.28                       *
******************************************************************************
*Branch  :binding_cells                                                      *
*Entries :       12 : BranchElement (see below)                              *
*............................................................................*
*Br    0 :bdm::Results :                                                     *
*Entries :       12 : Total  Size=       1748 bytes  File Size  =       1272 *
*Baskets :       12 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    1 :activity  : vector<int>                                            *
*Entries :       12 : Total  Size=      16233 bytes  File Size  =       1608 *
*Baskets :       12 : Basket Size=      32000 bytes  Compression=   9.75     *
*............................................................................*
*Br    2 :occupancy : vector<int>                                            *
*Entries :       12 : Total  Size=      16249 bytes  File Size  =       1584 *
*Baskets :       12 : Basket Size=      32000 bytes  Compression=   9.90     *
*............................................................................*
*Br    3 :timesteps : Int_t                                                  *
*Entries :       12 : Total  Size=       1633 bytes  File Size  =       1068 *
*Baskets :       12 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    4 :foo       : Double_t                                               *
*Entries :       12 : Total  Size=       1585 bytes  File Size  =       1044 *
*Baskets :       12 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*

When I loop over the entries, I get the object printed instead of the value:

for event in results:
    # same with event.timesteps
    print(event.foo)
<ROOT.bdm::MyResults object at 0x7f4b43ec8bd0>
<ROOT.bdm::MyResults object at 0x7f4b43ec8bd0>
<ROOT.bdm::MyResults object at 0x7f4b43ec8bd0>
...(12x in total)...

However, with print(event.activity) or print(event.occupancy) I do get the values shown. It seems that only for the scalar members (the int and the double) do I get the object itself back instead of the value.

I also tried the same steps with C++ notebooks, using TTreeReaderValue<vector<int>> and TTreeReaderValue<int>, and there it seems to read out all the branches correctly.

Am I doing something wrong, or is this supposed to work?

Cheers,
Ahmad


ROOT Version: 6.18
Platform: CentOS 7
Compiler: GCC 7.3.0


Hi,
Can you share with me that ROOT file you generate so I can have a look?
Thanks,
Enric

I have attached the file below.

new_results.root (6.6 KB)

Hi,
I have errors when reading the file, both in Python and C++. Could you attach here the code you use (together with the necessary class definitions) to generate new_results.root? I will run it myself to try to find the problem.

I made a simplified reproducer of the problem:

I run this in the cling interpreter:

#include "test.h"
.L test.h+
MyResults *res = new MyResults()
res->timesteps = 42
TFile f("test.root", "recreate")
TTree tree("binding_cells", "brief")
tree.Branch("binding_cells", res)
tree.Fill()
f.Write()

And I run the commands in test.py in the notebook to get the output I mentioned in the beginning.

When I made the reproducer I noticed that having a ClassDef in the Results class gave me the <ROOT.bdm::MyResults object at 0x7f4b43ec8bd0> message. Without a ClassDef it simply prints nullptr.

test.h (363 Bytes)
test.py (324 Bytes)

Hi,

You should be able to read the file with this code (works for me):

import ROOT
#ROOT.gInterpreter.Declare('#include "test.h"')
ROOT.gSystem.Load("test_h.so")
f = ROOT.TFile("test.root")
t = f.binding_cells
for event in t: print(event.binding_cells)
for event in t: print(event.binding_cells.foo)
for event in t: print(event.binding_cells.timesteps)

Notice that there is only one top level branch (binding_cells) and you access the sub-branches via that branch in PyROOT.
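
To see which branches are top-level and which are sub-branches, a small helper along these lines might be useful. This is only a sketch, not from the original posts: `branch_hierarchy` is a made-up name, and it relies only on `TTree::GetListOfBranches`, `TBranch::GetListOfBranches` and `GetName`.

```python
def branch_hierarchy(tree):
    """Map each top-level branch name to the names of its direct sub-branches.

    `tree` is expected to be a ROOT.TTree (or anything offering the same
    GetListOfBranches()/GetName() interface).
    """
    return {
        branch.GetName(): [sub.GetName() for sub in branch.GetListOfBranches()]
        for branch in tree.GetListOfBranches()
    }


# Possible usage with the reproducer from this thread (assumes test_h.so
# and test.root exist in the working directory):
#
#   import ROOT
#   ROOT.gSystem.Load("test_h.so")
#   f = ROOT.TFile("test.root")
#   print(branch_hierarchy(f.Get("binding_cells")))
```

For the tree printed in the first post, this should show a single key, binding_cells, with the data members listed as its sub-branches.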

Hi,

Sorry for the late response. I was away for a while.

Thanks for the explanation. I mistakenly thought that gROOT.FindObject("binding_cells") would return the branch binding_cells, because I gave the branch and the tree the same name… It also confused me that I could ‘miraculously’ access the vector objects directly (I still don’t get how that is possible).

One more general question I have is whether or not it is good practice to have one branch in a tree with multiple sub-branches. Wouldn’t it be better to remove the top branch and have multiple branches instead? I get the current construction because I write the MyResults object to file, so it’s done automatically. Is there a way to flatten the sub-branches into branches?

Hi,

When you create a branch (i.e. call tree.Branch), you can specify its split level. This determines whether the object is stored as a whole or automatically split into sub-branches. Whether to split depends on how you will read the data: if you are sure you will always read the whole object, there is no need to split. If, on the other hand, you will often access only some of the sub-branches, then split. More info here:

https://root.cern.ch/doc/master/classTTree.html
https://root.cern.ch/root/htmldoc/guides/users-guide/Trees.html
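
As a rough illustration of the split level, the sketch below writes the same object once unsplit and once fully split, and lists the resulting sub-branches. The struct `DemoResults` and the function `sub_branches` are made up for this example (a stand-in for MyResults, declared to the interpreter instead of via a compiled dictionary). The fourth argument of `Branch` is the split level, with 99 being the default.

```python
import os
import tempfile

# Hypothetical stand-in for MyResults; the include guard makes repeated
# declarations harmless.
DEMO_CLASS = """
#ifndef DEMO_RESULTS_DECLARED
#define DEMO_RESULTS_DECLARED
#include <vector>
struct DemoResults {
  std::vector<int> activity;
  int timesteps = 0;
  double foo = 0.;
};
#endif
"""


def sub_branches(split_level):
    """Write one DemoResults entry with the given split level and return
    the names of the sub-branches created under the top-level branch."""
    import ROOT  # needs a ROOT build with Python bindings

    ROOT.gInterpreter.Declare(DEMO_CLASS)
    obj = ROOT.DemoResults()
    obj.timesteps = 42

    with tempfile.TemporaryDirectory() as tmp:
        f = ROOT.TFile(os.path.join(tmp, "demo.root"), "RECREATE")
        tree = ROOT.TTree("binding_cells", "split level demo")
        # Branch(name, object, bufsize, splitlevel); 32000 is the default bufsize.
        top = tree.Branch("binding_cells", obj, 32000, split_level)
        tree.Fill()
        # Collect the names before the file (and with it the tree) is closed.
        names = [b.GetName() for b in top.GetListOfBranches()]
        f.Write()
        f.Close()
    return names


# sub_branches(0)   -> unsplit: no sub-branches expected
# sub_branches(99)  -> split: one sub-branch per data member expected
```

With split level 0 the object is streamed as a single unit, so reading any member means reading all of it.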

Note that what I explained above does not apply if you use the Python syntax to access the branches (e.g. event.binding_cells.foo). This is nice syntax-wise, but it also reads the whole branch. If you wanted to benefit from the optimization, you would need to do (from Python) SetBranchAddress + GetEntry. Alternatively, you could also use RDataFrame to efficiently read tree data from Python:

https://root.cern/doc/master/classROOT_1_1RDataFrame.html
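
A minimal RDataFrame sketch, under the assumption of a reasonably recent ROOT with Python bindings: it writes a small throwaway tree and reads it back, touching only the columns that the booked results need. The tree name `demo`, the column contents and the function name `mean_and_count` are invented for the example. For the file in this thread, the available column names can be checked with `df.GetColumnNames()` first.

```python
import os
import tempfile


def mean_and_count():
    """Write a small demo tree with RDataFrame, then read it back and
    return (mean of 'foo', number of entries)."""
    import ROOT  # needs a ROOT build with Python bindings

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.root")
        # Five entries with two made-up columns; rdfentry_ is the entry index.
        (ROOT.RDataFrame(5)
             .Define("timesteps", "(int)rdfentry_")
             .Define("foo", "0.5 * rdfentry_")
             .Snapshot("demo", path))

        df = ROOT.RDataFrame("demo", path)
        mean = df.Mean("foo")  # booked lazily, nothing is read yet
        count = df.Count()
        # The first GetValue() runs one event loop that fills both results.
        return mean.GetValue(), count.GetValue()
```

Because results are booked first and computed in one pass, several quantities can be extracted without looping over the tree more than once.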
