Write TTree with branches containing multiple leaves of different types

Dear ROOT community,

I’m trying to write different kinds of data to an output TTree, and I’d like to group them into branches whose leaves have different types:

Observables.Emu
Observables.Pmu
...
Event.Weight
Event.BinIndex

As far as I understood, it is possible to do such a thing using the TTree method:

TBranch* TTree::Branch(const char* name, void* address, const char* leaflist, Int_t bufsize /* = 32000 */)

And call this with:

tree->Branch("Observables", addressOfDataArray, "Emu/D:Pmu/D:Leaf3/D....");

My question is: which kind of object can I use for addressOfDataArray to manage and provide the data I want to write?

It seems that this TTree::Branch overload calls the constructor of TBranch, which itself parses the leaf string (i.e. "Emu/D:Pmu/D:Leaf3/D....") and defines the individual leaf addresses with leaf->SetAddress((char*) (fAddress + offset));.
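To make the offset logic concrete, here is a minimal standalone sketch of how a leaf string could be turned into byte offsets. It is a simplified model, not ROOT's actual code: the helper names leafTypeSize and leafOffsets are made up for illustration, only a few type codes are handled, every leaf is assumed to carry an explicit "/X" type, and no padding is inserted between leaves.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Size in bytes for a small subset of leaf type codes (hypothetical helper).
inline std::size_t leafTypeSize(char code) {
  switch (code) {
    case 'D': return sizeof(double);
    case 'F': return sizeof(float);
    case 'I': return sizeof(int);
    case 'L': return sizeof(long long);
    default:  return 0;  // code not handled by this sketch
  }
}

// Returns (leaf name, byte offset) pairs for a string such as "Emu/D:Pmu/D:N/I".
// Offsets simply accumulate the size of each preceding leaf.
inline std::vector<std::pair<std::string, std::size_t>>
leafOffsets(const std::string& leaflist) {
  std::vector<std::pair<std::string, std::size_t>> out;
  std::size_t offset = 0;
  std::istringstream in(leaflist);
  std::string item;
  while (std::getline(in, item, ':')) {
    const auto slash = item.find('/');
    // This sketch requires an explicit type code on every leaf.
    assert(slash != std::string::npos && slash + 1 < item.size());
    out.emplace_back(item.substr(0, slash), offset);
    offset += leafTypeSize(item[slash + 1]);
  }
  return out;
}
```

With this model, the object passed as the address only has to be a block of memory whose bytes follow the same order and sizes as the leaf string.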

I wonder which (ROOT) wrapper class we are supposed to provide as addressOfDataArray.

Cheers! :slight_smile:
Adrien


ROOT Version: 6.24/06
Platform: macOS Monterey
Compiler: clang-1300.0.29.3


Assuming that your “DataArray” is a “struct” or a class object:
tree->Branch("Observables.", addressOfDataArray); // the "." in the end is intentional

{
  struct DataArray { double Emu, Pmu, Leaf3; }; DataArray MyDataArray;
  TTree *tree = new TTree("tree", "my tree");
  tree->Branch("Observables.", &MyDataArray);
  tree->Print();
  delete tree;
}

Thanks a lot for your answer!! :stuck_out_tongue:

I see, so in this case I should know in advance the data I want to put in. This would fit the second part of my problem (the Event.* for which all variables are actually members of a class).

@pcanal The other part however (with Observables), I’m storing the data in a vector of std::any type. So I was wondering if I could do something like:

f = TFile::Open("fTest.root", "RECREATE");
std::vector<std::any> va;
va.emplace_back(double(1.5));
va.emplace_back(int(4));
TTree* t = new TTree();
t->Branch("test", &va[0], "dTest/D:iTest/I");
va[0] = double(48);
va[1] = int(-1);
t->Fill();
t->Write();
f->Close();

But a version which works…!
(this piece of code doesn’t save the right values: when reading the TTree back I get 0 for both)

It’s strange, your solution seems to work with CINT, but in a compiled program the structure isn’t recognised :thinking:

Here is a piece of code:

PhysicsEventMembers p;
tree->Branch("Event.", &p);
tree->Print("ALL");

And it prints:

******************************************************************************
*Tree    :MC_TTree  : MC_TTree                                               *
*Entries :        0 : Total =             283 bytes  File  Size =          0 *
*        :          : Tree compression factor =   1.00                       *
******************************************************************************

Whereas with CINT I get:

******************************************************************************
*Tree    :          :                                                        *
*Entries :        0 : Total =            4587 bytes  File  Size =          0 *
*        :          : Tree compression factor =   1.00                       *
******************************************************************************
*Branch  :Event.                                                             *
*Entries :        0 : BranchElement (see below)                              *
*............................................................................*
*Br    0 :Event._dataSetIndex_ : Int_t                                       *
*Entries :        0 : Total  Size=        549 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    1 :Event._entryIndex_ : Long64_t                                      *
*Entries :        0 : Total  Size=        541 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    2 :Event._treeWeight_ : Double_t                                      *
*Entries :        0 : Total  Size=        541 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    3 :Event._nominalWeight_ : Double_t                                   *
*Entries :        0 : Total  Size=        553 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    4 :Event._eventWeight_ : Double_t                                     *
*Entries :        0 : Total  Size=        545 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    5 :Event._sampleBinIndex_ : Int_t                                     *
*Entries :        0 : Total  Size=        557 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*

Should I tell ROOT something about the struct?

For reference, this requires the memory starting at &va[0] to contain exactly a double followed by an int (i.e. in the first 12 bytes). More likely than not, the implementation of std::any also uses that memory to record the data type, so it won’t work :slight_smile:

We currently do not support directly std::any (and collection thereof), however the following should work:

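t->Branch("dTest", std::any_cast<double>(&va[0]), "dTest/D"); // any_cast (a plain static_cast does not compile here) returns a pointer to the stored double
t->Branch("iTest", std::any_cast<int>(&va[1]), "iTest/I");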

if and only if there is no resizing of the vector (if there is, you need to call SetBranchAddress to inform the branches that the address of the data has changed).
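Both points can be checked without ROOT. The sketch below (helper names are made up for illustration) measures the distance between two consecutive std::any elements, fetches the address of the stored double through std::any_cast, and lets one verify that this address moves when the vector reallocates.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Distance in bytes between two consecutive elements of the vector.
// For a leaflist such as "dTest/D:iTest/I" to work on &v[0], this would
// have to be sizeof(double); in practice it is sizeof(std::any).
inline std::size_t elementStride(const std::vector<std::any>& v) {
  assert(v.size() >= 2);
  return static_cast<std::size_t>(reinterpret_cast<const char*>(&v[1]) -
                                  reinterpret_cast<const char*>(&v[0]));
}

// Pointer to the double held by `a`, or nullptr if `a` holds something else.
inline double* storedDouble(std::any& a) {
  return std::any_cast<double>(&a);
}
```

Calling storedDouble(va[0]) before and after something like va.reserve(1000) typically yields two different addresses, which is the situation where SetBranchAddress has to be called again.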

Yes. You need to generate a dictionary (via a LinkDef file and an execution of rootcling). If you pass the declaration to Cling then it does indeed mostly work, since ROOT then knows the layout of the struct.
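As a rough sketch of what that could look like for the struct in question: the member list below is guessed from the Print() output earlier in the thread, the default values and all file names are placeholders, and plain C++ types stand in for Int_t, Long64_t and Double_t.

```cpp
// ---- PhysicsEventMembers.h (members guessed from the Print() output) ----
struct PhysicsEventMembers {
  int       _dataSetIndex_{-1};   // default values are assumptions
  long long _entryIndex_{-1};     // printed as Long64_t by ROOT
  double    _treeWeight_{1};
  double    _nominalWeight_{1};
  double    _eventWeight_{1};
  int       _sampleBinIndex_{-1};
};

// ---- LinkDef.h ----
#ifdef __CLING__
#pragma link C++ class PhysicsEventMembers+;
#endif

// Generate the dictionary source (file names are placeholders):
//   rootcling -f EventDict.cxx PhysicsEventMembers.h LinkDef.h
// then compile EventDict.cxx and link it into the program that calls
//   tree->Branch("Event.", &p);
```

Once the dictionary is linked in, the compiled program should have the same layout information that the interpreter gets from seeing the declaration.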

Thanks a lot @pcanal

It’s true I can always cast the pointer to the type I expect, but is there a way I could store those variables in sub-branches, like Observable.dTest?

With the current code of ROOT, the short answer is no (Support for std::any has not been implemented). However, you could :slight_smile: try to (with our help :)) add this support.

Ok, I’ll try to find a solution and come back here :slight_smile:

@pcanal There you go:

I made a class which holds a std::vector<char> in charge of keeping the data contiguous. With the use of templates and memcpy it was quite straightforward in the end. :smiley:

Here is the class header:

class RawDataArray{

  public:
    inline RawDataArray() = default;
    inline virtual ~RawDataArray() = default;

    inline void reset();

    inline std::vector<char>& getRawDataArray();

    template<typename T> inline void writeRawData(const T& data); // auto incrementing "_currentOffset_"
    template<typename T> inline void writeRawData(const T& data, size_t byteOffset_);

    void resetCurrentByteOffset();
    void lockArraySize();
    void unlockArraySize();

  private:
    bool _lockArraySize_{false};
    size_t _currentByteOffset_{0};
    std::vector<char> rawData{};

  };

( link for the file: GenericToolbox.RawDataArray.h )

And the implementation (for newcomers: it’s part of my header-only library, so template functions must be defined in a header file instead of a .cpp):

inline void RawDataArray::reset(){
    rawData = std::vector<char>();
    resetCurrentByteOffset();
    unlockArraySize();
  }

  inline std::vector<char>& RawDataArray::getRawDataArray(){
    return rawData;
  }

  template<typename T> inline void RawDataArray::writeRawData(const T& data){
    this->writeRawData(data, _currentByteOffset_);
    _currentByteOffset_+=sizeof(data);
  }
  template<typename T> inline void RawDataArray::writeRawData(const T& data, size_t byteOffset_){
    if(rawData.size() < byteOffset_ + sizeof(data) ){
      if( _lockArraySize_ ) throw std::runtime_error("Can't resize raw array since _lockArraySize_ is true.");
      rawData.resize(byteOffset_ + sizeof(data));
    }
    memcpy(&rawData[byteOffset_], &data, sizeof(data));
  }

  inline void RawDataArray::resetCurrentByteOffset(){
    _currentByteOffset_=0;
  }
  inline void RawDataArray::lockArraySize(){
    _lockArraySize_=true;
  }
  inline void RawDataArray::unlockArraySize(){
    _lockArraySize_=false;
  }

( link for the file: GenericToolbox.RawDataArray.impl.h )
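For anyone who wants to check the byte layout without ROOT, the same memcpy idea can be exercised in isolation. This is a stripped-down sketch and not the GenericToolbox class itself; appendRaw and readRaw are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Append the raw bytes of a trivially copyable value to the buffer.
// Returns the byte offset at which the value was written.
template <typename T>
std::size_t appendRaw(std::vector<char>& buf, const T& value) {
  static_assert(std::is_trivially_copyable<T>::value,
                "only plain data can be copied byte by byte");
  const std::size_t offset = buf.size();
  buf.resize(offset + sizeof(T));
  std::memcpy(buf.data() + offset, &value, sizeof(T));
  return offset;
}

// Read a value of type T back from the given byte offset.
template <typename T>
T readRaw(const std::vector<char>& buf, std::size_t offset) {
  assert(offset + sizeof(T) <= buf.size());
  T value;
  std::memcpy(&value, buf.data() + offset, sizeof(T));
  return value;
}
```

Appending a double and then an int gives offsets 0 and sizeof(double), which is the layout a leaf string like "d1/D:i1/I" expects.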

Now let’s have a look with an example:

#include "GenericToolbox.RawDataArray.h"

TFile* f = TFile::Open("fTest.root", "RECREATE");

GenericToolbox::RawDataArray ar;

double d1{0};
int i1{-1};

ar.writeRawData(d1);
ar.writeRawData(i1);

ar.lockArraySize(); // make sure the array cannot be resized from now on (its address is handed to the branch below)

TTree* t = new TTree("t", "t");
t->Branch("test", &ar.getRawDataArray()[0], "d1/D:i1/I");

ar.resetCurrentByteOffset();
ar.writeRawData(double(1.2));
ar.writeRawData(int(0));
t->Fill();

ar.resetCurrentByteOffset();
ar.writeRawData(double(1.8));
ar.writeRawData(int(1));
t->Fill();

t->Write();
f->Close();

Then we get what I wanted :slight_smile:

It’s a solution for me, but I agree it would be great to have this kind of data wrapper in ROOT!

Thanks a lot for the breakdown guys :smiley:
Cheers!

Note that because of the line:

memcpy(&rawData[byteOffset_], &data, sizeof(data));

you can only read the data on a platform that has the same endianness (and the same padding rules, if the type is a struct) as the machine on which the file was written, and you are limited to simple types (numerical types and structs thereof): no pointers, and thus no STL collections, and no struct/class with a virtual table.

And for your use case (especially since you create the branch using a leaflist, which by construction has similar restrictions), it ought to be fine.
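The padding remark can be illustrated with two small structs. This is only a sketch with made-up type names: a layout matches a packed description such as "d/D:i/I" when each member starts exactly where the previous one ends, and whether padding appears depends on the member order and the platform.

```cpp
#include <cstddef>

// Hypothetical structs used only to illustrate member offsets.
struct IntThenDouble { int i; double d; };  // the compiler may pad after i
struct DoubleThenInt { double d; int i; };  // no gap between d and i

// Offset of the second member in each struct.
constexpr std::size_t kDoubleAfterInt = offsetof(IntThenDouble, d);
constexpr std::size_t kIntAfterDouble = offsetof(DoubleThenInt, i);

// True when the second member starts exactly where the first one ends,
// i.e. the struct matches the packed two-leaf description.
constexpr bool kIntThenDoubleIsPacked = (kDoubleAfterInt == sizeof(int));
constexpr bool kDoubleThenIntIsPacked = (kIntAfterDouble == sizeof(double));
```

On common 64-bit platforms kIntThenDoubleIsPacked is false, which is why putting the largest types first is the safer order when a struct is described by a leaflist.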

Thanks for your answer @pcanal !

I see :thinking:

For the endianness of the data, I would have thought it does not matter since this memcpy is called on the same machine that defined the TLeaf(). Once TTree::Fill() is called, I am guessing the memory is now handled by ROOT which takes care of the endianness. But to be honest I must admit I’m no expert in this. :sweat_smile: What issue in particular would you expect?

It’s true only the simple types are supported with this method, but don’t you think it would be possible to extend this to more complicated TObjects (like TGraph, TSpline3 or TClonesArray)? I have no idea how you managed to do that within ROOT already!

Thanks again!
Cheers

Actually, you are right: since you told the TLeaf what the data types are, it will be handled properly.
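To illustrate why knowing the type is enough: once the width of a value is known, it can be written in a fixed byte order whatever the host uses. The sketch below is generic C++ with hypothetical helper names, not ROOT's buffer code. As far as I know ROOT stores basic types big-endian on disk, but treat that as an assumption here.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// True on little-endian hosts: the least significant byte comes first in memory.
inline bool hostIsLittleEndian() {
  const std::uint16_t probe = 1;
  unsigned char first = 0;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Serialise a 32-bit value most significant byte first, independent of the host.
constexpr std::array<unsigned char, 4> toBigEndian(std::uint32_t v) {
  return {{static_cast<unsigned char>(v >> 24),
           static_cast<unsigned char>(v >> 16),
           static_cast<unsigned char>(v >> 8),
           static_cast<unsigned char>(v)}};
}

// Rebuild the value from its big-endian bytes.
constexpr std::uint32_t fromBigEndian(const std::array<unsigned char, 4>& b) {
  return (static_cast<std::uint32_t>(b[0]) << 24) |
         (static_cast<std::uint32_t>(b[1]) << 16) |
         (static_cast<std::uint32_t>(b[2]) << 8) |
         static_cast<std::uint32_t>(b[3]);
}
```

A raw memcpy of a whole buffer skips this step, so the concern only applies when the bytes are stored without telling anyone which types they contain.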

but don’t you think it would be possible to extend this to more complicated TObjects (like TGraph, TSpline3 or TClonesArray)?

We would have to do something similar to the cast I mentioned above. Ideally, we would automate this and have the system auto-discover the data type (which the std::any knows).

However, in order to split the data into columns (similarly to what you did), you need to guarantee that the content of the data always has the same types in the same order.
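One way to enforce that guarantee on the user side is to record the expected sequence of types once and compare against it before every fill. A minimal sketch, with hypothetical helper names:

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <typeindex>
#include <typeinfo>
#include <vector>

// True when `values` holds exactly the expected types, in the expected order.
inline bool matchesLayout(const std::vector<std::any>& values,
                          const std::vector<std::type_index>& expected) {
  if (values.size() != expected.size()) return false;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (std::type_index(values[i].type()) != expected[i]) return false;
  }
  return true;
}

// Debug guard meant to be called right before each fill.
inline void assertLayout(const std::vector<std::any>& values,
                         const std::vector<std::type_index>& expected) {
  assert(matchesLayout(values, expected) && "layout changed between fills");
}
```

The expected layout would be captured when the branches are created, for example from the first entry.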

Thanks @pcanal :slight_smile:

I made an attempt to automate the writing of a std::any. However, I’m not using the STL class because I originally wanted my toolbox to be compatible with the C++11 standard (which does not include std::any yet).

My reimplementation is quite similar, but I had to write additional methods to get extra info that std::any doesn’t provide. For example, std::any doesn’t keep the size of the stored object. The only thing we can rely on to figure out the original data type is the .type() method. But this is not sufficient to automatically know the actual size of the stored object. There was some discussion on Stack Overflow about this.

But I guess it’s OK in our case to write an else-if chain, as the set of data types one can store in TTrees is finite, and we need to figure out the type of the object for the leaf definition string anyway:

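size_t getLeafSize(const std::any& a_){
  if     ( a_.type() == typeid(int) ){ return sizeof(int); }
  else if( a_.type() == typeid(double) ){ return sizeof(double); }
  //...
  throw std::runtime_error("getLeafSize: unsupported type"); // avoid falling off the end without a return
}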
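The same kind of chain can also produce the leaf definition string. The sketch below covers only a handful of type codes, and the helper names leafCode and buildLeafList are made up for illustration.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <typeinfo>
#include <vector>

// Leaf type code for the value held by `a` (small subset of the codes).
inline char leafCode(const std::any& a) {
  if (a.type() == typeid(double))    return 'D';
  if (a.type() == typeid(float))     return 'F';
  if (a.type() == typeid(int))       return 'I';
  if (a.type() == typeid(long long)) return 'L';
  if (a.type() == typeid(bool))      return 'O';
  throw std::runtime_error("leafCode: unsupported type");
}

// Builds a string such as "dTest/D:iTest/I" from leaf names and current values.
inline std::string buildLeafList(const std::vector<std::string>& names,
                                 const std::vector<std::any>& values) {
  assert(names.size() == values.size());
  std::string out;
  for (std::size_t i = 0; i < names.size(); ++i) {
    if (i > 0) out += ':';
    out += names[i];
    out += '/';
    out += leafCode(values[i]);
  }
  return out;
}
```

The resulting string can then be passed as the leaflist argument when the branch is created.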

The other concern about the use of std::any is finding the proper address in memory of the stored object. As memcpy copies the raw data starting from the address we provide, it’s necessary to at least know the offset:

root [0] std::any a
(std::any &) @0x1095df0f0
root [1] a = double(1.5)
(std::any &) @0x1095df0f0
root [2] &a
(std::any *) 0x1095df0f0
root [3] double *d = std::any_cast<double>(&a)
(double *) 0x1095df0f8
root [4] sizeof(a)
(unsigned long) 32
root [5] sizeof(*d)
(unsigned long) 8

In that case the offset of the stored value inside std::any seems to be 8 bytes ( = 0x1095df0f8 - 0x1095df0f0 ). So if the offset is always the same, we can easily copy the data from this starting point.
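A word of caution on relying on a fixed offset: where the value lives is an implementation detail, and objects that do not fit in the internal buffer of std::any are generally placed on the heap. Asking std::any_cast for the address avoids the guess. A small sketch to check this, where pointsInside and BigPayload are hypothetical names:

```cpp
#include <any>
#include <cassert>
#include <cstdint>

// True when the address p lies inside the bytes of the std::any object itself.
inline bool pointsInside(const std::any& a, const void* p) {
  assert(p != nullptr);
  const auto begin = reinterpret_cast<std::uintptr_t>(&a);
  const auto addr  = reinterpret_cast<std::uintptr_t>(p);
  return addr >= begin && addr < begin + sizeof(std::any);
}

// A payload larger than the std::any object, so it cannot be stored in place.
struct BigPayload { char bytes[4096]; };
```

For a double, pointsInside(a, std::any_cast<double>(&a)) is likely true on common implementations, but for BigPayload the stored object necessarily lives elsewhere, so a constant offset from &a would point at the wrong bytes.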

But for more complicated objects (like TGraph, TSpline3 or TClonesArray), I understand from your concern that it is a more complicated question. In practice, how do you write those objects in a TBranch? If I remember correctly, a typical instruction on the user side is:

TTree* t = new TTree();
TSpline3 s;
t->Branch("mySpline", &s);
t->Print();
******************************************************************************
*Tree    :          :                                                        *
*Entries :        0 : Total =             842 bytes  File  Size =          0 *
*        :          : Tree compression factor =   1.00                       *
******************************************************************************
*Br    0 :mySpline  : TSpline3                                               *
*Entries :        0 : Total  Size=        501 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*

Does it mean each time I call TTree::Fill(), the size of the defined TLeaf is reset?

Thanks again for this very interesting discussion :slight_smile:
Cheers!

The only thing we can rely on to figure out the original data type is the .type() method. But this is not sufficient to automatically know the actual size of the stored object.

However ROOT has the information you need:

static TDictionary *TDictionary::GetDictionary(const std::type_info &typeinfo);

will return either a TClass* or a TDataType* (for numerical types), both of which can return the size of the type.

That is probably sizeof(std::type_info).

What do you mean by ‘size’ here?
