Array of strings as a leaf in a branch of TTree

LeWhoo · December 24, 2020, 4:26pm

Dear ROOTers,

I would like to add a leaf that contains an array of strings to a branch that already contains other leaves. Is it possible?

I understand that I can’t use the normal “multileaf” TBranch constructor, because something like “mystring[50]/C” does not work in TTrees, and that I have to use std::vector<std::string>. However, as a vector is not a simple type, I am not sure how to add it to a branch along with other leaves. Perhaps with the TBranch TCollection constructor?

bellenot · December 24, 2020, 11:13pm

@pcanal will most probably help you once he is back from vacation.

pcanal · January 7, 2021, 11:12pm

Humm … why would it not work for TTree?

and I have to use std::vector<std::string>. However, as the vector is not a simple type, I am not sure how to add it to a branch along with other leaves

The usual way:

std::vector<std::string> values;  // and/or a pointer
...
TBranch *newbranch = tree->Branch("names", &values);

To add a branch to an existing TTree, after opening the file in update mode and retrieving the TTree, you can create the branch as usual, but instead of using tree->Fill() you would use:

newbranch->BackFill(); // or newbranch->Fill() with ROOT version older than v6.14/00

in your event/entry loop.

Cheers,
Philippe.

LeWhoo · January 7, 2021, 11:30pm

I’ve read in some older posts that an array of strings made of chars (which, as I understand it, is an array of arrays of chars with variable length) does not work properly with TTrees. I think I also tried it, and indeed something like mychar[50]/C (which would be 50 strings of variable size) didn’t work.

TBranch *newbranch = tree->Branch("names", &values);

I think this creates a branch with a single leaf. I would like to add a vector of strings to a multi-leaf branch. Can I do that?

pcanal · January 8, 2021, 2:22am

Right. I misunderstood where the emphasis was (it was not on TTree vs. something else like TNtuple, but rather on the “leaflist” syntax itself).

I think this creates a branch with a single leaf.

Yes.

I would like to add … to a multi-leaf branch.

You cannot add to an existing multi-leaf branch (its data is stored as a single block anyway, so that would mean inserting additional data into the already compressed data on disk).
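A simplified model may help picture why an existing multi-leaf branch cannot just be extended. The sketch below assumes (this is not ROOT’s actual on-disk format) that each entry of a leaflist branch such as “x/I:y/F” is a fixed-size record with the leaves packed back to back, and that consecutive records form one block. `RecordV1`, `RecordV2`, `entryOffset` and `blockSize` are hypothetical names used only for illustration; the `static_assert`s let the compiler check the arithmetic.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical records mirroring a leaflist: "x/I:y/F" before,
// and "x/I:y/F:z/I" after adding one more leaf.
struct RecordV1 { std::int32_t x; float y; };
struct RecordV2 { std::int32_t x; float y; std::int32_t z; };

// Byte offset of entry `i` inside a block of fixed-size records.
constexpr std::size_t entryOffset(std::size_t i, std::size_t recordSize) {
    return i * recordSize;
}

// Total size of a block holding `n` entries.
constexpr std::size_t blockSize(std::size_t n, std::size_t recordSize) {
    return n * recordSize;
}

// With 4-byte int32 and float, each record grows from 8 to 12 bytes ...
static_assert(sizeof(RecordV1) == 8, "two 4-byte leaves");
static_assert(sizeof(RecordV2) == 12, "three 4-byte leaves");

// ... so every entry after the first one moves: entry 2 starts at byte 16
// before and at byte 24 after. Nothing already written (and compressed)
// can stay where it is; the whole block would have to be rebuilt.
static_assert(entryOffset(2, sizeof(RecordV1)) == 16, "old offset");
static_assert(entryOffset(2, sizeof(RecordV2)) == 24, "new offset");
```

In this model, a separate single-leaf branch gets its own block, so the existing block is left untouched.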

You also cannot mix the leaflist mode of creation with objects.

But anyway, why would you want to do this? What is the advantage? A priori, anything you could do had you succeeded in creating what you described, you can also do with the additional single-leaf branch.

Cheers,
Philippe.

LeWhoo · January 8, 2021, 8:48am

It is a matter of order. I am translating an HDF5 file structure to ROOT. In the HDF5 file I have arrays of structures (in the C sense), and inside those structures I have strings of variable size. My idea is that each structure should be translated into a multi-leaf branch in a ROOT TTree. Destroying the original hierarchy will make everything much less readable, but I understand that it can’t be done hierarchically in ROOT.

pcanal · January 8, 2021, 5:29pm

Actually, this can be done (for example, we could generate a dictionary for the struct and create an unsplit object branch based on the array of structs; this can even be done if you don’t have access to the structure’s definition at compile time).

A side, related, question is whether using the exact same structure for the ROOT file as for the HDF5 file is the best choice. Using the leaflist technique means that the data in the array of structs is stored in a single block (i.e., the buffer seen by the compression engine contains struct elem 0 var 0, then struct elem 0 var 1, etc., then struct elem 1 var 0, then struct elem 1 var 1, and so on). This has 2 major disadvantages:

all the data always needs to be read, even if you are interested in only one variable;
the compression engine sees heterogeneous data.

If instead you create one branch per variable (this can be done while preserving the hierarchy, using the same technique I mentioned at the beginning of this post), then each (sub)branch will have its own buffer: the compression engine will see [struct elem 0 var 0, struct elem 1 var 0, etc.] and, separately, [struct elem 0 var 1, struct elem 1 var 1, etc.]. Besides getting a better compression ratio, you will then be able to read an individual variable without having to load any of the other data from disk or decompress it.
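The two layouts described above can be contrasted with a small, ROOT-independent sketch. It only models the order in which values end up in a buffer, not compression or I/O. `Elem`, `rowWise` and `column` are hypothetical names, the element values are made up, and C++17 is assumed (for `constexpr` writes into `std::array`).

```cpp
#include <array>
#include <cstddef>

// Hypothetical element of the HDF5-style array of structs.
struct Elem { int id; int charge; };

// Row-wise (leaflist-like) buffer:
// elem 0 var 0, elem 0 var 1, elem 1 var 0, elem 1 var 1, ...
template <std::size_t N>
constexpr std::array<int, 2 * N> rowWise(const std::array<Elem, N>& elems) {
    std::array<int, 2 * N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[2 * i] = elems[i].id;
        out[2 * i + 1] = elems[i].charge;
    }
    return out;
}

// Column-wise buffer for one variable: elem 0 var k, elem 1 var k, ...
template <std::size_t N>
constexpr std::array<int, N> column(const std::array<Elem, N>& elems, int Elem::*var) {
    std::array<int, N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = elems[i].*var;
    return out;
}

constexpr std::array<Elem, 3> elems{{{10, 1}, {11, -1}, {12, 1}}};

// Row-wise: charges are interleaved with ids, so reading only the
// charges still means going through every value in the buffer.
constexpr auto rows = rowWise(elems);
static_assert(rows[0] == 10 && rows[1] == 1 && rows[2] == 11 && rows[3] == -1, "");

// Column-wise: each variable is contiguous and homogeneous,
// and can be read without touching the other variable.
constexpr auto charges = column(elems, &Elem::charge);
static_assert(charges[0] == 1 && charges[1] == -1 && charges[2] == 1, "");
```

The column-wise buffers correspond to the “one branch per variable” case: each would get its own basket and its own compression pass.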

LeWhoo · January 8, 2021, 8:15pm

I see. However, dictionaries are somewhat tedious if one is using only basic types, and char, char* or char** are basic types. I would bet it is not uncommon to have an array of variable-size strings in some kind of event. I understand that this is far from a priority, but handling this kind of basic data would add a feature that is clearly missing from the basic use of TTree (and is implemented in “competing” formats like HDF5).

That is actually a very valid point. I wanted to try using ROOT TTree to compare its speed with HDF5, which is claimed to be slow. So, for the sake of speed and compression, I think I’ll sacrifice readability and implement everything in separate branches, as you suggested. Thanks!

Axel · January 11, 2021, 7:54am

Hi @LeWhoo ,

Would you be willing to share your comparison results once you have them, for instance in a new forum post? Or if it’s for publication, sharing a link to the paper would also be nice! We’d also be happy to review the code that you use for the comparison, to make sure there’s no obvious performance bottleneck.

Cheers, Axel.

system · January 25, 2021, 7:54am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.