Wrong branch type detection

This might be a misunderstanding, rather than a bug.

I examined the file from CERNBox (https://cernbox.cern.ch/s/4mOHVGU509efdAS) and created a new one using @umute97’s reproducer (https://root-forum.cern.ch/t/wrong-branch-type-detection/59236/19). In both cases, I see a TTree with TBranches whose type is double[] (variable length arrays), and they have a common counter TLeaf in a TBranch named n. They are not std::vector<double> because Uproot does not write this data type. (https://github.com/scikit-hep/uproot5/issues/257 has been open for a long time, but it would be a big project.)

The variable-length arrays happen to all have the same length, 1024, but only because of how the Awkward Arrays were constructed: a for loop over chunk_size fills a Pandas DataFrame with dtype=object (Python lists, not arrays), which is then converted into an Awkward Array by iterating over the Python lists in the DataFrame. That is, the array’s type is

10 * var * float64

instead of

10 * 1024 * float64

If you wanted arrays of fixed-size data, in which 1024 is part of the data type, you could construct it with NumPy and pass that to Awkward, or just convert the Awkward data with ak.to_regular (https://awkward-array.org/doc/main/reference/generated/ak.to_regular.html) after the fact, replacing

    output_file[tree] = {"": ak.zip(data)}

with

    output_file[tree] = {k: ak.to_regular(v) for k, v in data.items()}

(I also made the dict explicit, instead of concatenating "" to record field names.) With a construction like that, the ROOT file would be filled with

******************************************************************************
*Tree    :tree      :                                                        *
*Entries :       10 : Total =          165306 bytes  File  Size =     156980 *
*        :          : Tree compression factor =   1.05                       *
******************************************************************************
*Br    0 :w0        : w0[1024]/D                                             *
*Entries :       10 : Total  Size=      82482 bytes  File Size  =      77811 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.05     *
*............................................................................*
*Br    1 :w1        : w1[1024]/D                                             *
*Entries :       10 : Total  Size=      82482 bytes  File Size  =      77826 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.05     *
*............................................................................*

instead of

******************************************************************************
*Tree    :tree      :                                                        *
*Entries :       10 : Total =          165973 bytes  File  Size =     157659 *
*        :          : Tree compression factor =   1.05                       *
******************************************************************************
*Br    0 :n         : n/I                                                    *
*Entries :       10 : Total  Size=        577 bytes  File Size  =         92 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.17     *
*............................................................................*
*Br    1 :w0        : w0/D                                                   *
*Entries :       10 : Total  Size=      82590 bytes  File Size  =      77881 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.05     *
*............................................................................*
*Br    2 :w1        : w1/D                                                   *
*Entries :       10 : Total  Size=      82590 bytes  File Size  =      77872 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.05     *
*............................................................................*

if that’s what you’re trying to do.

But re-reading the whole thread, it doesn’t seem to be about fixed-length versus variable-length types; it seems to be about double[] arrays versus std::vector<double> (both of which are variable-length types). Uproot doesn’t write std::vector, and the tree.show() table shows arrays, not vectors, as the C++ type (middle column):

>>> tree.show()
name                 | typename                 | interpretation                
---------------------+--------------------------+-------------------------------
n                    | int32_t                  | AsDtype('>i4')
w0                   | double[]                 | AsJagged(AsDtype('>f8'))
w1                   | double[]                 | AsJagged(AsDtype('>f8'))
w10                  | double[]                 | AsJagged(AsDtype('>f8'))
w11                  | double[]                 | AsJagged(AsDtype('>f8'))
w12                  | double[]                 | AsJagged(AsDtype('>f8'))
w13                  | double[]                 | AsJagged(AsDtype('>f8'))
w14                  | double[]                 | AsJagged(AsDtype('>f8'))
w15                  | double[]                 | AsJagged(AsDtype('>f8'))
w2                   | double[]                 | AsJagged(AsDtype('>f8'))
w3                   | double[]                 | AsJagged(AsDtype('>f8'))
w4                   | double[]                 | AsJagged(AsDtype('>f8'))
w5                   | double[]                 | AsJagged(AsDtype('>f8'))
w6                   | double[]                 | AsJagged(AsDtype('>f8'))
w7                   | double[]                 | AsJagged(AsDtype('>f8'))
w8                   | double[]                 | AsJagged(AsDtype('>f8'))
w9                   | double[]                 | AsJagged(AsDtype('>f8'))
trg0                 | double[]                 | AsJagged(AsDtype('>f8'))
trg1                 | double[]                 | AsJagged(AsDtype('>f8'))

(Dang, I had to go back and remove a bunch of links, too!)

Dear @jpivarski ,
thank you very much for your very legible clarifications. I will try to implement fixing the array length the way you suggested and see if the error still persists.

Cheers,
Umut

What error are you referring to, though?

This one,

will persist because neither double[] nor double[1024] is a vector<double>, and neither can be cast as one in C++. Uproot doesn’t write vector<double>, so any files that you make with Uproot would have to be read in ROOT as arrays, not vectors.

@jpivarski Independently of the ‘real’ issue that you point at (array vs vector), there is still an improvement needed in uproot. On this file, I see:

root [11] wfm->GetBranch("w0")->GetTitle()
(const char *) "w0/D"
root [12] ((TLeaf*)wfm->GetBranch("w0")->GetListOfLeaves()->At(0))->GetTitle()
(const char *) "w0[n]"

whereas the correct values are:

root [11] wfm->GetBranch("w0")->GetTitle()
(const char *) "w0[n]/D"
root [12] ((TLeaf*)wfm->GetBranch("w0")->GetListOfLeaves()->At(0))->GetTitle()
(const char *) "w0[n]"

and similarly for a fixed-size array:

root [11] wfm->GetBranch("w0")->GetTitle()
(const char *) "w0[1024]/D"
root [12] ((TLeaf*)wfm->GetBranch("w0")->GetListOfLeaves()->At(0))->GetTitle()
(const char *) "w0[1024]"

Thanks,
Philippe.

To be sure that I understand this, the title (fTitle) of a TBranch of array type needs to be formatted as:

NAME_OF_BRANCH[NAME_OF_COUNTER]/SINGLE_LETTER_TYPE

and the title of its TLeaf must be

NAME_OF_BRANCH[NAME_OF_COUNTER]

(where NAME_OF_BRANCH is the name (fName) of the TBranch, NAME_OF_COUNTER is either the name of the counter or an integer for fixed-size arrays, and SINGLE_LETTER_TYPE is D for double, etc.).

If that’s right, Uproot is currently producing the wrong TBranch titles, but the correct TLeaf titles. I would need to add the [NAME_OF_COUNTER] part to the TBranch titles only. I just confirmed this pattern by looking at some files that ROOT created, but if I’m missing something, let me know!

Yes, almost.

The syntax is actually, for branches:

name_of_leaf[name_of_counter | fixed_size_number]/single_letter_type

and this is repeated for each leaf in the branch, with a colon as the separator.
(A multi-leaf branch can be created with: tree->Branch("top", &struct_memory, "leaf1/D:leaf2/F:leaf3/I");)

For a leaf, it is

name_of_leaf[name_of_leaf_counter]

Typically, when a branch has a single leaf, the branch and the leaf have the same name.
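As a sanity check of the naming rules above, here is a small sketch that builds the title strings in plain Python. The helper names (leaf_spec, branch_title, leaf_title) are hypothetical and are not part of ROOT or Uproot; the code only formats strings according to the pattern described in this thread.

```python
def leaf_spec(name, type_code, length=None):
    """One leaf entry of a branch title: name[counter_or_size]/T."""
    dims = "" if length is None else f"[{length}]"
    return f"{name}{dims}/{type_code}"


def branch_title(leaves):
    """Branch title: the leaf entries joined by colons."""
    return ":".join(leaf_spec(*leaf) for leaf in leaves)


def leaf_title(name, length=None):
    """Leaf title: name[counter_or_size], without the type letter."""
    return name if length is None else f"{name}[{length}]"


# Variable-length array counted by the branch "n"
print(branch_title([("w0", "D", "n")]))    # w0[n]/D
print(leaf_title("w0", "n"))               # w0[n]

# Fixed-size array
print(branch_title([("w0", "D", 1024)]))   # w0[1024]/D
print(leaf_title("w0", 1024))              # w0[1024]

# Multi-leaf branch, as in the tree->Branch example
print(branch_title([("leaf1", "D"), ("leaf2", "F"), ("leaf3", "I")]))
# leaf1/D:leaf2/F:leaf3/I
```

The printed strings match the titles quoted earlier in the thread for the variable-length and fixed-size cases.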

Okay, thanks! (Uproot does not write TBranches with multiple leaves, either, so this is sufficient.)

I made a fix in PR https://github.com/scikit-hep/uproot5/pull/1207 and released it in Uproot 5.3.5 (https://pypi.org/project/uproot/5.3.5/).

Yes, you are right, of course. I have tried reading them in as arrays, which worked. I have another problem with the analysis, but that is none of your concern, so I’ll mark this issue as resolved for now. Thank you all for your help!