Draw short integers as numbers, not characters

TTree does NOT know “int8_t” or “uint8_t” as variables’ “fundamental types”.
It only knows “Char_t” (“signed char”) and “UChar_t” (“unsigned char”), where “char” is assumed to be 8 bits wide.
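To make the type relationship concrete, here is a small standard-C++ sketch (no ROOT involved) that checks how the fixed-width types relate to the character types on a given platform. The helper names are made up for illustration. On common Linux toolchains int8_t is a typedef of signed char, but the standard does not guarantee that, and the signedness of plain char is implementation-defined.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helpers, for illustration only.

// True where int8_t is just another name for signed char (common, not guaranteed).
constexpr bool int8_is_signed_char() {
    return std::is_same<std::int8_t, signed char>::value;
}

// True where uint8_t is just another name for unsigned char (common, not guaranteed).
constexpr bool uint8_is_unsigned_char() {
    return std::is_same<std::uint8_t, unsigned char>::value;
}

// Plain char is always a distinct type from both signed char and unsigned char.
constexpr bool plain_char_is_distinct() {
    return !std::is_same<char, signed char>::value &&
           !std::is_same<char, unsigned char>::value;
}

// Whether plain char behaves as signed is up to the implementation.
constexpr bool plain_char_is_signed() {
    return std::is_signed<char>::value;
}
```

Where int8_is_signed_char() returns true, any tool that only sees the type cannot tell an int8_t apart from a character.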

BTW, I think ROOT unconditionally expects that a “char” is a “signed char”.

Yes, I understand that. It does not mean TTree::Draw is obligated to treat Char_t as a character, does it?

I assume you could create your own branch with your own data type.
Something like:

int8_t sb;
uint8_t ub;
tree->Branch("My_signed_byte", &sb);
tree->Branch("My_unsigned_byte", &ub);

If it doesn’t work, immediately report to @pcanal :wink:

Thanks for the suggestion. I’ve tried to modify my example above to avoid specifying the /B, using a temporary:

unsigned int counter;
std::vector<int8_t> Jet_idx;

t.Branch("nJet", &counter, "nJet/i");
std::vector<int8_t> temp(1, 0);
auto br = t.Branch("Jet_idx", &(temp.front()), "Jet_idx[nJet]");

for (size_t i=0; i<10; i++) {
    Jet_idx = std::vector<int8_t>(i, i);
    counter = Jet_idx.size();

But when reading I’m now getting:

Error in <TTreeReaderArrayBase::CreateContentProxy()>: The branch Jet_idx contains data of type float. It cannot be accessed by a TTreeReaderArray<signed char>

Though the casts to int do work:

Event 2
Size: 2 - data: raw: , cast: 2 -- raw: , cast: 2 --
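The “raw” versus “cast” difference in that output can be reproduced in plain C++, independently of ROOT. The sketch below uses made-up helper names and assumes int8_t is a character type on the platform (true for common toolchains): streaming the value directly emits one raw byte, while converting to int first, or applying unary plus, prints a number.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helpers, for illustration only.

// Streams the value as-is: a character type is written as a single byte.
std::string print_raw(std::int8_t v) {
    std::ostringstream os;
    os << v;
    return os.str();
}

// Converts to int first, so the stream formats it as a decimal number.
std::string print_as_number(std::int8_t v) {
    std::ostringstream os;
    os << static_cast<int>(v);
    return os.str();
}

// Unary plus triggers integral promotion to int, with the same effect as the cast.
std::string print_promoted(std::int8_t v) {
    std::ostringstream os;
    os << +v;
    return os.str();
}
```

With the value 2, print_raw yields an unprintable control character, which would explain the empty-looking “raw:” fields, while the other two helpers yield “2”.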

TTree::Draw does not like it; I suppose it interprets the branch as float.


This is how branches are filled in the CMS NanoAODs, and I don’t think we can easily change that… I was simply looking into replacing the 32-bit integers currently used to store collection sizes, indices, charges, discrete IDs, …, by 8-bit integers, as it seemed like we could gain a few % on disk by doing that (despite compression). However I don’t want people using those files to have to do tricks like Jet_idx+0 when drawing branches… If it’s not possible, so be it, but it’s really unfortunate.

Always make a “counter” a “signed integer”, e.g.:

Int_t counter;
t.Branch("nJet", &counter, "nJet/I");

See the “var[nelem]” description in: TBranch::TBranch

If you define the branch using the “t.Branch("Jet_idx", &(temp.front()), "Jet_idx[nJet]")” syntax then you are again limited to the predefined variables’ “fundamental types”. The default is “Float_t” (i.e., ROOT thinks it’s a "Jet_idx[nJet]/F", so you get the error).
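As a toy model of that rule, the sketch below splits a single-leaf description of the form “name[counter]/T” into its parts and falls back to “F” when no type suffix is given. This is not ROOT’s parser: LeafSpec and parse_single_leaf are hypothetical names, and only the single-leaf, single-letter-type case is handled.

```cpp
#include <string>

// Hypothetical model of one leaf description; not a ROOT type.
struct LeafSpec {
    std::string name;     // e.g. "Jet_idx"
    std::string counter;  // e.g. "nJet"; empty if the leaf is not a variable-size array
    char type;            // one-letter type code; 'F' when the suffix is missing
};

// Splits "name[counter]/T". A missing "/T" suffix leaves the default 'F' in place.
LeafSpec parse_single_leaf(const std::string& spec) {
    LeafSpec out{"", "", 'F'};
    std::string body = spec;

    // Peel off the optional "/T" type suffix.
    const auto slash = body.find('/');
    if (slash != std::string::npos) {
        if (slash + 1 < body.size()) out.type = body[slash + 1];
        body = body.substr(0, slash);
    }

    // Peel off the optional "[counter]" part.
    const auto open = body.find('[');
    const auto close = body.find(']');
    if (open != std::string::npos && close != std::string::npos && close > open) {
        out.counter = body.substr(open + 1, close - open - 1);
        body = body.substr(0, open);
    }

    out.name = body;
    return out;
}
```

Under this model, “Jet_idx[nJet]” comes out with type “F”, whereas “Jet_idx[nJet]/B” comes out with type “B”, which matches the error message seen when reading.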

BTW, why don’t you simply try (no need for the “nJet” at all):
auto br = t.Branch("Jet_idx", &temp);

Strange, we have had it as unsigned (with /i) in NanoAOD for years and it’s never caused any issue… In any case, that doesn’t solve the above problem.

Jet_idx is a variable-size array, how would that work?

Yes, this problem has been known here for years.
It’s time you fixed it.
The current statement is clear: The leaf referred to by nelem **MUST** be an int (/I).
Or you ask @pcanal and/or @Axel to finally support “unsigned integers” as [nelem] variable-size lengths.

A “std::vector” is by definition “variable-size”, too.

I’m only now learning about this… I’ll put it on my list. Why is it such a problem if it’s been working fine for years?

Ah yes, OK. That means changing from arrays to STL vectors. This is really a change in the NanoAOD format which would potentially break many things downstream, so it’s not something we would do lightly.

Sometimes some things worked, and some did not.
You’d have to go through various posts here.

If you want to stay with the “arrays”, as provided by ROOT, you are limited to its predefined variables’ “fundamental types”.

If these problems have never been reported to the NanoAOD developers, we cannot be expected to know about them.

It’s what I’m doing. I want to store 8-bit signed integers, and have them treated like numbers, not characters.


Thanks for your useful help and the cordial discussion.

@couet Maybe one should fix this description? Instead of the word “integer”, there should be the word “character”.
Also, maybe the explicit statement “The leaf referred to by nelem **MUST** be an int (/I)” should be copied from the hidden TBranch::TBranch description to the TTree → Add a column (“branch”) holding fundamental types and arrays thereof

@pcanal and/or @Axel … is there any chance that other types of variables will be supported as [nelem] variable-size lengths (at least “unsigned int”, but maybe other integer types would also be useful in the long run)?
Same for the TTreeReader, of course:

As it is in the TTree documentation I let @pcanal comment.

Internally, TTree will use the counter as a signed integer independently of what you specified. This could be a problem for very large values of the counter. In particular, as mentioned by Wile, there are some code paths and code patterns that reject or misuse the unsigned integer and some that don’t. So even though it seems to work in most cases, it is better to avoid the mismatch (and unfortunately updating to properly support other types is non-trivial).
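The general hazard of reading an unsigned counter through a signed view can be shown in plain C++. This is only an illustration of the arithmetic, not ROOT’s actual code path, and the helper names are invented: small values are unaffected, while anything above 2147483647 turns negative.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers, for illustration only.

// Copies the 32 bits of an unsigned value into a signed one without changing them.
std::int32_t reinterpret_as_signed(std::uint32_t v) {
    std::int32_t s;
    std::memcpy(&s, &v, sizeof s);
    return s;
}

// A counter is only usable as a length if its signed view is non-negative.
bool survives_signed_view(std::uint32_t v) {
    return reinterpret_as_signed(v) >= 0;
}
```

Collection sizes in practice sit far below that limit, which is consistent with the mismatch having gone unnoticed for years.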

Why can’t int8_t be interpreted as a number by default?

Because for a long time, this was the only way to store textual labels (and the TBrowser cannot tell what the semantics of the fields are, i.e. label or value :frowning: ).

Wile’s proposals sound good. @couet Could you implement them?

This implicitly requests that the branch be created to hold floats. Since the 2 have the same size, it might still appear to work, but the automatic tools will definitely get it wrong (Draw, Browser, etc.).

However I don’t want people using those files to have to do tricks like Jet_idx+0 when drawing branches…

You have 2 potential options. Either use an unsigned char or a signed short (yes, at double the storage cost but automatic tools will work properly and some of the extra storage space will be recovered by compression).
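For the signed-short option, a plain C++ sketch of the trade-off (helper names are hypothetical, and sizes are before any compression): every int8_t value fits in an int16_t, so widening is lossless, at twice the in-memory payload.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers, for illustration only.

// Widens each 8-bit value to 16 bits; the sign is preserved by the conversion.
std::vector<std::int16_t> widen(const std::vector<std::int8_t>& in) {
    return std::vector<std::int16_t>(in.begin(), in.end());
}

// Uncompressed payload size of the elements, in bytes.
template <typename T>
std::size_t payload_bytes(const std::vector<T>& v) {
    return v.size() * sizeof(T);
}
```

The unsigned-char option keeps the 1-byte size but only covers 0 to 255, so it would not suit branches that need negative values.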

Thanks for the clarifications @pcanal ! We will switch to signed integers for the counters in nanoAOD. Using 16-bit integers instead of 8-bit could be an interesting middle ground indeed, I’ll give it a try.

@pcanal: The PR is here. Let me know if it’s fine for you.