Draw short integers as numbers, not characters

I’ve tried a few things in 6.26/00…:

This works fine with both int8_t and uint8_t:

#include "TFile.h"
#include "TTree.h"
#include <cstdint>

void test1() {
    auto f = TFile::Open("test1.root", "recreate");
    TTree t("Events", "");
    int8_t var;
    t.Branch("var", &var, "var/B");
    for (size_t i=0; i<10; i++) {
        var = i;
        t.Fill();
    }
    f->Write();
    f->Close();
}

This, however, which is how I originally encountered the issue, only works with uint8_t. Using int8_t results in the weird formatting:

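#include "TFile.h"
#include "TTree.h"
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

void test() {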
    auto f = TFile::Open("test2.root", "recreate");
    TTree t("Events", "");

    //typedef uint8_t T;  // this works fine
    typedef int8_t T;  // this doesn't

    std::string typeQual;
    if constexpr (std::is_same<T, uint8_t>())
        typeQual = "b";
    else if constexpr (std::is_same<T, int8_t>())
        typeQual = "B";
    else if constexpr (std::is_same<T, int>())
        typeQual = "I";

    unsigned int counter;
    std::vector<T> Jet_idx;

    t.Branch("nJet", &counter, "nJet/i");
    auto br = t.Branch("Jet_idx", (void*)nullptr, ("Jet_idx[nJet]/" + typeQual).c_str());

    for (size_t i=0; i<10; i++) {
        Jet_idx = std::vector<T>(i, i);
        br->SetAddress(Jet_idx.data());  // data() is well-defined even when the vector is empty (front() is not)
        counter = Jet_idx.size();
        t.Fill();
    }

    f->Write();
    f->Close();
}
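The if constexpr chain in test() can be factored into a small compile-time helper that maps a C++ type to its leaflist type letter. This is a hypothetical sketch: leafTypeCode and arrayLeafList are illustrative names, not ROOT functions. The letters follow the leaflist codes documented for TBranch, and the helper makes explicit that int8_t ends up as /B.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Always-false value that depends on T, so the static_assert below only
// fires when an unsupported type is actually requested.
template <typename T>
inline constexpr bool kUnsupportedLeafType = false;

// Hypothetical helper (not a ROOT API): leaflist type letter for a C++ type.
template <typename T>
constexpr char leafTypeCode() {
    if constexpr (std::is_same_v<T, std::int8_t>) {
        return 'B';  // 8-bit signed integer (Char_t)
    } else if constexpr (std::is_same_v<T, std::uint8_t>) {
        return 'b';  // 8-bit unsigned integer (UChar_t)
    } else if constexpr (std::is_same_v<T, std::int16_t>) {
        return 'S';  // 16-bit signed integer (Short_t)
    } else if constexpr (std::is_same_v<T, std::uint16_t>) {
        return 's';  // 16-bit unsigned integer (UShort_t)
    } else if constexpr (std::is_same_v<T, std::int32_t>) {
        return 'I';  // 32-bit signed integer (Int_t)
    } else if constexpr (std::is_same_v<T, std::uint32_t>) {
        return 'i';  // 32-bit unsigned integer (UInt_t)
    } else if constexpr (std::is_same_v<T, std::int64_t>) {
        return 'L';  // 64-bit signed integer (Long64_t)
    } else if constexpr (std::is_same_v<T, std::uint64_t>) {
        return 'l';  // 64-bit unsigned integer (ULong64_t)
    } else if constexpr (std::is_same_v<T, float>) {
        return 'F';  // Float_t
    } else if constexpr (std::is_same_v<T, double>) {
        return 'D';  // Double_t
    } else if constexpr (std::is_same_v<T, bool>) {
        return 'O';  // Bool_t
    } else {
        static_assert(kUnsupportedLeafType<T>, "no leaflist type code for this type");
        return '?';
    }
}

// Hypothetical helper: builds a variable-size array descriptor such as "Jet_idx[nJet]/B".
template <typename T>
std::string arrayLeafList(const std::string& name, const std::string& counter) {
    return name + "[" + counter + "]/" + leafTypeCode<T>();
}
```

In test(), the typeQual block would then shrink to std::string typeQual(1, leafTypeCode<T>());.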

I’m attaching the resulting file: test2.root

And to read the second tree:

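#include "TFile.h"
#include "TTreeReader.h"
#include "TTreeReaderArray.h"
#include <cstdint>
#include <iostream>

void read() {
    auto f = TFile::Open("test2.root");
    TTreeReader reader("Events", f);
    TTreeReaderArray<int8_t> array(reader, "Jet_idx");
    while (reader.Next()) {
        std::cout << "Event " << reader.GetCurrentEntry() << std::endl;
        auto size = array.GetSize();
        std::cout << "Size: " << size << " - data: ";
        for (size_t i = 0; i < size; i++) {
            auto val = array[i];
            // "raw" streams the 8-bit value as a character, "cast" prints it as a number
            std::cout << "raw: " << val << ", cast: " << static_cast<int>(val) << " -- ";
        }
        std::cout << std::endl;
    }
}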

Which gives:

Error in <TTreeReaderArrayBase::CreateContentProxy()>: The branch Jet_idx contains data of type char. It cannot be accessed by a TTreeReaderArray<signed char>
Event 0
Size: 0 - data: 
Event 1
Size: 1 - data: raw: , cast: 1 -- 
Event 2
Size: 2 - data: raw: , cast: 2 -- raw: , cast: 2 --
...

The error is strange, since I created the TTree using int8_t and /B!

When both writing and reading with uint8_t and /b I get no error:

Event 0
Size: 0 - data: 
Event 1
Size: 1 - data: raw: , cast: 1 -- 
Event 2
Size: 2 - data: raw: , cast: 2 -- raw: , cast: 2 --
...
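The empty "raw:" column in both outputs is plain C++ behaviour rather than a ROOT issue: int8_t and uint8_t are character types, so operator<< writes them as characters, and the values 1, 2, ... are non-printing control characters. A minimal sketch without ROOT (formatElement is a hypothetical name) that reproduces the formatting of read():

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Formats one element the way read() does. Int8 is expected to be
// std::int8_t or std::uint8_t, i.e. signed char or unsigned char.
template <typename Int8>
std::string formatElement(Int8 val) {
    std::ostringstream out;
    // Streaming a (signed/unsigned) char writes the character itself;
    // converting to int first writes its numeric value.
    out << "raw: " << val << ", cast: " << static_cast<int>(val);
    return out.str();
}
```

For example, formatElement<std::int8_t>(65) gives "raw: A, cast: 65", while a value of 1 produces an invisible \x01 after "raw: ".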

Hello, any update on this?

I guess @pcanal may help.

For historical reasons, an array of /B is handled as a C-style string (e.g. const char *). For example, tweaking the code above with:

        Jet_idx = std::vector<T>(i, 96+i);

you get the output:

root [1] Events->Scan("","","")
************************************
*    Row   * nJet.nJet * Jet_idx.J *
************************************
*        0 *         0 *           *
*        1 *         1 *         a *
*        2 *         2 *        bb *
*        3 *         3 *       ccc *
*        4 *         4 *      dddd *
*        5 *         5 *     eeeee *
*        6 *         6 *    ffffff *
*        7 *         7 *   ggggggg *
*        8 *         8 *  hhhhhhhh *
*        9 *         9 * iiiiiiiii *
************************************
(long long) 10

(This explains the histogram in the first post)
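The values 96+i are the ASCII codes of 'a', 'b', 'c', ..., so the same bytes read either as a short string or as small integers depending on how they are interpreted (and the original values 1 to 9 are control characters, hence the blank cells). A plain C++ sketch of the two views; asCharacters and asNumbers are hypothetical helpers, and mapping every byte to one character is an assumption made for illustration, not a description of TTreeFormula internals:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// String view: each byte of the array becomes one character.
std::string asCharacters(const std::vector<std::int8_t>& values) {
    return std::string(values.begin(), values.end());
}

// Integer view: each byte is widened to int, keeping its numeric value.
std::vector<int> asNumbers(const std::vector<std::int8_t>& values) {
    return std::vector<int>(values.begin(), values.end());
}
```

For entry 3 of the tweaked tree, std::vector<std::int8_t>(3, 96 + 3) reads as "ccc" in the string view and as 99, 99, 99 in the integer view.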

On the other hand, I can reproduce the problem with TTreeReaderArray, which should have worked.

@pcanal The historical description in the TTree documentation:

            - C : a character string terminated by the 0 character
            - B : an 8 bit signed integer (Char_t)
            - b : an 8 bit unsigned integer (UChar_t)

Yes, this is accurate. In addition, TTree::Draw and TTree::Scan treat an array of /B as a string.

Thanks @pcanal. Is there any way to get Draw or Scan to treat them as integers instead? It seems to me that this use case would be far more common than using /B to store strings…

Yes. You can involve the value in a spurious arithmetic operation (e.g. +0):

root [1] Events->Scan("Jet_idx+0","","")
***********************************
*    Row   * Instance * Jet_idx+0 *
***********************************
*        0 *        0 *           *
*        1 *        0 *        97 *
*        2 *        0 *        98 *
*        2 *        1 *       110 *
*        3 *        0 *        99 *
*        3 *        1 *       110 *
*        3 *        2 *        99 *
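The +0 trick has a direct analogue in C++ itself: any arithmetic on an 8-bit integer first promotes it to int ("integral promotion"), so the result is printed as a number. This is only an analogy to illustrate the idea; it does not claim that TTreeFormula uses C++ promotion internally. streamPlusZero is a hypothetical name:

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <type_traits>

// Both x + 0 and unary +x promote an 8-bit integer to int.
static_assert(std::is_same_v<decltype(std::int8_t{97} + 0), int>);
static_assert(std::is_same_v<decltype(+std::int8_t{97}), int>);

std::string streamPlusZero(std::int8_t x) {
    std::ostringstream out;
    out << x + 0;  // streams an int, so digits are written rather than a character
    return out.str();
}
```

Streaming x directly would write 'a' for 97, whereas streamPlusZero(97) writes "97".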

The TTreeReader error message is spurious, and the data still seem to be read correctly, don’t they? (The spurious error will be removed shortly; to follow the resolution, see the GitHub issue “Spurrious error message when reading a `char` from a `TTreeReader<signed char>`”, Issue #11837 in root-project/root.)

Thanks, that’s a useful trick.

However, this will not work when clicking on a branch while inspecting a file in a TBrowser… Why can’t int8_t be interpreted as a number by default? I don’t see why anyone in HEP would want to make a histogram of characters, whereas storing small integers as int8_t seems like a common use case.

TTree does NOT know “int8_t” or “uint8_t” as variables’ “fundamental types”.
It only knows “Char_t” (“signed char”) and “UChar_t” (“unsigned char”), where “char” is assumed to be 8 bits wide.

BTW, I think ROOT unconditionally expects that a “char” is a “signed char”.
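On the C++ side, char, signed char and unsigned char are three distinct types, and whether plain char is signed is implementation-defined. This is also why the TTreeReader error quoted earlier distinguishes "char" from "signed char": on mainstream platforms std::int8_t is an alias for signed char. A small standalone sketch of these facts (the constant names are hypothetical):

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Three distinct types, even though char shares its representation
// with either signed char or unsigned char.
static_assert(!std::is_same_v<char, signed char>);
static_assert(!std::is_same_v<char, unsigned char>);

// Implementation-defined: typically true on x86-64, false on AArch64 Linux.
constexpr bool plainCharIsSigned = std::numeric_limits<char>::is_signed;

// True on common platforms (e.g. glibc, MSVC), where int8_t is signed char.
constexpr bool int8IsSignedChar = std::is_same_v<std::int8_t, signed char>;
```

Code that must behave the same everywhere should therefore not rely on plain char being signed.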

Yes, I understand that. It does not mean TTree::Draw is obligated to treat Char_t as a character, does it?

I assume you could create your own branch with your own data type.
Something like:

int8_t sb;
uint8_t ub;
tree->Branch("My_signed_byte", &sb);
tree->Branch("My_unsigned_byte", &ub);

If it doesn’t work, report it to @pcanal immediately ;-)

Thanks for the suggestion. I’ve tried to modify my example above to avoid specifying /B, using a temporary vector instead:

unsigned int counter;
std::vector<int8_t> Jet_idx;

t.Branch("nJet", &counter, "nJet/i");
std::vector<int8_t> temp(1, 0);
auto br = t.Branch("Jet_idx", &(temp.front()), "Jet_idx[nJet]");

for (size_t i=0; i<10; i++) {
    Jet_idx = std::vector<int8_t>(i, i);
    br->SetAddress(Jet_idx.data());  // unlike front(), data() is well-defined on an empty vector
    counter = Jet_idx.size();
    t.Fill();
}

But when reading I’m now getting:

Error in <TTreeReaderArrayBase::CreateContentProxy()>: The branch Jet_idx contains data of type float. It cannot be accessed by a TTreeReaderArray<signed char>

Though the casts to int do work:

Event 2
Size: 2 - data: raw: , cast: 2 -- raw: , cast: 2 --
...

TTree::Draw does not like it either; I suppose it interprets the branch as float:

[image: TTree::Draw output for Jet_idx]

This is how branches are filled in the CMS NanoAODs, and I don’t think we can easily change that… I was simply looking into replacing the 32-bit integers currently used to store collection sizes, indices, charges, discrete IDs, …, by 8-bit integers, as it seemed like we could gain a few % on disk by doing that (even after compression). However, I don’t want people using those files to have to do tricks like Jet_idx+0 when drawing branches… If it’s not possible, so be it, but it’s really unfortunate.
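Before compression, going from 32-bit to 8-bit integers cuts each stored value from 4 bytes to 1, but the conversion is only lossless when every value fits in the 8-bit range ([-128, 127] for int8_t). A minimal sketch of a checked conversion, assuming the producer wants to detect out-of-range values rather than silently wrap them; narrowToInt8 is a hypothetical helper, and what to do on failure (keep 32 bits, clamp, flag an error) is left to the caller:

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Returns v as int8_t if it fits, std::nullopt otherwise.
std::optional<std::int8_t> narrowToInt8(std::int32_t v) {
    if (v < std::numeric_limits<std::int8_t>::min() ||
        v > std::numeric_limits<std::int8_t>::max()) {
        return std::nullopt;  // out of range: a plain cast would silently wrap
    }
    return static_cast<std::int8_t>(v);
}
```

A plain static_cast would turn 128 into -128 on two's-complement platforms, which is exactly the kind of silent corruption the check avoids.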

Always make a “counter” a “signed integer”, e.g.:

Int_t counter;
t.Branch("nJet", &counter, "nJet/I");

See the “var[nelem]” description in the TBranch::TBranch documentation.

If you define the branch using the “t.Branch("Jet_idx", &(temp.front()), "Jet_idx[nJet]")” syntax, then you are again limited to the predefined variables’ “fundamental types”. The default is “Float_t” (i.e., ROOT thinks it’s "Jet_idx[nJet]/F", so you get the error).
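The defaulting rule can be illustrated with a toy parser for a single-leaf descriptor: if there is no "/X" suffix, the type letter falls back to 'F'. This is a hypothetical sketch, not ROOT's actual parser; it handles only one variable and ignores the documented rule that later untyped variables in a multi-variable leaflist inherit the type of the previous one.

```cpp
#include <string>

// Toy sketch: extracts the type letter from a descriptor like "Jet_idx[nJet]/B".
// Without a "/X" suffix (or with a dangling "/"), it returns 'F' (Float_t),
// mirroring the default for an untyped first variable.
char leafTypeFromDescriptor(const std::string& descriptor) {
    const auto slash = descriptor.rfind('/');
    if (slash == std::string::npos || slash + 1 >= descriptor.size()) {
        return 'F';
    }
    return descriptor[slash + 1];
}
```

So "Jet_idx[nJet]" yields 'F', which is why the reader reports the branch as containing float data even though the buffer holds int8_t values.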

BTW, why don’t you simply try the following (no need for “nJet” at all):
auto br = t.Branch("Jet_idx", &temp);

Strange, we have had it as unsigned (with /i) in NanoAOD for years and it’s never caused any issue… In any case, that doesn’t solve the above problem.

Jet_idx is a variable-size array; how would that work?

Yes, I’ve known about this problem for years, see here.
It’s time you fixed it.
The current documentation is clear: the leaf referred to by nelem **MUST** be an int (/I).
Alternatively, ask @pcanal and/or @Axel to finally support unsigned integers as [nelem] variable-size lengths.

A “std::vector” is by definition “variable-size”, too.

I’m only now learning about this… I’ll put it on my list. Why is it such a problem if it’s been working fine for years?

Ah yes, OK. That means changing from arrays to STL vectors; this is really a change in the NanoAOD format that could break many things downstream, so it’s not something we would do lightly.

Sometimes some things worked, and some did not.
You’d have to go through various posts here.

If you want to stay with the “arrays”, as provided by ROOT, you are limited to its predefined variables’ “fundamental types”.