Bool_t not supported in `AsMatrix`

simeloni · March 3, 2021, 2:50pm

Dear experts,
I am trying to read a TTree in PyROOT, and convert it into an array.
I am using the AsMatrix() function provided by the TTree class, but as soon as I try to include a branch which contains Bool_t values, I get the following error:

*** Exception: Reading of branch ['truth_matched'] is not supported (branch has unsupported data-type ['Bool_t']).

Is there any way to circumvent this problem?
Thanks in advance
S.

ROOT Version: 6.16
Platform: SLC6
Compiler: Not Provided

jalopezg · March 3, 2021, 4:02pm

Hi @simeloni,

AFAIK, the pythonization for TTree does not support Bool_t. Other than that, @etejedor may know of a workaround, if any. Unfortunately, he is off until April.

Cheers,
J.

moneta · March 4, 2021, 5:37pm

@swunsch can also probably help here. I think you could create a snapshot of the tree where you convert the bool to a float value.

Cheers

Lorenzo

swunsch · March 5, 2021, 7:57am

Hi!

The best solution is AsNumpy, which supports boolean values. Indeed, booleans in this context are evil: the different memory layouts in numpy arrays (1 byte per bool) and in std::vector<bool> (typically 1 bit per bool) cause quite some complications.
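
As a rough illustration of that layout mismatch, the plain-Python sketch below (no ROOT or numpy needed) stores the same five flags twice: once with one byte per value, as a numpy bool array does, and once bit-packed, as typical std::vector<bool> implementations do. The helper `pack_bits` and its least-significant-bit-first ordering are assumptions made for this example only; the actual bit order inside std::vector<bool> is implementation-defined.

```python
# Five boolean flags used for the comparison.
flags = [False, False, False, True, True]

# Byte-per-value layout: every flag occupies one full byte.
byte_layout = bytes(flags)


def pack_bits(values):
    """Pack booleans eight to a byte, least significant bit first.

    Hypothetical helper for illustration; not part of ROOT or numpy.
    """
    packed = bytearray((len(values) + 7) // 8)
    for i, value in enumerate(values):
        if value:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


# Bit-packed layout: up to eight flags share one byte.
bit_layout = pack_bits(flags)

print(len(byte_layout), byte_layout)
print(len(bit_layout), bit_layout)
```

The two buffers differ in both size and content, so a bit-packed vector cannot simply be reinterpreted as a byte-per-value array; it has to be unpacked element by element first.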

However, try this!

import ROOT
import numpy as np

# Create some data on the fly with RDataFrame, plug in your own dataset!
# - a float column
# - a boolean column
# - an integer column (created from the boolean column, potentially preferred and more efficient)
df = ROOT.RDataFrame(5) \
         .Define('some_float', 'float(rdfentry_)') \
         .Define('some_bool', 'rdfentry_ > 2') \
         .Define('some_int', 'int(some_bool)')

# Move the data to numpy arrays
data = df.AsNumpy(['some_float', 'some_bool', 'some_int'])
print(data)

# Optional: Make a matrix of floats out of it
matrix = np.vstack([data[col] for col in data]).astype(float)
print(matrix)

{
 'some_float': ndarray([0., 1., 2., 3., 4.], dtype=float32),
 'some_bool': ndarray([False, False, False, True, True], dtype=object),
 'some_int': ndarray([0, 0, 0, 1, 1], dtype=int32)
}
[[0. 1. 2. 3. 4.]
 [0. 0. 0. 1. 1.]
 [0. 0. 0. 1. 1.]]

Cheers,
Stefan

simeloni · March 5, 2021, 4:03pm

Dear all,
thanks for your help! Indeed, as @swunsch said, the best idea would be to use AsNumpy. The problem is that we are bound (by analysis needs) to ROOT v6.16, which does not support this, afaik.

@moneta, yes, I think that would work, but it would be a bit slow for our needs, since we are dealing with huge datasets. The solution I have found is in fact similar: I create a new branch in the DataFrame, converting the bool into an int, and then I snapshot just this additional branch. Later on, I declare the resulting tree as a friend of the original one and use the new branch, which now works in AsMatrix.

It is a bit cumbersome, but practical. If you have any other solution that is compatible with the above constraints, please share it.

S.
