Saving pandas dataframe as TTree with RDataFrame

Hi there!

There are several issues here. First, how is a bool represented in your CSV? I am trying to work out whether the CSV data source of RDataFrame performs an implicit conversion to int, or whether the cause is the CSV format itself, which has no defined way to store boolean values (it’s just a text file!).
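To illustrate the point about CSV being plain text, here is a minimal sketch using only Python's standard `csv` module. The CSV content and the set of spellings treated as true are assumptions made up for this example, not taken from your file.

```python
import csv
import io

# Hypothetical CSV content: the boolean column "z" is just text
csv_text = "x,y,z\n1.0,4,True\n2.0,5,False\n3.0,6,True\n"

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Every field arrives as a string; nothing in the file marks "z" as boolean
raw_z = [row["z"] for row in rows]

# Assumed convention: these spellings mean true, anything else means false
TRUE_SPELLINGS = {"true", "1", "yes"}
parsed_z = [value.strip().lower() in TRUE_SPELLINGS for value in raw_z]

print(raw_z)
print(parsed_z)
```

Whichever reader you use has to pick such a convention itself, which is why the same file can end up with a text, integer or boolean column depending on the tool.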

However, you can find below a solution to convert a pandas dataframe to a ROOT TTree. Here too, you face the issue that boolean numpy arrays cannot be converted directly to booleans in a TTree. This time, the cause is the memory layout: a boolean numpy array stores one bool per byte, whereas bit-packed C++ containers such as std::vector<bool> typically store one bool per bit, so the numpy array cannot be read directly.
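The difference between the two layouts can be sketched in plain Python. This is only an illustration: the bit order chosen for the packed version is an arbitrary assumption, and the actual layout of a bit-packed C++ container is implementation-defined.

```python
flags = [True, False, True]

# One byte per value, as in a boolean numpy array
byte_layout = bytes(flags)

# Bit-packed layout (illustrative): value i is stored in bit i of a single byte
packed = 0
for i, flag in enumerate(flags):
    if flag:
        packed |= 1 << i
bit_layout = packed.to_bytes(1, "little")

print(len(byte_layout), byte_layout)
print(len(bit_layout), bit_layout)
```

The same three values occupy three bytes in the first layout and a single byte in the second, so a buffer written in one layout cannot simply be reinterpreted as the other.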

Does this fit your needs?

# Create a pandas dataframe
import pandas as pd
import numpy as np
df = pd.DataFrame()
df['x'] = np.array([1.0, 2.0, 3.0]) # double
df['y'] = np.array([4, 5, 6]) # long
df['z'] = np.array([True, False, True]) # boolean

# Have a look!
print(df)

# Convert data to a dictionary with numpy arrays
data = {key: df[key].values for key in ['x', 'y', 'z']}

# Unfortunately booleans in a numpy array don't have the same
# memory layout as in C++, and therefore we cannot adopt boolean
# columns on the C++ side of RDataFrame.
# The workaround is reading them out as integers.
# (np.int was removed from recent NumPy releases; the builtin int is equivalent)
data['z'] = data['z'].astype(int)

# Write the dictionary with numpy arrays to a ROOT file
import ROOT
# Note: newer ROOT releases provide this function as ROOT.RDF.FromNumpy
rdf = ROOT.RDF.MakeNumpyDataFrame(data)
rdf = rdf.Define('z_bool', '(bool)z') # Let's rewrite z as bool
rdf.Snapshot('tree', 'file.root')

# Again, have a look!
rdf.Display().Print()

# Output of print(df)
     x  y      z
0  1.0  4   True
1  2.0  5  False
2  3.0  6   True

# Output of rdf.Display().Print()
z_bool | x         | y | z | 
true   | 1.0000000 | 4 | 1 | 
false  | 2.0000000 | 5 | 0 | 
true   | 3.0000000 | 6 | 1 |

Best
Stefan
