Reading data from a ROOT file and performing calculations on it in Python

Hello, I don't have any background in C++, but I understand the basics of ROOT. I need to read a given set of data (some columns) from a .root file to perform some calculations. My routine is written in Python, so I was wondering whether there is some way to use PyROOT to read that data and then pass it to my routine.

Is your data already in a .root file? If yes, what is it: a TTree, a TNtuple, a histogram, …?

The data is in a TTree, because I constructed it that way. I'm having trouble accessing and storing a specific branch from the TTree. For now I'm trying to do this in plain ROOT; once the routine works, I hope I can adapt it to PyROOT, because I want to use the branch I saved to perform calculations. Maybe it is useful to see the code I developed:

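#include <vector>

void test()
{
    // Open the file read-only and fetch the tree by its name;cycle key
    TFile *input = new TFile("data_test.root", "read");
    TTree *tree = (TTree*)input->Get("Data_R;17");

    // A vector can grow as entries are read; an empty C array cannot
    std::vector<double> store;

    // Assumes the "Energy" branch holds a double; Timestamp is not read here
    double Energy, Timestamp;
    tree->SetBranchAddress("Energy", &Energy);

    Long64_t entries = tree->GetEntries();

    for (Long64_t i = 0; i < entries; i++)
    {
        tree->GetEntry(i);        // load entry number i into Energy
        store.push_back(Energy);  // append the value that was just read
    }

    //input -> Close();
}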

Hi @AndreM ,
Thanks for reaching out! You can use RDataFrame to process your TTree data:

import ROOT

df = ROOT.RDataFrame("mytreename","myfile.root")

# Perform calculations on your pre-existing branch
# creating a new column in the dataframe
df_withcol = df.Define("newcolumn", "calculation_on(mybranch)")

# Get some result, maybe a histogram?
h = df_withcol.Histo1D("newcolumn")

# This starts the computations and returns a TH1D object
h_value = h.GetValue()

From the example you provide above it seems to me you’re just trying to bring the values of an already existing branch into some vector? Maybe in Python you would like to do so by retrieving a numpy array?

import ROOT

df = ROOT.RDataFrame("mytreename","myfile.root")

# Retrieves a dictionary of numpy arrays from the column names you provide
npy_dict = df.AsNumpy(["mybranch", ])

# This is a numpy array with the contents of your column
values = npy_dict["mybranch"]

You can find more information about RDataFrame in the docs
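
As a follow-up to the AsNumpy snippet, here is a minimal sketch of how the retrieved array could be handed to an existing Python routine. The names load_branch and my_routine are hypothetical placeholders, not part of ROOT, and the statistics computed are only stand-ins for the real calculation. ROOT is imported inside load_branch so the rest of the file can be tried without a ROOT installation:

```python
import numpy as np


def load_branch(tree_name, file_name, branch):
    """Return one branch of a TTree as a numpy array (hypothetical helper)."""
    import ROOT  # imported lazily; only needed when a real file is read

    df = ROOT.RDataFrame(tree_name, file_name)
    # AsNumpy returns a dictionary mapping column names to numpy arrays
    return df.AsNumpy([branch])[branch]


def my_routine(values):
    """Stand-in for the user's own calculation on the branch values."""
    values = np.asarray(values, dtype=float)
    return {
        "entries": int(values.size),
        "mean": float(values.mean()),
        "max": float(values.max()),
    }


# Stand-in data so the routine can be tried without a .root file
print(my_routine([1.0, 2.0, 3.0]))
```

With a real file, the call would look like my_routine(load_branch("mytreename", "myfile.root", "mybranch")), assuming the branch holds plain numbers rather than arrays per entry.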

Hello @vpadulan, your examples really helped me! I was not aware of the RDataFrame class, and it is very useful for my problem, because right now I'm just interested in retrieving the data from the .root file, which I stored in a TTree.

I'm trying to use your suggestions to solve my problem, but now I have new questions. I was able to use C++ to make a TTree in ROOT, but I'm not sure if the TTree is accessible just by inputting this code line:

In addition, when I try to run this routine:

import pyroot as ROOT

df = ROOT.RDataFrame("tree","data_test.root")

npy_dict = df.AsNumpy(["Energy"])

values = npy_dict["Energy"]

I get this error:

AttributeError: module ‘pyroot’ has no attribute ‘RDataFrame’

Any ideas? :)

I’m not sure about import pyroot; the official ROOT Python package is imported with import ROOT. Did you install ROOT through one of the official installation options?

So I made a mistake, but the problem remains. The pyroot I downloaded from GitHub is a package for root-finding methods; sadly, I assumed it was related to ROOT itself. I do have ROOT properly installed: I followed the instructions on the official page, and I have already gone through some tutorials to create simple routines and explore the program's capabilities. Even so, I can't import ROOT in python3, so I'd like help solving this problem first, if someone could assist.

Dear @AndreM,

So I will suppose that you have already uninstalled the unrelated pyroot package.

Once that is done, what does python -c "import ROOT" output?
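
If it helps, a slightly more informative check can be sketched with the standard library alone. The helper name describe_module is made up for this illustration; it only reports whether the running interpreter can locate a module and where, without importing it:

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether this interpreter can locate the given top-level module."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not importable from {sys.executable}"
    return f"{name}: found at {spec.origin}"


# ROOT is the official package; pyroot is the unrelated root-finding package
print(describe_module("ROOT"))
print(describe_module("pyroot"))
```

The interpreter path in the message matters because a ROOT installation may ship its own Python, in which case the system python3 will not see the ROOT module.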

Dear @vpadulan ,

That returns:

ModuleNotFoundError: No module named 'ROOT'

This happens even though I can open ROOT with no problems and run every routine I wrote. Maybe it is useful to say that I used snap to install ROOT, and the OS I'm working with is Linux Mint.

I'm reading some material online, and I think the problem lies in the fact that I need to load C++ libraries into Python so I can use ROOT, but this is just a guess.

Dear @AndreM,
The Snap package for ROOT differs in some ways from other types of installation, to make sure it doesn’t interfere with your system packages.

Namely, to use ROOT from Python, the Snap package provides a pyroot command that starts a bundled Python interpreter, shipped with the Snap package itself, that is able to import ROOT:

$ sudo snap install root-framework
$ root # Starts the ROOT C++ interpreter
$ pyroot # Starts a Python 3.8 session
Python 3.8.10 (default, Jun 22 2022, 20:18:18) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ROOT
>>> 

This is described in the blog post of the Snap package.

I noticed this is also not explained in the installation instructions, so I created a PR for that.

Hello @vpadulan!
Thanks for the assistance, it works now, so I'll try to work on the previous routine using RDataFrame and I'll get back to you when I achieve my goal :)

Thanks to your help I was able to complete the routine I wanted! Thanks for all the help! :)
