Moving Branch entries to Numpy Array

Hi, I am starting to use PyROOT in ROOT v6. I am a total beginner and I desperately need help.

I am wondering if we can move a branch’s entries to a numpy array.

For Example:

import ROOT
import numpy as np

f = ROOT.TFile("dataInput.root")
tree1 = ROOT.gROOT.FindObject("clDump")
tree1.Print()

******************************************************************************
*Tree    :clDump    : clDump                                                 *
*Entries :   234168 : Total =        23282717 bytes  File  Size =   10330410 *
*        :          : Tree compression factor =   2.25                       *
******************************************************************************
*Br    0 :eventNr   : B0/I                                                   *
*Entries :   234168 : Total  Size=     939839 bytes  File Size  =       7348 *
*Baskets :       30 : Basket Size=      32000 bytes  Compression= 127.78     *
*............................................................................*
*Br   1 :gx1       : B4/F                                                   *
*Entries :   234168 : Total  Size=     939715 bytes  File Size  =     812434 *
*Baskets :       30 : Basket Size=      32000 bytes  Compression=   1.16     *
*............................................................................*
*Br   2 :gx2       : B5/F                                                   *
*Entries :   234168 : Total  Size=     939715 bytes  File Size  =     871216 *
*Baskets :       30 : Basket Size=      32000 bytes  Compression=   1.08     *
*............................................................................*

momentum = np.zeros(1,dtype=float)

If possible, how can we get the gx0 branch into a numpy array called momentum?

I’ve read the documentation from a post, but somehow I still don’t understand how to put the values of a branch into a numpy array.

Thanks in advance :slight_smile:

__
ROOT Version: Not Provided
Platform: Not Provided
Compiler: Not Provided



Hi,

this tutorial explains how to achieve what you are after: https://root.cern/doc/master/pyroot002__TTreeAsMatrix_8py.html

Just make sure you are running at least ROOT v6.14.

Cheers,
D

The tutorial Danilo linked gives all the information you need. But here is a short snippet for you:

import ROOT

# Open file remotely via http (TMVA classification example)
f = ROOT.TFile.Open("http://root.cern.ch/files/tmva_class_example.root")

# The file has a tree called "TreeS"
t = f.Get("TreeS")

# Get branch "var1" as numpy array
data = t.AsMatrix(["var1"])

# Plot it!
import matplotlib.pyplot as plt
plt.hist(data)
plt.savefig("var1.png")

You’ll get the following plot:

[Image: histogram of var1, saved as var1.png]

Whoa, I missed that, thank you so much. I have to upgrade ROOT first, because I’m running version 6.10.

Thanks a lot :+1:

You are so kind. I’ve tried the snippet, but AsMatrix doesn’t exist in my version, so I will have to upgrade ROOT first. Thank you very much swunsch :grin:

@swunsch @Danilo Hi guys, I encountered another problem. I’ve installed ROOT 6.14 and tried AsMatrix, and this error message appeared:

data = t.AsMatrix(['gx0'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/<root-directory>/root-6.14.04/obj/lib/ROOT.py", line 339, in _TTreeAsMatrix
    [invalid_cols_leafname[k] for k in invalid_cols_leafname])))
Exception: Reading of branch ['gx0'] is not supported (name of leaf is different from name of branch [2]).

Thank you again in advance

Hmm… The TTree.AsMatrix feature only supports converting flat trees to numpy. It assumes that the leaf of the branch you pass has the same name as the branch (because it cannot know the leaf name otherwise). How is gx0 structured? I cannot see it in the tree.Print() output above.

So a tree like the one below:

>>> f = ROOT.TFile.Open("http://root.cern.ch/files/tmva_class_example.root")
>>> t = f.Get("TreeS")
>>> t.Print()
******************************************************************************
*Tree    :TreeS     : TreeS                                                  *
*Entries :     6000 : Total =           98896 bytes  File  Size =      89768 *
*        :          : Tree compression factor =   1.00                       *
******************************************************************************
*Br    0 :var1      : var1/F                                                 *
*Entries :     6000 : Total  Size=      24641 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    1 :var2      : var2/F                                                 *
*Entries :     6000 : Total  Size=      24641 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    2 :var3      : var3/F                                                 *
*Entries :     6000 : Total  Size=      24641 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    3 :var4      : var4/F                                                 *
*Entries :     6000 : Total  Size=      24641 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*

I see, so if the leaf has a different name, it cannot be loaded. Here is the Print() output:

*............................................................................*
*Br   22 :gx0       : B3/F                                                   *
*Entries :   234168 : Total  Size=     939730 bytes  File Size  =     819909 *
*Baskets :       30 : Basket Size=      32000 bytes  Compression=   1.15     *
*............................................................................*
*Br   23 :gx1       : B4/F                                                   *
*Entries :   234168 : Total  Size=     939730 bytes  File Size  =     812434 *
*Baskets :       30 : Basket Size=      32000 bytes  Compression=   1.16     *
*............................................................................*
*Br   24 :gx2       : B5/F                                                   *
*Entries :   234168 : Total  Size=     939730 bytes  File Size  =     871216 *
*Baskets :       30 : Basket Size=      32000 bytes  Compression=   1.08     *
*............................................................................*

As seen in the TreeS Print() output that you showed, the branch there reads var4 : var4/F, but in my Print() output it is gx0 : B3/F.

Is it possible to change the branch name?

Oh, indeed, that could solve your problem! Does something like this work for you?

import ROOT

# Open file remotely (TMVA classification example)
f = ROOT.TFile.Open("http://root.cern.ch/files/tmva_class_example.root")

# The file has a tree called "TreeS"
t = f.Get("TreeS")

# Rename branch and leaf
t.GetBranch("var1").SetName("var1_renamed")
t.GetLeaf("var1").SetName("var1_renamed")

# Get branch "var1_renamed" as numpy array
data = t.AsMatrix(["var1_renamed"])

# Plot it!
import matplotlib.pyplot as plt
plt.hist(data)
plt.savefig("x.png")

So in your case:

tree.GetLeaf("B3").SetName("gx0")
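The tree in this thread has the same mismatch for several branches (gx1 has leaf B4, gx2 has leaf B5), so it can help to list every rename at once. Below is a minimal sketch that works on the text of tree.Print() only and does not need ROOT. The helper name `mismatched_leaves` and the regular expression are my own, not part of ROOT, and the parsing assumes the simple one-leaf-per-branch layout shown in the outputs above.

```python
import re

# Matches branch lines of tree.Print(), e.g. "*Br   22 :gx0       : B3/F   *".
# Assumes one leaf per branch, as in the outputs posted in this thread.
BRANCH_LINE = re.compile(
    r"^\*Br\s+\d+\s*:\s*(?P<branch>\S+)\s*:\s*(?P<leaf>[^/\s]+)/(?P<type>\w+)"
)


def mismatched_leaves(print_output):
    """Return (leaf, branch) pairs where the leaf name differs from the branch name."""
    pairs = []
    for line in print_output.splitlines():
        match = BRANCH_LINE.match(line.strip())
        if match and match.group("leaf") != match.group("branch"):
            pairs.append((match.group("leaf"), match.group("branch")))
    return pairs


# Shortened sample lines taken from the Print() outputs in this thread.
sample = """\
*Br    0 :eventNr   : B0/I    *
*Br   22 :gx0       : B3/F    *
*Br   23 :gx1       : B4/F    *
*Br    0 :var1      : var1/F  *
"""

renames = mismatched_leaves(sample)

# Print the rename calls one would then run in PyROOT.
for leaf, branch in renames:
    print(f'tree.GetLeaf("{leaf}").SetName("{branch}")')
```

Branches such as var1, where leaf and branch already agree, are skipped, so the output contains only the renames that AsMatrix would need.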

@swunsch I am really grateful, thank you so so much. Finally with this program I can move to the next step, again thank you so much. I hope somehow in the future I can help people, just like you :grin:

By the way this is the result, only possible thanks to you :+1::+1:

Cheers,


Thanks, that looks very nice :slight_smile:

To get the best performance out of TTree.AsMatrix, you can write the following:

data = t.AsMatrix(["gx0", "gx1", "gx2"])
plt.scatter(data[:,0], data[:,1], s=0.0005)

A single call to TTree.AsMatrix loops over the tree only once and writes out all the data at once. One more tip: if you call ROOT.ROOT.EnableImplicitMT() beforehand, the data loading can even run on multiple threads (useful if loading takes significant time).
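To make the layout of the returned array concrete, here is a small numpy-only sketch. It uses random numbers as a stand-in for the result of t.AsMatrix(["gx0", "gx1", "gx2"]) and assumes the array is two-dimensional with one row per entry and one column per branch, which is what the data[:,0] indexing above implies. The variable names are only illustrative.

```python
import numpy as np

# Stand-in for: data = t.AsMatrix(["gx0", "gx1", "gx2"])
# Assumed layout: shape (n_entries, n_branches), columns in the requested order.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 3)).astype(np.float32)

# Each column is one branch; transposing lets us unpack them by name.
gx0, gx1, gx2 = data.T

# A one-dimensional array for a single branch, as asked in the first post.
momentum = data[:, 0]

# If only one column was requested, ravel() gives a flat array regardless
# of whether the result has shape (n, 1) or (n,).
single = data[:, :1]
flat = single.ravel()

print(data.shape, momentum.shape, flat.shape)
```

The column slices are views into the same memory, so splitting the matrix this way does not copy the data.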

Cheers!


Hi,

I support the nice solution @swunsch proposed and you adapted to your problem.
If you can use the ROOT head (the feature will be available in ROOT version 6.16, foreseen for November), you can produce your scatter plot directly with RDataFrame:

import ROOT
# open the file, get the tree, and set the leaf names (if you want)
rdf = ROOT.ROOT.RDataFrame(t)
g = rdf.Graph("gx0", "gx1")

# the explicit handling of the canvas is there to inline the plot in the notebook
c = ROOT.TCanvas()
g.Draw()
c.Draw()

Cheers,
Danilo


Thanks, but I think my computer doesn’t like ROOT.ROOT.EnableImplicitMT(): I tried running it in Jupyter and in the terminal, and both gave the same error.

Great to know you @swunsch :grin:

Hello @Danilo, thank you for your help and suggestion. This will really shorten the process of plotting the points, because changing the name of the leaf sometimes produces an error. I wonder if it’s my laptop or the Jupyter notebook.

You guys are awesome

Cheers,

Hi Arifin,

this is odd. Are you running Jupyter on your computer or at CERN on SWAN?
Can you share the file so that we can try to reproduce your problem?

Cheers,
D

Hi Danilo,

I am running it on my computer, but since you mentioned SWAN, I am now running it on SWAN. How can I share a project with you?

But on SWAN I can’t run t.AsMatrix(), maybe because the ROOT version is not up to date?

By the way, this presentation, Python and ROOT: Effective and Interactive Analysis of Big Data, is really great. Thank you so much @Danilo.

The Keras workshop by @swunsch is also really interesting. I will slowly master PyROOT on SWAN and hopefully integrate machine learning.

I will never get tired of saying thank you so much to you guys! Amazing :+1:

Cheers!

Hi,

I think the issue is that the bleeding edge stack is presently associated with the latest 6.14 release and not with master. I need to check with our librarians.
Did you check the online SWAN help for the sharing instructions? It’s accessible directly from the web interface.

Cheers,
D

Note that on SWAN it might be easier to install uproot (via pip install) to try out these numpy-eries. (uproot is pure Python and was built from the start for interop with the numpy/scipy stack.)
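For reference, a minimal sketch of what the uproot route could look like, assuming a current uproot release (version 4 or later, whose API differs from the uproot available when this thread was written). The helper name `read_branch` is hypothetical, and I have not tried it on the file from this thread.

```python
def read_branch(path, tree_name, branch_name):
    """Read one branch of a TTree into a numpy array using uproot.

    Assumes uproot 4 or later is installed (pip install uproot).
    """
    import uproot  # imported here so the helper can be defined without uproot present

    with uproot.open(path) as root_file:
        # library="np" asks uproot for a plain numpy array.
        return root_file[tree_name][branch_name].array(library="np")


# Hypothetical usage with the names from the first post:
# momentum = read_branch("dataInput.root", "clDump", "gx0")
```

uproot looks branches up by branch name, so I would expect the leaf-name mismatch discussed above not to matter here, but that is an assumption rather than something verified on this tree.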