Hi everyone!
I am trying to move to Python 3. I currently read the tree in the following manner.
Does anyone have any suggestions to improve this method to make it faster?
Also, how can it be used with pandas?
import ROOT

f = ROOT.TFile.Open(filename, "READ")
myTree = f.Get("T")
for evnt in myTree:
    # Scalar branches
    Event = evnt.Event
    Trig = evnt.Trigger
    RTime = evnt.Rtime
    PMult = evnt.ProtDetPMult
    P1Mult = evnt.ProtDetP1Mult
    P2Mult = evnt.ProtDetP2Mult
    P3Mult = evnt.ProtDetP3Mult
    # Array branches, copied into Python lists
    ADC = [e for e in evnt.ProtDetADC]
    TDC = [e for e in evnt.ProtDetTDC]
    Hit = [e for e in evnt.ProtDetHit]
    Chan = [e for e in evnt.ProtDetChan]
    Det = [e for e in evnt.ProtDetDet]
Iterating over the events in your tree in Python can be slow if the dataset is big enough, as you might have experienced. My suggestion would be to use RDataFrame:
You can use RDataFrame from Python to read and process your dataset, but the event loop is executed in C++. You can also go to pandas once you have processed your dataset, as you can see in this tutorial:
I am a bit confused by RDataFrame. I have data and I want to do an event-by-event analysis. With RDataFrame I can filter the data, but how can I perform an event-by-event analysis?
The way you program with RDataFrame is indeed different than writing an explicit event loop.
In RDataFrame, you express your analysis as a series of operations on your dataset. Those operations can be transformations of your dataset (e.g. filtering events, defining new columns) or actions, where you actually get back a result (e.g. a histogram). The code you provide for those operations works with a single event, but RDataFrame will apply it to all the events of your dataset. For example, when you call RDataFrame’s Filter, the filter expression you provide will be applied to every event of the dataset (RDataFrame will read the values of the columns used in the expression for every event, and evaluate the expression for every event with those values).
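To make the transformation/action distinction concrete, here is a minimal sketch using the branch names from the first post. The function name `sketch_analysis`, the cut `ProtDetPMult > 0`, the new column `nADC`, and the histogram binning are all assumptions made for illustration, not something taken from your analysis, and I have not run it against your file.

```python
def sketch_analysis(filename, treename="T"):
    """Hypothetical helper: filter events, define a column, book a histogram, export to pandas."""
    # Imported inside the function so the sketch can be read and loaded without ROOT installed.
    import ROOT
    import pandas as pd

    df = ROOT.RDataFrame(treename, filename)

    # Transformation: keep only events passing a cut (the cut itself is an assumption).
    selected = df.Filter("ProtDetPMult > 0", "at least one hit")

    # Transformation: define a new per-event column from an existing array branch.
    selected = selected.Define("nADC", "ProtDetADC.size()")

    # Action: book a histogram. It is lazy, the event loop only runs when a result is accessed.
    hist = selected.Histo1D(("h_rtime", "Rtime;Rtime;Events", 100, 0.0, 1000.0), "Rtime")

    # Action: export the scalar columns as NumPy arrays, then wrap them in a pandas DataFrame.
    columns = selected.AsNumpy(columns=["Event", "Trigger", "Rtime", "ProtDetPMult"])
    frame = pd.DataFrame(columns)

    return hist, frame
```

Each string expression is written as if it handled one event, and RDataFrame evaluates it for every event in the C++ event loop.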
For example, in my case, I have a few array branches. Suppose, hypothetically, that one event has the following data.
ADC = [15445,5454,3444,64424,8424,94244,1544,1214]
Chan = [1,2,3,4,5,8,10,15]
Det = [2,2,2,2,2,2,2,1]
Now, if my condition is “Det == 2 && Chan == 1 && 2 && 3”, the ADC values from all the different channels will go into the same histogram. If I want a separate histogram for each individual channel, how can I do that?
I am sorry for my naive understanding. It would be helpful if you could guide me.
If you use Numba.Declare, you can define a Python callable that works with those array branches as if they were NumPy arrays. In the first snippet you posted, it seems you are storing those branches in Python lists and then working with them as such. With RDataFrame and Numba.Declare, your Python callable would just receive the branches as NumPy arrays. For example, if you want to use the following callable in a Filter, you would decorate it like so:
@ROOT.Numba.Declare(['RVecI', 'RVecI', 'RVecI'], 'bool')
def my_function(ADC, Chan, Det):
    # ADC, Chan and Det are NumPy arrays here; they hold the values of those branches for a given event
    ...
    # return True or False
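For the per-channel question, the body of such a callable could be built around a boolean mask. The sketch below is plain NumPy, run outside ROOT on the hypothetical event from your post; the helper names `select_adc` and `has_hit` are made up for illustration.

```python
import numpy as np


def select_adc(adc, chan, det, det_id, chan_id):
    """Return the ADC values of one channel of one detector within a single event."""
    mask = (det == det_id) & (chan == chan_id)
    return adc[mask]


def has_hit(adc, chan, det, det_id, chan_id):
    """Event-level decision (usable as a filter): did this channel record anything?"""
    return bool(np.any((det == det_id) & (chan == chan_id)))


# Hypothetical single event, copied from the example data in the question.
ADC = np.array([15445, 5454, 3444, 64424, 8424, 94244, 1544, 1214])
Chan = np.array([1, 2, 3, 4, 5, 8, 10, 15])
Det = np.array([2, 2, 2, 2, 2, 2, 2, 1])

# One array of ADC values per channel, instead of one array mixing all channels.
per_channel = {ch: select_adc(ADC, Chan, Det, 2, ch) for ch in (1, 2, 3)}
```

The point is to build one selection per channel rather than a single condition covering channels 1, 2 and 3 together. In RDataFrame, RVec supports the same kind of mask indexing, so something like Define("ADC_ch1", "ProtDetADC[ProtDetDet == 2 && ProtDetChan == 1]") followed by a Histo1D on "ADC_ch1", repeated for each channel, should give one histogram per channel. I have not tested that against your file, so treat it as a starting point.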