for filename in sorted(os.listdir(path)):
    if filename.endswith('.root'):
        #chain.Add(filename)
        f = ROOT.TFile(filename,"read")
        counts = np.zeros((1,1,48,48), dtype=np.float32)
        t = f.Get("tree")
        t.SetBranchAddress("branch", counts)
        box.append(counts)
        n += t.GetEntries()
As you can see, branch is a [1][1][48][48] array, and I would like to store its content in a
list, one item per entry. The problem is that both counts and box are filled with all zeros.
I really cannot understand why. Can anyone help?
Actually, I am able to read the entries by making a chain. I then process the data with a
pandas DataFrame, but it is really slow and requires a lot of memory,
so much so that the process is sometimes killed. (Consider that it takes a minute
and a half just to read 233088 entries.) Anyway, this is the working code:
for filename in sorted(os.listdir(path)):
    if filename.endswith('.root'):
        chain.Add(filename)
for event in chain:
    counts = event.branch
    counts.SetSize(2304)
    box_counts.append(np.array(counts,copy=True))
So my question actually goes further: is there a faster way to retrieve a large amount of data
from a branch?
Thanks in advance
ROOT Version: 5.34.36 Platform: Not Provided Compiler: Not Provided
With AsNumpy, you get a numpy array for your array branch in which every position contains a flat std::vector holding the data for one entry of the tree. You can also wrap those std::vectors with numpy arrays (arr = np.asarray(vector)). This should be more efficient than looping over the events in Python.
Be aware that the array is read back flat and you have to reshape it again to the original shape.
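A minimal sketch of that reshape step, assuming each entry comes back as a flat sequence of 2304 values (1 x 1 x 48 x 48, the shape given in the question). The ROOT part is wrapped in a function and assumes a ROOT 6 release recent enough to provide RDataFrame.AsNumpy (it is not available in ROOT 5.34). The names "tree" and "branch" are taken from the question; restore_shape and read_branch are hypothetical helper names, and read_branch has not been checked against the poster's files.

```python
import numpy as np

ENTRY_SHAPE = (1, 1, 48, 48)  # per-entry shape stated in the question


def restore_shape(flat_entries, entry_shape=ENTRY_SHAPE):
    """Stack flat per-entry sequences and restore the per-entry shape."""
    # Converting entry by entry also covers the case where `flat_entries`
    # is an object array holding one std::vector per tree entry.
    # np.stack copies everything into one contiguous float32 array.
    stacked = np.stack([np.asarray(e, dtype=np.float32) for e in flat_entries])
    return stacked.reshape((len(stacked),) + tuple(entry_shape))


def read_branch(filenames, tree="tree", branch="branch"):
    """Read one array branch via AsNumpy (assumes a recent ROOT 6)."""
    import ROOT  # imported lazily so restore_shape works without ROOT

    files = ROOT.std.vector("string")()
    for name in filenames:
        files.push_back(name)
    df = ROOT.RDataFrame(tree, files)
    columns = df.AsNumpy(columns=[branch])  # dict: column name -> numpy array
    return restore_shape(columns[branch])
```

The result has one leading axis for the entries, so three entries give an array of shape (3, 1, 1, 48, 48).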
However, consider doing the analysis of the data with RDataFrame and push only what you need as numpy array to Python. That’s way more efficient and also runs natively multithreaded!
Thanks again Stefan!
RDataFrame seems really powerful, and it is exactly what I need!
I don't want to bother you any further, but just so I understand: why
didn't the SetBranchAddress() method work for you?
You mean for writing the tree? Not using the SetBranchAddress approach was not on purpose, I’ve just put together a suitable ROOT file to make the point!
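On the zeros seen with SetBranchAddress in the first snippet: as far as that snippet shows, it never calls GetEntry. SetBranchAddress only registers the buffer and reads nothing, which would leave counts at its initial zeros. Appending counts itself also stores the same array object every time, so a copy per entry is needed. A minimal sketch of that pattern, under those assumptions: collect_copies and read_with_buffer are hypothetical names, and the ROOT function has not been run against the poster's files.

```python
import numpy as np


def collect_copies(buffer, load_entry, n_entries):
    """Load each entry into the shared buffer and keep a snapshot of it."""
    snapshots = []
    for i in range(n_entries):
        load_entry(i)  # overwrites `buffer` in place, as TTree.GetEntry does
        snapshots.append(buffer.copy())  # copy, or every item aliases `buffer`
    return snapshots


def read_with_buffer(filename, tree="tree", branch="branch"):
    """Read every entry of an array branch through one reusable buffer."""
    import ROOT  # lazy import: collect_copies itself does not need ROOT

    f = ROOT.TFile(filename, "read")
    t = f.Get(tree)
    counts = np.zeros((1, 1, 48, 48), dtype=np.float32)
    t.SetBranchAddress(branch, counts)  # registers the buffer, reads nothing
    # GetEntry(i) is the call that actually fills `counts` with entry i.
    box = collect_copies(counts, t.GetEntry, int(t.GetEntries()))
    f.Close()
    return box
```

This still loops over the entries in Python, so it is not expected to be faster than the chain loop; it only illustrates why the buffer stayed empty.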
FYI: You can check the layout of the branch in your files by opening one with root -l filename.root and then typing treename->Print().
Sorry for bothering you, swunsch. Is there a way to collect only certain
branches of a tree with RDataFrame? I was reading the guide but didn't see
anything useful, so if there is no solution I won't lose any more time!
Thanks again, sorry
I appreciate that you ask! You can just use df.AsNumpy(['name_branch1', 'name_branch2']) to push only a subset of the dataset to numpy. Actually, that’s highly recommended because it’s simply much more efficient just to load what you need.
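A minimal sketch of that call applied to a directory of files, assuming a ROOT 6 release with RDataFrame.AsNumpy and a tree named "tree" as in the question. The helper names root_files and read_columns are hypothetical, and the column names passed in are placeholders for your own branches.

```python
import os


def root_files(path):
    """Return the sorted full paths of the .root files in `path`."""
    return [os.path.join(path, name)
            for name in sorted(os.listdir(path))
            if name.endswith(".root")]


def read_columns(path, columns, tree="tree"):
    """Load only the requested branches from all .root files in `path`."""
    import ROOT  # lazy import: root_files itself does not need ROOT

    files = ROOT.std.vector("string")()
    for name in root_files(path):
        files.push_back(name)
    df = ROOT.RDataFrame(tree, files)
    # Only the listed branches are read; the result is a dict that maps
    # each column name to a numpy array.
    return df.AsNumpy(columns=list(columns))
```

For example, read_columns(path, ['name_branch1', 'name_branch2']) would return a dict with those two keys. Joining each file name with path keeps the sketch independent of the current working directory.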