Reading the ROOT file

Li_Huang · May 15, 2019, 1:50pm

Hi,
I commonly use Python to read ROOT files, using TChain. However, this is sequential and single-threaded, which is not ideal. I wonder if there is a way to get the following two benefits (or two separate ways, each providing one of them):
1) Parallelism (I asked about this before but failed to make it work in PyROOT), so that I can use multiple threads to read the ROOT files.
2) Finding an “entry” by index, meaning I can read the n-th entry directly without reading the previous n-1 entries. This is useful when using ROOT files with a neural network, because I need to shuffle the data.

For example, a PyTorch dataset can read data from a folder directly, with both of the benefits I mentioned. Since a ROOT file is in some sense a “folder” too, I guess there is no reason we couldn’t write something like the PyTorch dataset for it…

Thank you!

Best,
Li

ROOT Version: Not Provided
Platform: Not Provided
Compiler: Not Provided

eguiraud · May 15, 2019, 3:18pm

Hi,
RDataFrame is ROOT’s interface for parallel data analysis and manipulation (requires at least v6.16). You can find several Python tutorials here.

However, RDataFrame does not offer a simple way to access entries in random ordering.
The reason is that it’s very, very easy to write very, very slow applications if you read entries from disk in a random order.

A much better approach, whenever viable, is to load your NN training dataset in memory (as a numpy array, for example) and read from RAM, which as its name implies offers efficient random access.
With RDataFrame, you can apply cuts, define derived quantities and then load everything into numpy arrays in a few lines:

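import ROOT

ROOT.ROOT.EnableImplicitMT() # enable ROOT's implicit multi-threading
df = ROOT.ROOT.RDataFrame("treename", "filename.root")
# apply a cut, define a derived column, then load the selected columns into memory
arrays = df.Filter("pt > 0").Define("z", "x*y").AsNumpy(columns=["x", "y", "z"])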
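
Once the columns are in memory, shuffling for NN training only requires permuting an index array. The following is a minimal sketch, not part of ROOT: it assumes a dictionary of equal-length numpy arrays keyed by column name (which is what AsNumpy returns), and uses a small stand-in dictionary so it runs without a ROOT file. The helper name shuffled_batches, the batch size and the seed are made up for illustration.

```python
import numpy as np

def shuffled_batches(columns, batch_size, rng):
    """Yield mini-batches (dicts of arrays) covering all entries in random order."""
    names = list(columns)
    n_entries = len(columns[names[0]])
    order = rng.permutation(n_entries)  # random order of the entry indices
    for start in range(0, n_entries, batch_size):
        idx = order[start:start + batch_size]
        # the same index array is applied to every column, so rows stay aligned
        yield {name: columns[name][idx] for name in names}

# Stand-in for the dictionary returned by AsNumpy (hypothetical data)
arrays = {
    "x": np.arange(10, dtype=np.float64),
    "y": np.full(10, 2.0),
}
arrays["z"] = arrays["x"] * arrays["y"]

rng = np.random.default_rng(seed=42)
batches = list(shuffled_batches(arrays, batch_size=4, rng=rng))
```

Calling shuffled_batches again at the start of each epoch gives a new permutation, and each batch can then be handed to whatever training framework is in use.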

Hope this helps,
Enrico
