I would like to use something like the scikit-hep jagged arrays to read ROOT trees into a dictionary of flat NumPy arrays. Typical cases:
- an event entry with a dynamic array of tracks, each with parameter attributes
- a track entry with an array of clusters (position, charge)
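As a minimal sketch of the target layout, using toy data only: a jagged per-event column can be stored as one flat content array plus offsets and an event index. The names `jagged_to_flat`, `pt` and `event` are illustrative assumptions and are not part of scikit-hep or ROOT.

```python
import numpy as np

# Hypothetical toy data: 3 events with 2, 0 and 3 tracks, one parameter per track.
pt_per_event = [np.array([1.2, 0.7]), np.array([]), np.array([3.1, 0.4, 2.2])]

def jagged_to_flat(per_event):
    """Return (content, offsets, event_index) for a list of per-event arrays."""
    counts = np.array([len(a) for a in per_event], dtype=np.int64)
    offsets = np.concatenate(([0], np.cumsum(counts)))   # event i spans offsets[i]:offsets[i+1]
    content = np.concatenate(per_event) if len(per_event) else np.empty(0)
    event_index = np.repeat(np.arange(len(per_event)), counts)
    return content, offsets, event_index

content, offsets, event_index = jagged_to_flat(pt_per_event)
flat = {"pt": content, "event": event_index}

# On flat columns, a cut is an ordinary boolean mask applied to every column.
mask = flat["pt"] > 1.0
selected = {name: column[mask] for name, column in flat.items()}
```

Keeping the `event` column next to the flattened values means the per-event grouping can still be recovered after a cut.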
In many use cases we need some non-trivial selection criteria, so I cannot use scikit-hep directly.
In : file["seed1"][b"seed.fParamArrayOut.fData"].keys()
Out: [b'seed.fParamArrayOut.fData.fUniqueID', b'seed.fParamArrayOut.fData.fBits', b'seed.fParamArrayOut.fData.fX', b'seed.fParamArrayOut.fData.fAlpha', b'seed.fParamArrayOut.fData.fP', b'seed.fParamArrayOut.fData.fC']
Recently I saw a news item about a change to RVec in the next version of ROOT. Is this a development in that direction?
Warning in <TStreamerInfo::Build>: Due to some major, backward-incompatible improvements planned for ROOT::RVec, direct I/O of ROOT::RVec objects will break between v6.24 and v6.26. Please use std::vectors instead. See the release notes of v6.24 for more information.
For the moment I am investigating 3 options:
- restricted non-derived function
- old ROOT with GetVal: generic, no custom code needed
- works with some problems
- not clear how to achieve jagged arrays
Sometimes I need to access derived variables. I have tried the old ROOT interface and also RDataFrame.
With the old ROOT interface I ran simple TTree::Draw queries and then used GetVal to retrieve the data as a contiguous array.
I created a wrapper that converts such queries into a dictionary of NumPy arrays.
For example, in the RootInteractive wrapper:
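def tree2Panda(tree, include, selection, **kwargs):
    # excerpt: variables, columns, options and ex_dict are prepared earlier in the function
    entries = tree.Draw(str(variables), selection, "goffpara", options["nEntries"], options["firstEntry"])  # query data
    for i, a in enumerate(columns):
        val = tree.GetVal(i)  # buffer holding the values of the i-th expression
        ex_dict[a] = np.frombuffer(val, dtype=float, count=entries)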
This works quite well, but unfortunately has some limitations:
- TTree::GetVal does not work when I use TEventList or TEntryList.
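A related caveat about the wrapper above: `np.frombuffer` wraps existing memory without copying it. Assuming the buffer returned by `GetVal` stays owned by ROOT and may be overwritten by a later `Draw` call, the arrays in the dictionary could change silently. The sketch below uses a plain `array.array` as a stand-in for that buffer (no ROOT involved) to show the difference between a view and a copy.

```python
import array
import numpy as np

# Stand-in for the memory behind tree.GetVal(i): a buffer owned by someone else.
root_buffer = array.array("d", [1.0, 2.0, 3.0])

view = np.frombuffer(root_buffer, dtype=float, count=3)         # shares memory, no copy
copy = np.frombuffer(root_buffer, dtype=float, count=3).copy()  # independent snapshot

# Simulate a second query overwriting the same buffer.
root_buffer[0] = -99.0

print(view[0], copy[0])  # the view follows the buffer, the copy keeps the old value
```

If that assumption holds, appending `.copy()` in the wrapper would trade some memory for arrays that survive the next query.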
Using the RDataFrame AsNumpy interface to create a dictionary of NumPy arrays does not give us what we hoped for.
For array branches it returns an array of objects (RVec in the above case).
To create a jagged version like in the old ROOT, I guess I'll have to write my own macro.
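One possible Python-side workaround, sketched under assumptions: take the dictionary returned by `AsNumpy`, convert each per-event object with `np.asarray` (assuming the RVec elements expose a NumPy-compatible array interface), and concatenate. The helper `flatten_columns` and the column names are hypothetical. The example runs on a stand-in dictionary, so it does not need ROOT.

```python
import numpy as np

def flatten_columns(columns):
    """Flatten {name: array} where object-dtype arrays hold one sequence per event.

    Assumption: all jagged columns have the same length within each event.
    Per-event scalar columns are repeated so that every output row is one element.
    """
    jagged = {k: [np.asarray(v) for v in col]
              for k, col in columns.items() if col.dtype == object}
    if not jagged:
        return dict(columns)
    first = next(iter(jagged.values()))
    counts = np.array([len(v) for v in first], dtype=np.int64)
    flat = {"entry": np.repeat(np.arange(len(counts)), counts)}
    for name, col in columns.items():
        if name in jagged:
            lengths = [len(v) for v in jagged[name]]
            if lengths != counts.tolist():
                raise ValueError(f"column {name!r} has different per-event lengths")
            flat[name] = np.concatenate(jagged[name]) if lengths else np.empty(0)
        else:
            flat[name] = np.repeat(col, counts)
    return flat

# Stand-in for the result of df.AsNumpy(columns=[...]) on an RDataFrame df:
fX = np.empty(3, dtype=object)
for i, values in enumerate([[1.0, 2.0], [], [3.0]]):
    fX[i] = np.array(values)
fake = {"fX": fX, "run": np.array([10, 11, 12])}

flat = flatten_columns(fake)
```

This loops over events in Python, so it is likely to be slow for large trees. It illustrates the layout and is not a replacement for a native flat export.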
- Could we use RDataFrameCache for this purpose?
Or is it already possible to get a flat version with the current ROOT version 6.24?
From what I have read and tried in the tutorials, it seems impossible. Is there an implementation planned in the future?
ROOT Version: v6.24.06