Hello
I would like to use something like the scikit-hep jagged arrays to convert information from ROOT trees into a dictionary of flat NumPy arrays.
E.g.:
- an event entry with a dynamic array of tracks, each with parameter attributes
- a track entry with an array of clusters (position, charge)
In many use cases we need non-trivial selection criteria, so I cannot use scikit-hep directly.
E.g.:
In [59]: file["seed1"][b"seed.fParamArrayOut.fData"].keys()
Out[59]:
[b'seed.fParamArrayOut.fData.fUniqueID',
b'seed.fParamArrayOut.fData.fBits',
b'seed.fParamArrayOut.fData.fX',
b'seed.fParamArrayOut.fData.fAlpha',
b'seed.fParamArrayOut.fData.fP[5]',
b'seed.fParamArrayOut.fData.fC[15]']
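To make the target layout concrete, here is a minimal NumPy-only sketch of a dictionary of flat arrays built from jagged per-event data. The input lists are hypothetical stand-ins for branches such as fX and fAlpha, not values read from a real tree; the extra "event" column is an assumption about how one might keep the link between each flat row and its event.

```python
import numpy as np

# Hypothetical jagged input: one array of track values per event.
fX_per_event = [np.array([1.0, 2.5]), np.array([]), np.array([0.7, 3.1, 4.2])]
fAlpha_per_event = [np.array([0.1, 0.2]), np.array([]), np.array([0.3, 0.4, 0.5])]

# Number of tracks in each event.
counts = np.array([len(a) for a in fX_per_event])

# Dictionary of flat arrays: one row per track, plus the index of its event.
flat = {
    "event": np.repeat(np.arange(len(counts)), counts),
    "fX": np.concatenate(fX_per_event),
    "fAlpha": np.concatenate(fAlpha_per_event),
}

# A non-trivial selection becomes a boolean mask on the flat columns.
mask = (flat["fX"] > 1.0) & (flat["fAlpha"] < 0.45)
selected = {name: column[mask] for name, column in flat.items()}
```

With this layout every column has the same length, so the dictionary can be passed directly to pandas.DataFrame if needed.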
Recently I saw a news item about a change to RVec in the next version of ROOT. Is that development going in this direction? The warning reads:
Warning in <TStreamerInfo::Build>: Due to some major, backward-incompatible improvements planned for ROOT::RVec, direct I/O of ROOT::RVec objects will break between v6.24 and v6.26.
Please use std::vectors instead. See the release notes of v6.24 for more information.
For the moment I am investigating three options:
- uproot
  - restricted to non-derived quantities (no derived functions)
- old ROOT with GetVal
  - generic, no custom code needed
  - works, with some problems
- RDataFrame
  - not clear how to achieve jagged arrays
Old ROOT
Sometimes I need to access derived variables. I have tried both the old ROOT interface and RDataFrame.
With the old ROOT interface I ran simple TTree::Draw queries and then used GetVal to retrieve the data as a contiguous array.
I created a wrapper that converts such queries into a dictionary of NumPy arrays.
For example, in the RootInteractive wrapper:
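def tree2Panda(tree, include, selection, **kwargs):
    # ... (variables, columns, options and ex_dict are prepared here; omitted in this excerpt)
    entries = tree.Draw(str(variables), selection, "goffpara", options["nEntries"], options["firstEntry"])  # query data
    for i, a in enumerate(columns):
        val = tree.GetVal(i)  # buffer of doubles filled by the Draw call for column i
        ex_dict[a] = np.frombuffer(val, dtype=float, count=entries)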
This works quite well, but unfortunately with some limitations:
- TTree::GetVal does not work when I use TEventList or TEntryList.
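A related point about the wrapper above: np.frombuffer returns a view that shares memory with the buffer rather than a copy. The sketch below illustrates this with a plain Python array standing in for the buffer behind GetVal. Whether ROOT actually reuses that buffer on a later Draw call is an assumption here, not something checked against v6.24; if it does, copying the array before the next query would be the safe choice.

```python
from array import array
import numpy as np

# Stand-in for the double buffer that tree.GetVal(i) points to (hypothetical data).
draw_buffer = array("d", [1.0, 2.0, 3.0])
entries = len(draw_buffer)

# A view shares memory with the buffer; a copy owns its own data.
view = np.frombuffer(draw_buffer, dtype=float, count=entries)
safe = np.frombuffer(draw_buffer, dtype=float, count=entries).copy()

# Simulate a later query overwriting the same buffer.
draw_buffer[0] = 99.0

print(view[0])  # 99.0, the view follows the buffer
print(safe[0])  # 1.0, the copy keeps the original value
```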
RDataFrame
Using the RDataFrame AsNumpy interface to create a dictionary of NumPy arrays does not give us what we hoped for.
For array branches it returns an array of objects (RVec in the case above) rather than a flat array.
To get flat arrays from the jagged data, as with the old ROOT interface, I guess I will have to write my own macro.
- Could we use the RDataFrame Cache for this purpose?
Or is it possible to get a flat version already with the current ROOT version, 6.24?
From what I have read and tried in the tutorials, it seems impossible. Is an implementation planned for the future?
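As a possible stopgap, the object array can be flattened by hand on the Python side. This is a minimal sketch, not an RDataFrame feature: flatten_object_column is a hypothetical helper, the input is a plain NumPy object array of lists standing in for what AsNumpy returns, and it assumes each element supports len() and conversion with np.asarray (which would need checking for RVec in a given ROOT version).

```python
import numpy as np

def flatten_object_column(column):
    """Flatten an object array of per-entry sequences into (flat values, entry index)."""
    counts = np.fromiter((len(v) for v in column), dtype=np.int64, count=len(column))
    if counts.sum() == 0:
        # Nothing to concatenate: return empty arrays of the right types.
        return np.empty(0, dtype=float), np.empty(0, dtype=np.int64)
    values = np.concatenate([np.asarray(v, dtype=float) for v in column])
    # Index of the originating entry for every flat value.
    entry = np.repeat(np.arange(len(column)), counts)
    return values, entry

# Stand-in for an AsNumpy column: dtype=object, one variable-length item per entry.
column = np.empty(3, dtype=object)
column[0] = [1.0, 2.0]
column[1] = []
column[2] = [3.0]

values, entry = flatten_object_column(column)
```

Applying the helper to each column of the AsNumpy dictionary would give flat arrays of equal length for branches that share the same per-entry multiplicity.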
Regards
Marian
ROOT Version: v6.24.06
Platform: all
Compiler: all