Hi all,
thanks for the additional information. I am now able to save a filtered Snapshot, which is indeed quite fast. I also like the new docstring of AsNumpy(). I attached two docstrings to show what I mean ;).
With the aim of being completely independent of root_numpy and uproot (but still wanting to use pandas DataFrames from time to time), I am still struggling a bit with the options. Is it possible
- to receive numpy arrays based on a regular expression? As you mentioned, it works when I want to Snapshot some data, but it did not work together with AsNumpy().
- to read array-based variables? When trying to read some of them, I get the following error:
Error in <TBranch::TBranch>: Illegal leaf: B0_ARRAY_M/B0_ARRAY_M[B0_ARRAY_nPV]/F. If this is a variable size C array it's possible that the branch holding the size is not available.
*** Break *** segmentation violation
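For the first point, the workaround I could imagine is to filter the column names myself and pass the result to AsNumpy(columns=...). This is only a rough sketch, not something I know to be the intended way: the helper names select_columns and as_numpy_matching are made up, I am assuming that GetColumnNames() yields names that convert cleanly with str(), and the choice of a full match (rather than a partial search) is mine.

```python
import re


def select_columns(names, pattern):
    """Return the column names that fully match the regular expression.

    Hypothetical helper: `names` is any iterable of name-like objects,
    each converted with str() before matching.
    """
    regex = re.compile(pattern)
    return [str(name) for name in names if regex.fullmatch(str(name))]


def as_numpy_matching(rdf, pattern):
    """Call AsNumpy() only on the columns whose names match `pattern`.

    Hypothetical helper: `rdf` is assumed to be an RDataFrame (or anything
    else that provides GetColumnNames() and AsNumpy(columns=...)).
    """
    columns = select_columns(rdf.GetColumnNames(), pattern)
    return rdf.AsNumpy(columns=columns)
```

With something like arrays = as_numpy_matching(rdf, "B0_.*") I would then expect a dict of 1D numpy arrays, which should be usable as pandas.DataFrame(arrays) as long as all selected columns hold fundamental types.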
Regards,
Timon
Appendix:
In [12]: rdf.Snapshot?
Call signature: rdf.Snapshot(*args, **kwargs)
Type: TemplateProxy
String form: <ROOT.TemplateProxy object at 0x7fda37743f48>
File: ~/.conda/envs/myroot/lib/python3.6/site-packages/ROOT.py
Docstring:
ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Snapshot(basic_string_view<char,char_traits<char> > treename, basic_string_view<char,char_traits<char> > filename, const vector<string>& columnList, const ROOT::RDF::RSnapshotOptions& options = ROOT::RDF::RSnapshotOptions())
ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Snapshot(basic_string_view<char,char_traits<char> > treename, basic_string_view<char,char_traits<char> > filename, basic_string_view<char,char_traits<char> > columnNameRegexp = "", const ROOT::RDF::RSnapshotOptions& options = ROOT::RDF::RSnapshotOptions())
ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Snapshot(basic_string_view<char,char_traits<char> > treename, basic_string_view<char,char_traits<char> > filename, initializer_list<string> columnList, const ROOT::RDF::RSnapshotOptions& options = ROOT::RDF::RSnapshotOptions())
Class docstring: PyROOT template proxy (internal)
In [13]: rdf.AsNumpy?
Signature: rdf.AsNumpy(columns=None, exclude=None)
Docstring:
Read-out the RDataFrame as a collection of numpy arrays.
The values of the dataframe are read out as numpy array of the respective type
if the type is a fundamental type such as float or int. If the type of the column
is a complex type, such as your custom class or a std::array, the returned numpy
array contains Python objects of this type interpreted via PyROOT.
Be aware that reading out custom types is much less performant than reading out
fundamental types, such as int or float, which are supported directly by numpy.
The reading is performed in multiple threads if the implicit multi-threading of
ROOT is enabled.
Note that this is an instant action of the RDataFrame graph and will trigger the
event-loop.
Parameters:
columns: If None return all branches as columns, otherwise specify names in iterable.
exclude: Exclude branches from selection.
Returns:
dict: Dict with column names as keys and 1D numpy arrays with content as values
File: ~/.conda/envs/myroot/lib/python3.6/site-packages/ROOT.py
Type: method