Issues in the conversion of RDataFrame column into a numpy array

Dear experts,

I am trying to convert a column of an RDataFrame object into a numpy array.

The RDataFrame is defined on a memory-resident tree created via TTree.MergeTrees(TList).

However, the AsNumpy function, used as shown in the tutorial df026_AsNumpyArrays.py, does not work. I get the following error message:

        cpp_reference = self._result_ptrs[column].GetValue()
    cppyy.gbl.std.runtime_error: const vector<double>& ROOT::RDF::RResultPtr<vector<double> >::GetValue() =>
        runtime_error: The input TTree is not linked to any file, in-memory-only trees are not supported.

I am running ROOT v6.26.07 with Python 3.9.6 in the el8_amd64_gcc10 environment.

Is there any other method to do this?

Any help will be highly appreciated.

Best,
Soureek

Can you open a file before calling TTree.MergeTrees(TList) so the resulting tree has a TFile as “home”?
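
Axel's suggestion above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the helper name `merge_trees_into_file` and the default file name `merged_tmp.root` are made up for illustration, and the explicit `Write()` is an assumption about what multi-thread RDataFrame needs in order to read the tree back from disk. It relies on TTree.MergeTrees creating the new tree in the current directory, which is the file that was just opened.

```python
def merge_trees_into_file(trees, out_path="merged_tmp.root"):
    """Merge `trees` into a new TTree that lives in a TFile (hypothetical helper)."""
    import ROOT  # imported lazily; requires a working PyROOT installation

    # Opening the file first makes it the current directory, so the merged
    # tree created below is attached to it instead of being memory-only.
    out_file = ROOT.TFile.Open(out_path, "RECREATE")
    if not out_file or out_file.IsZombie():
        raise OSError(f"could not open {out_path}")

    tree_list = ROOT.TList()
    for tree in trees:
        tree_list.Add(tree)

    merged = ROOT.TTree.MergeTrees(tree_list)

    # Assumption: write everything to disk so that other TFile objects opened
    # on the same path (e.g. by RDataFrame worker threads) can see the entries.
    out_file.Write()

    # Return the file as well: the tree is only valid while its file is open.
    return out_file, merged
```

The returned tree can then be passed to `ROOT.RDataFrame(merged)` and `AsNumpy` as in the tutorial; keep a reference to the returned file for as long as the tree is in use.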

Hi @soureek ,

and welcome to the ROOT forum. In case Axel’s suggestion does not pan out, there is a much worse workaround: I seem to remember (I hope I remember correctly) that memory-resident trees are not supported with EnableImplicitMT() active. In other words, a single-thread run (without EnableImplicitMT()) might work.

Cheers,
Enrico
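
The single-thread workaround Enrico describes might look like the sketch below. The helper name `column_as_numpy_single_thread` is hypothetical; the only ROOT calls used are `IsImplicitMTEnabled`, `DisableImplicitMT`, `RDataFrame` and `AsNumpy`. Note that it switches implicit multi-threading off for the whole process, which is a deliberate simplification.

```python
def column_as_numpy_single_thread(tree, column):
    """Read one column of an in-memory TTree into a numpy array (hypothetical helper)."""
    import ROOT  # imported lazily; requires a working PyROOT installation

    # In-memory-only trees are rejected when implicit multi-threading is on,
    # so make sure the event loop runs in a single thread.
    if ROOT.IsImplicitMTEnabled():
        ROOT.DisableImplicitMT()

    df = ROOT.RDataFrame(tree)
    # AsNumpy returns a dict mapping column names to numpy arrays.
    return df.AsNumpy(columns=[column])[column]
```

The price is that the event loop no longer runs in parallel, which is why this is the "much worse" option for large inputs.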

Hi @eguiraud and @Axel

Both options work for me. Thanks for your suggestions.

> I seem to remember (I hope I remember correctly) that memory-resident trees are not supported with EnableImplicitMT() active.

@eguiraud
Is there a particular reason why this feature is not supported?
In my experience, merging small trees on-the-fly is a standard requirement for various studies. One should be able to seamlessly handle them in RDataFrame without deactivating EnableImplicitMT() or creating a temporary file to store the merged tree.

Can this feature be added in RDataFrame?

Cheers,
Soureek

I think, in principle, a TTree is always expected to be “owned” by some file, e.g., either a “physical” TFile (or any “network” resident variant) or a “memory-mapped” TMemFile (that’s maybe what you want now).

Different threads cannot work on the same TTree; it’s not supported (the tree would need to “be at different entries” at the same time). So each thread needs to open its own copy of the file and work on its own copy of the TTree. And that’s not possible for memory-resident trees.

@eguiraud @pcanal So, you say that TMemFile cannot be used with RDataFrame (at least not in “multi-threaded” mode).

Yes, as far as I know multi-thread access to a TTree backed by a TMemFile is not something ROOT supports (so RDF can’t do it either).

> TTree backed by a TMemFile is not something ROOT supports

In what context? It should work for most uses.

> that memory-resident trees are not supported with EnableImplicitMT() active.

> Is there a particular reason why this feature is not supported?

The technical reason is that TFile objects are not thread safe in themselves. To support accessing the same file from multiple threads, the simplest way is to create multiple TFile objects viewing the same physical file. In the case of a memory file, with the current code, that would require duplicating the entire ‘file’ in memory, which in too many cases would blow up the memory usage. Instead, we need to develop code so that multiple TMemFile objects can view the same in-memory file and/or develop a thread-safe TMemFile. Either way, this requires code that we have not developed yet.

> The RDataFrame is defined on a memory-resident tree created via TTree.MergeTrees(TList).

Why? Why not skip that step and create a TChain and use it with RDataFrame?
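
The TChain route suggested above could be sketched like this, assuming the input trees already live in files on disk (it does not apply to trees that only exist in memory). The helper name `chain_column_as_numpy` and its parameters are hypothetical.

```python
def chain_column_as_numpy(tree_name, file_paths, column):
    """Read one column from several on-disk trees via a TChain (hypothetical helper)."""
    import ROOT  # imported lazily; requires a working PyROOT installation

    # A TChain presents the same-named tree in several files as one dataset,
    # so no merged in-memory copy is needed.
    chain = ROOT.TChain(tree_name)
    for path in file_paths:
        chain.Add(path)

    # The chain must outlive the data frame; here both are used and dropped
    # inside this function, after AsNumpy has run the event loop.
    df = ROOT.RDataFrame(chain)
    return df.AsNumpy(columns=[column])[column]
```

Because every entry is backed by a file, this form should not hit the in-memory restriction discussed in this thread.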

The beginning of that sentence was “multi-thread access to a…” :smiley: Or do you mean that should work as well?

> The beginning of that sentence was “multi-thread access to a…” :smiley: Or do you mean that should work as well?

Yes, I missed the qualifier, and in that case my answer is the right explanation. That is, technically ROOT does not support direct multi-thread access to any TFile or TTree. RDF has code to allow multi-thread access to the same on-disk file (using multiple TFile objects); doing the same for a TMemFile is not implemented … yet?