Add ordered branch to tree in RDataFrame

Dear experts,
I’m currently manipulating a tree, getting an RDF object by doing

self.RDF = rt.RDataFrame(self.Tree)

Then I convert it into a pandas df:

temp_numpy = self.RDF.AsNumpy(columns=self.m_VarNames)
self.df = pd.DataFrame(temp_numpy, columns=self.m_VarNames)

Since I have called rt.EnableImplicitMT(), the rows in the pandas df are shuffled with respect to the original tree. The tree starts with:

+-----+-------------+
| Row | eventNumber |
+-----+-------------+
| 0   | 4888        |
+-----+-------------+
| 1   | 1403        |
+-----+-------------+
| 2   | 1139        |
+-----+-------------+
| 3   | 1576        |
+-----+-------------+
| 4   | 4901        |
+-----+-------------+

While the df:

      eventNumber
0          261857
1          261906
2          263667
3          261016
4          261378

Now, I end up with an array that I would like to add to the nominal tree in the TFile, and I would like each element of the array to be matched to the correct entry in the tree according to eventNumber.

Is there a way to snapshot the new array as a new branch while making sure each element is matched to the correct eventNumber?

Up to now I snapshot a new tree into the TFile, and then I do the following using AddFriend:

f = TFile(...)
# ... do lots of stuff ...
t = f.Get("nominal")
tf = f.Get("the new tree I produce")
t.BuildIndex("eventNumber")
tf.BuildIndex("eventNumber")
t.AddFriend(tf)

I was wondering if there is a more efficient way…

Thank you!

Hi @Giovanni_Guerrieri ,

and welcome to the ROOT forum!

I think that, other than using a TTreeIndex as you are doing, the other available method is simply to turn off multi-threading for the RDF that performs the Snapshot.
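As a rough illustration of the single-threaded option, here is a minimal sketch, not an official recipe. The function name, the file and tree arguments, the "newVar" column, and the compute callback are all hypothetical. It also assumes a recent ROOT release where ROOT.RDF.FromNumpy is available (older releases call it MakeNumpyDataFrame).

```python
def write_friend_in_tree_order(file_name, tree_name, friend_file, friend_tree, compute):
    """Write a friend tree whose entries follow the order of the input tree.

    `compute` is a hypothetical callback mapping the eventNumber array
    to the new per-event array.
    """
    # Imported here so the sketch can be loaded where PyROOT is not installed.
    import ROOT

    # Without implicit MT the event loop runs sequentially,
    # so rows come out in the same order as the tree entries.
    ROOT.DisableImplicitMT()

    df = ROOT.RDataFrame(tree_name, file_name)
    cols = df.AsNumpy(columns=["eventNumber"])

    # One new value per entry, still in tree order.
    new_values = compute(cols["eventNumber"])

    # Build an in-memory dataframe from the arrays and write it out.
    out = ROOT.RDF.FromNumpy({"eventNumber": cols["eventNumber"],
                              "newVar": new_values})
    out.Snapshot(friend_tree, friend_file)
```

Since both trees then share the same entry order, they can be made friends entry by entry, without building an index.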

Multi-threaded event loops go through the dataset in ("block-wise") random order, and, at least at the moment, there is no way to have Snapshot re-order the blocks on the fly before it writes them out.
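If the new array has already been computed from shuffled rows, it can also be put back into tree order in memory before writing, using eventNumber as the key. The following is a minimal plain-Python sketch. The function name, the shuffled ordering, and the values are made up for illustration, and it assumes eventNumber is unique within the tree.

```python
def align_to_tree_order(tree_event_numbers, shuffled_event_numbers, shuffled_values):
    """Return shuffled_values reordered to follow tree_event_numbers."""
    if len(shuffled_event_numbers) != len(shuffled_values):
        raise ValueError("event numbers and values must have the same length")

    # Map each eventNumber to its value.
    value_by_event = dict(zip(shuffled_event_numbers, shuffled_values))
    if len(value_by_event) != len(shuffled_event_numbers):
        raise ValueError("eventNumber is not unique; extend the key, e.g. with a run number")

    # Look the values up in tree order (raises KeyError if an event is missing).
    return [value_by_event[evt] for evt in tree_event_numbers]


# Tree order as in the first rows shown in the question.
tree_order = [4888, 1403, 1139, 1576, 4901]
# Hypothetical order in which a multi-threaded loop returned the rows.
shuffled_events = [1576, 4901, 4888, 1139, 1403]
# Hypothetical new values, one per shuffled row.
shuffled_scores = [0.3, 0.4, 0.0, 0.2, 0.1]

aligned = align_to_tree_order(tree_order, shuffled_events, shuffled_scores)
print(aligned)  # [0.0, 0.1, 0.2, 0.3, 0.4]
```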

Cheers,
Enrico
