Create an RDataFrame snapshot with a friend

Hello,

I wonder if the following is possible. I have a TTree t1. I want to copy some of its columns, and add some new ones, into a new tree t2 with an RDataFrame, using Define and Snapshot. However, I need t1 to be a friend of t2. Is this possible with RDataFrame and Snapshot? If not, I know how to do it with standard TTree access, but I was hoping for the convenience.

Hi,

Why do you want to copy columns if it’s then added as a friend? (I’m just making sure I understand what you’re trying to do here.)

The issue is the filtering: if the Snapshot output has fewer entries than the input tree, then you need to keep track of the correspondence between input and snapshot tree entries yourself. One option is to add “empty” entries to the snapshot tree, i.e. to not filter the output.
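To illustrate the alignment point above without any ROOT dependency, here is a minimal sketch in plain C++. Friend trees are matched by entry number, so an output that drops entries no longer lines up with the input, while an output that keeps every entry and carries a flag does. The names `Row`, `filtered` and `flagged`, and the selection `v > 0`, are hypothetical and only serve the illustration. With RDataFrame, the analogous approach would presumably be to Define a boolean column rather than call Filter before the Snapshot.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one row of the output ("snapshot") tree.
struct Row {
    bool pass;  // whether the input entry satisfied the selection
    double y;   // derived value; a placeholder (0.0) when pass is false
};

// Dropping rejected entries: the output is shorter than the input, so
// output entry i no longer corresponds to input entry i.
std::vector<double> filtered(const std::vector<double>& x) {
    std::vector<double> out;
    for (double v : x) {
        if (v > 0.0) out.push_back(2.0 * v);
    }
    return out;
}

// Keeping every entry and recording a flag: output entry i always
// corresponds to input entry i, which is what entry-wise friends need.
std::vector<Row> flagged(const std::vector<double>& x) {
    std::vector<Row> out;
    out.reserve(x.size());
    for (double v : x) {
        const bool pass = v > 0.0;
        out.push_back({pass, pass ? 2.0 * v : 0.0});
    }
    assert(out.size() == x.size());  // alignment invariant
    return out;
}
```

For an input of {1.0, -2.0, 3.0}, `filtered` returns two values, so its second element belongs to the third input entry, whereas `flagged` returns three rows with the middle one marked as not passing.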

At least in the context of RNTuple we will be working on a simpler solution for exactly this problem. @eguiraud might share his plans for RDataFrame in this area.

Currently, what works is the following, in single-thread mode:

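#include <ROOT/RDataFrame.hxx>
#include <TFile.h>
#include <TTree.h>
#include <memory>

// Write the new columns to t2; with no Filter, t2 has one entry per t1 entry
ROOT::RDataFrame("t1", "f1.root").Define("x", ...)
                                 .Define("y", ...)
                                 .Snapshot("t2", "f2.root", {"x", "y"});

// Reopen both trees and attach t2 to t1 as a friend
auto f1 = std::unique_ptr<TFile>(TFile::Open("f1.root"));
auto t1 = f1->Get<TTree>("t1");
auto f2 = std::unique_ptr<TFile>(TFile::Open("f2.root"));
auto t2 = f2->Get<TTree>("t2");
t1->AddFriend(t2);
ROOT::RDataFrame df(*t1);
// now you have a dataframe that processes t1+t2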

If the Snapshot is multi-threaded, the entries in the friend will be out of order (recent-enough ROOT versions will actually notice this and complain), and if there are Filters before the Snapshot you run into the problem Axel describes. In the future we might work around the former problem by providing a multi-thread OrderedSnapshot. TTree “joins” on certain column values using TTreeIndex are also possible, but I think that’s not the case you are asking about.

Interpreting your question differently, maybe you are asking about reading t1+t2 with RDataFrame and, with that same RDataFrame, adding more columns or entries to t2. That is not supported.

Cheers,
Enrico

I am copying only some columns of the TTree, omitting some, and recalculating some. The resulting TTree will have the same number of entries. It is just the next level of processing, which does not filter out any entries but removes some uninteresting content and transforms some results. It may be distributed together with the previous level (in which case friends would be needed) or without it (hence some columns are copied).

But thanks, I’ve already done it the old way with Branches and looping :)
