Modifying an RDataFrame with an extracted column

Hi everyone,

I am having a problem using RDataFrame with two separate datasets.
I’m working with a first dataset from a Geant4 simulation and a second one which is a reconstruction of a detector response to the simulation.
The issue is that not all the rows from the first one are recorded in the second one. I only have a column called cycleNum to bridge the two, so I cannot use the TTree.AddFriend magic trick.

RDataFrame’s lazy evaluation and reference system are still confusing to me, but I think the issue comes from this behavior:

void test() {
    ROOT::RDF::RNode df = ROOT::RDataFrame (10);
    df = df.Define("dummy", "return ROOT::RVecD {};");
    auto res_ptr = df.Take<ROOT::RVecD>("dummy");
    res_ptr.GetPtr()->at(0) = ROOT::RVecD {1222, 222, 444, 4444, 4444, 444};
    for (auto&& value: res_ptr) {
        std::cout << value<< std::endl;
    }
    df.Display("dummy")->Print();
}

The code basically creates a data frame with 10 entries and a dummy column full of empty vectors.
Then it takes an RResultPtr to the values of this column and assigns a value to the first row.
I expected the edit to be reflected in the corresponding row of the data frame. However, although the for loop shows that something was indeed updated, the Display method shows that the data frame is unchanged.

The only solution I see is to evaluate the defined column and re-inject it into the data frame,
but that seems like an ugly solution.
If there is a better way to implement this, that would be amazing.

Thanks in advance! :slight_smile:

Greetings


ROOT Version: 6.26/00 (conda environment)
Platform: WSL2 (Ubuntu)
Compiler: Not Provided


Hi @zazbone ,
and welcome to the ROOT forum. Manually modifying the contents of an RDF like that does not work: Take collects the column values into a separate container, so you are only modifying a local variable. Imagine the RDF was reading the data from some files somewhere on the web: the next time RDF runs the event loop, it will still read the data from those files.
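
To illustrate the point, here is a minimal sketch in plain C++ with no ROOT involved. LazyColumn and take() are hypothetical stand-ins invented for this example, not RDataFrame API; the only assumption is that each pass over the data recomputes the values from the source, as a lazy event loop would.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a lazily evaluated column: values are produced
// from the source every time the "event loop" runs.
struct LazyColumn {
    std::size_t nEntries;
    std::function<std::vector<double>(std::size_t)> compute;

    // Rough analogue of Take: run the loop and collect the values into a
    // new container that is independent of the source.
    std::vector<std::vector<double>> take() const {
        std::vector<std::vector<double>> out;
        out.reserve(nEntries);
        for (std::size_t i = 0; i < nEntries; ++i)
            out.push_back(compute(i));
        return out;
    }
};

// Edits the result of a first pass, then returns the size of entry 0 as
// seen by a second pass over the same column.
std::size_t sizeSeenBySecondPass() {
    LazyColumn dummy{10, [](std::size_t) { return std::vector<double>{}; }};

    auto taken = dummy.take();
    taken.at(0) = std::vector<double>{1222.0, 222.0, 444.0};
    assert(taken.at(0).size() == 3);  // the local copy did change

    auto again = dummy.take();        // a new pass recomputes from the source
    return again.at(0).size();
}
```

In this sketch the second pass still sees an empty vector for entry 0, which mirrors what Display shows in the original post.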

If cycleNum provides some connection between entries in the two datasets, you might be able to use an indexed TTree friend (EDIT: some info at “ROOT: TTreeIndex Class Reference”) to create a combined dataset. Can you please describe the schema of TTree A and TTree B and how they are “connected”?

Cheers,
Enrico

Hi eguiraud,

Basically, in TTree A one row corresponds to one simulated event. It’s the same in TTree B, except that some events are missing, and the cycleNum column contains an integer that refers to a row number of TTree A.

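| TTree A row | TTree A data | TTree B row | TTree B cycleNum |
|---|---|---|---|
| 1 | | 1 | 1 |
| 2 | | (doesn’t exist) | (doesn’t exist) |
| 3 | | 2 | 3 |
| 4 | | 3 | 4 |
| 5 | | (doesn’t exist) | (doesn’t exist) |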

I will look at the tree index, thanks.
And thanks for your quick response!

greetings

Uhm @pcanal does TTreeIndex support this case (one column in tree B tells which event in tree A should be read for each event in tree B)?

If I understood correctly, yes this is a common case.

Note that tree A does not have a cycleNum column (right @zazbone ?).

How would you write the BuildIndex and AddFriend invocations @pcanal ?

If ‘cyclenum’ actually refers to the implicit entry number, then we can still pull it off, but the index will of course only really work for one tree (+ friend) at a time. The trick is to create an alias so that cyclenum is matched with the TTreeFormula variable LocalEntry$:

// friendTree is tree B and mainTree is tree A (placeholder names)
friendTree->BuildIndex("cyclenum","");
mainTree->SetAlias("cyclenum", "LocalEntry$");
mainTree->AddFriend(friendTree);

This solution seems to work well :smile:
Except that BuildIndex apparently does not work with an empty string as the second argument. Leaving it out, so that the default argument is used, works:

S2tree->BuildIndex("cycleNum");
G4tree->SetAlias("cycleNum", "LocalEntry$");
G4tree->AddFriend(S2tree.get());
ROOT::RDF::RNode df = ROOT::RDataFrame {*G4tree};
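
As a mental model of what the index does here, a minimal sketch in plain C++ follows. It is not ROOT’s actual implementation: RecoRow, its signal member, buildIndex and friendOf are hypothetical names invented for the illustration. The idea is that the index maps each cycleNum to a row of the reconstructed tree, and the entry number of the simulation tree (what the LocalEntry$ alias supplies) is used as the lookup key. The numbers follow the table given earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical row of the reconstructed tree; `signal` is an invented payload.
struct RecoRow {
    long long cycleNum;  // entry number of the matching simulated event
    double signal;
};

using Index = std::unordered_map<long long, std::size_t>;

// Rough analogue of BuildIndex("cycleNum"): map each cycleNum to its row.
Index buildIndex(const std::vector<RecoRow>& recoTree) {
    Index index;
    for (std::size_t row = 0; row < recoTree.size(); ++row)
        index[recoTree[row].cycleNum] = row;
    return index;
}

// Rough analogue of reading the friend while the main tree is at `entry`:
// returns the matching reconstructed row, or nullptr if there is none.
const RecoRow* friendOf(long long entry, const std::vector<RecoRow>& recoTree,
                        const Index& index) {
    const auto it = index.find(entry);
    return it == index.end() ? nullptr : &recoTree[it->second];
}

// Example with the numbers from the thread: only events 1, 3 and 4 were
// reconstructed.
void joinExample() {
    const std::vector<RecoRow> recoTree = {{1, 0.5}, {3, 1.5}, {4, 2.5}};
    const Index index = buildIndex(recoTree);
    assert(friendOf(3, recoTree, index)->signal == 1.5);  // event 3 is the second reconstructed row
    assert(friendOf(2, recoTree, index) == nullptr);      // event 2 was not reconstructed
}
```

Events without a match simply have no friend row in this sketch; how missing friend entries surface in a real analysis depends on ROOT and is not modelled here.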

It’s also good to know about LocalEntry$; too bad it doesn’t work in data frame expressions :sweat_smile:

Thanks a lot for your responses !