
RDataFrame adding branches from other trees and matching indices

Hi,

This is related to a similar question here: https://root-forum.cern.ch/t/rdataframe-shuffles-events-for-tmva-scoring

I was asking in particular about using Foreach, because the way I currently add things like weights from another file/tree looks like this.
In the code below, the indices of the two TTrees align, so I didn't have to worry about matching each event to the right weight, but I know this won't always be the case when using Snapshot with MT.

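//C++ function to be called by RDataFrame
double copy_vec(const std::vector<double> &vec, ULong64_t index){
   return vec.at(index); // bounds-checked lookup by entry number
}
//Fill a vector with the weights I want (one per RooDataSet entry)
std::vector<double> nsig_sw_vec;
const int nentries = data->numEntries();
nsig_sw_vec.reserve(nentries);
for (int i = 0; i < nentries; i++){
    double nsig_sw = data->get(i)->getRealValue("nsig_sw", 0, true);
    nsig_sw_vec.push_back(nsig_sw);
}
//Define a new branch in RDataFrame. MT must be disabled for this step
//(this has to happen before df is constructed) so rdfentry_ follows the TTree order
ROOT::DisableImplicitMT();
auto df2 = df.Define("nsig_sw",
                     [&nsig_sw_vec](ULong64_t entry){ return copy_vec(nsig_sw_vec, entry); },
                     {"rdfentry_"});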

This clearly won't be performant for large datasets, since it doesn't use MT, and I will be working with datasets of ~100M candidates.
As a first step, I could replace the current 1D vector with a 2D vector holding both the entry index and the weight. With this information, would it be possible to use Foreach with MT?

Thanks


ROOT Version: 6.22.0
Platform: Not Provided
Compiler: Not Provided


Hi,
a 2D vector (or two “synchronized” 1D vectors) sounds fine. The second vector could contain the entry number of the original TTree to which the value in the first vector corresponds. rdfentry_ could be used for this purpose.
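A minimal sketch of the two synchronized vectors in plain C++, with no ROOT dependency. The struct EntryWeights and the helper sortByEntry are hypothetical names, and unsigned long long stands in for ROOT's ULong64_t (the type of rdfentry_). Sorting a permutation of indices and then applying it to both vectors keeps every weight paired with its entry number, which also covers the pre-sorting option:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical container: weights[i] belongs to the TTree entry stored in
// entries[i] ("synchronized" vectors; both must have the same length).
struct EntryWeights {
    std::vector<unsigned long long> entries; // stand-in for ULong64_t entry numbers
    std::vector<double> weights;             // e.g. the nsig_sw values
};

// Reorder both vectors together so that entries is in increasing order.
// Sorting a permutation of indices keeps each weight attached to its entry.
void sortByEntry(EntryWeights &ew) {
    assert(ew.entries.size() == ew.weights.size());
    std::vector<std::size_t> perm(ew.entries.size());
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::sort(perm.begin(), perm.end(), [&ew](std::size_t a, std::size_t b) {
        return ew.entries[a] < ew.entries[b];
    });
    EntryWeights sorted;
    sorted.entries.reserve(perm.size());
    sorted.weights.reserve(perm.size());
    for (std::size_t p : perm) {
        sorted.entries.push_back(ew.entries[p]);
        sorted.weights.push_back(ew.weights[p]);
    }
    ew = std::move(sorted);
}
```

After sortByEntry, if the stored entry numbers are exactly 0..N-1 with no gaps, weights[entry] is directly the weight for that entry; if some entries are missing, a search is still needed.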

Then you could either pre-sort the vectors so that their elements are ordered by increasing entry number, or look up the element that corresponds to the current entry number as you go; which one is more performant or practical depends on the situation.
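The look-up-as-you-go option could be sketched like this, again in plain C++ and assuming the key being looked up is the same entry number that was stored when the vectors were filled. lookupSorted (binary search on already-sorted vectors) and buildWeightMap (a hash map built once, up front) are hypothetical helpers. During the event loop both only read shared data, so they could be called from several threads at once:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// Option 1: entries sorted in increasing order, weights synchronized with them.
// O(log N) per lookup; only const access, so concurrent calls are safe.
double lookupSorted(const std::vector<unsigned long long> &entries,
                    const std::vector<double> &weights,
                    unsigned long long entry) {
    assert(entries.size() == weights.size());
    auto it = std::lower_bound(entries.begin(), entries.end(), entry);
    if (it == entries.end() || *it != entry)
        throw std::out_of_range("no weight stored for this entry");
    return weights[static_cast<std::size_t>(it - entries.begin())];
}

// Option 2: build an entry -> weight map once (single-threaded, before the
// event loop); afterwards only read it with find()/at(), which are const.
std::unordered_map<unsigned long long, double>
buildWeightMap(const std::vector<unsigned long long> &entries,
               const std::vector<double> &weights) {
    assert(entries.size() == weights.size());
    std::unordered_map<unsigned long long, double> m;
    m.reserve(entries.size());
    for (std::size_t i = 0; i < entries.size(); ++i)
        m.emplace(entries[i], weights[i]);
    return m;
}
```

Inside RDataFrame, either helper could be wrapped in a lambda passed to Define along with the column that holds the entry number. One caveat: the RDataFrame documentation notes that in multi-threaded event loops rdfentry_ values do not correspond to TChain entry numbers, so with implicit MT enabled the key needs to come from a column whose value does not depend on processing order.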

Cheers,
Enrico
