Here is a test script that aims to validate that it works:
void test(){
    // Open the tuple and restrict the test to the first 10000 entries.
    ROOT::RDataFrame df("DecayTuple","/eos/lhcb/wg/RD/RKstar/tuples/v9/RK/TupleProcess_EE_L0_BDT-DTF_PIDMeerkat_nTracks_sPlotCut_BS/Bu2KJPsEE/MC12MD/0/TupleProcess.root");
    auto dd = df.Range(10000);
    // Reference calculation: explicit loop over the 100 slots of each vector column.
    auto customCalc = [] ( vector<double> & a , vector<double> & b , vector<int> & c, double d){
        vector<double> cc(100);
        for( int i=0; i < 100 ; ++i){
            cc[i] = (a[i] / b[i] ) * c[i] * d;
        }
        return cc;
    };
    // "testBR" uses the string expression, "testBR2" uses the explicit loop above.
    auto c1c = dd.Define( "testBR", "(Bp_wfL0I_incl_Bp_effCL_BS/Bp_wfL0I_incl_Bp_effMC_BS) * RndPoisson * wPIDCalib" )
                 .Define( "testBR2",customCalc, {"Bp_wfL0I_incl_Bp_effCL_BS","Bp_wfL0I_incl_Bp_effMC_BS", "RndPoisson", "wPIDCalib"} );
    auto cc = c1c.Take< ROOT::VecOps::RVec<double> > ("testBR");
    auto cc2 = c1c.Take< vector<double> > ("testBR2");
    // Per-slot sums over all entries, zero-initialised.
    vector<double> sumW(100, 0.), sumW_valid(100, 0.);
    for( size_t i = 0 ; i < cc->size(); ++i){
        for( int bsIDX = 0; bsIDX<100; ++bsIDX){
            sumW[bsIDX] += cc->at(i)[bsIDX];
        }
    }
    for( size_t i = 0 ; i < cc2->size(); ++i){
        for( int bsIDX = 0; bsIDX<100; ++bsIDX){
            sumW_valid[bsIDX] += cc2->at(i)[bsIDX];
        }
    }
    // Print both sums and store them side by side for comparison.
    TFile * ff = new TFile("test.root","RECREATE");
    TNtuple tt("test","test","test:valid");
    for( int i = 0 ; i < 100; i++){
        cout<< "sumW["<<i<<"] = " << sumW[i] << endl;
        cout<< "sumWValid["<<i<<"] = " << sumW_valid[i] << endl;
        tt.Fill( sumW[i], sumW_valid[i]);
    }
    tt.Write();
    ff->Close();
    // GetWeight() --> GetWeightBS(idx = 0);
    // dd.Define("(Bp_wfL0I_incl_B0_effCL_BS/Bp_wfL0I_incl_B0_effMC_BS)*wPIDCalib")
}
I do get the same outcome as with the custom operation definition when reading the vector columns and multiplying (or dividing) them directly. I am very happy to see this work out of the box, so I am just asking whether there is any limit to this, or whether I should expect all operations (*, /, -, +) to work fine when dividing, multiplying, subtracting, or adding vectors to vectors and/or vectors to scalar columns.
Hi @eguiraud, thanks a lot for the link. Nevertheless, I failed to find where RVec<double>::operator (*, /, -, +) is defined and what happens when doing
I failed to find where the RVec<double>::operator (*, /, -, +) is defined
Ah that’s true, Doxygen does not pick up the operator definitions for RVecs (this is now ROOT-10865).
Note however that to quickly clarify these small things in case the docs do not help (and we should fix the doc nonetheless) you can simply try them out at the ROOT prompt.
RVec/RVec operations are applied element-by-element and throw an exception if the sizes of the two RVecs are different. RVec/scalar operations “broadcast” the scalar to the size of the RVec, and then act like RVec/RVec operations.
myV1 * myV2 / scalar, as per the usual operator precedence rules, is equivalent to (myV1 * myV2) / scalar.
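To make the rules above concrete, here is a minimal plain C++ sketch that mimics them with std::vector. It is only a stand-in for illustration and not ROOT's implementation: the names `Vec`, `multiply` and `divide` are hypothetical, and the exception type is an assumption. At the ROOT prompt the corresponding expression would simply be `myV1 * myV2 / scalar` with RVec objects.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

using Vec = std::vector<double>;  // hypothetical stand-in for an RVec<double>

// Element-by-element product; mismatched sizes are treated as an error,
// mirroring the behaviour described above for RVec/RVec operations.
Vec multiply(const Vec &a, const Vec &b) {
    if (a.size() != b.size())
        throw std::runtime_error("multiply: vectors have different sizes");
    Vec out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = a[i] * b[i];
    return out;
}

// Vector/scalar division: the scalar is "broadcast" to every element.
Vec divide(const Vec &a, double scalar) {
    Vec out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = a[i] / scalar;
    return out;
}
```

With these helpers, `divide(multiply(v1, v2), s)` corresponds to `(myV1 * myV2) / scalar`, which is how the expression is grouped by operator precedence.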
Thanks a lot for this very useful information. It does exactly what I need.
Without going off-topic, I am now checking whether RDataFrame already implements a calculation of the covariance(column1, column2). I have not found one, but maybe I can simply write a functor for that.
Let me explain why I need this, and you can judge how useful it would be.
Basically, I am bootstrapping the simulation sample and the corrections to simulation.
Say I ran 100 bootstrap iterations, and on the same sample I compute efficiencies for one trigger category and for another one.
When I run the final fit, I use the efficiencies to extract the final measurement directly.
In order to account for correlations among efficiencies, I need to evaluate the “efficiency and its error” plus the covariance between the efficiencies.
So my technical solution is now to compute, on the sample, the n efficiencies I need in the 100 bootstrapped slots and save them to a TTree with 100 entries, whose branches are the measured efficiency values. Then I use the “columns” to compute the covariance matrix of (eff-i, eff-j).
I will see what I can do and how easy it is to implement such a function in RDataFrame. Maybe my use case is too specific and might be of little interest to others.
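The covariance step described above can be sketched in plain C++, independently of RDataFrame. This is a minimal illustration under stated assumptions: each efficiency is held as a std::vector<double> with one value per bootstrap slot, the unbiased 1/(N-1) normalisation is used (the 1/N convention would work equally well if preferred), and the function names `covariance` and `covarianceMatrix` are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Sample covariance of two equally long columns, e.g. eff-i and eff-j
// evaluated in the same bootstrap slots (unbiased 1/(N-1) form, an assumption).
double covariance(const std::vector<double> &x, const std::vector<double> &y) {
    if (x.size() != y.size() || x.size() < 2)
        throw std::invalid_argument("covariance: need two columns of equal size >= 2");
    const std::size_t n = x.size();
    double meanX = 0., meanY = 0.;
    for (std::size_t k = 0; k < n; ++k) {
        meanX += x[k];
        meanY += y[k];
    }
    meanX /= n;
    meanY /= n;
    double sum = 0.;
    for (std::size_t k = 0; k < n; ++k)
        sum += (x[k] - meanX) * (y[k] - meanY);
    return sum / (n - 1);
}

// Symmetric covariance matrix of m columns; the diagonal holds the variances.
std::vector<std::vector<double>>
covarianceMatrix(const std::vector<std::vector<double>> &columns) {
    const std::size_t m = columns.size();
    std::vector<std::vector<double>> cov(m, std::vector<double>(m, 0.));
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = i; j < m; ++j)
            cov[i][j] = cov[j][i] = covariance(columns[i], columns[j]);
    return cov;
}
```

In the use case above, each column would hold the 100 bootstrapped values of one efficiency, and the square root of a diagonal element would give the bootstrap estimate of that efficiency's uncertainty.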