RDataFrame: dealing with std::vector<> columns in Define operations. Is that the same as Python arrays?

Hi all,
My question is quite simple.
Let’s say I have these branches (three vectors and one scalar) in the TTree I am processing with RDataFrame:

 weight1_BOOTS = (vector<double>*)0x374f2d0 <(size = 100)
 weight2_BOOTS = (vector<double>*)0x33c7f80 <(size = 100)
 RndPoisson      = (vector<int>*)0x33cf110 <(size = 100)
weigh3 = double 

I basically want to get a new column via

df.Define( "testBR", "(weight1_BOOTS / weight2_BOOTS) *RndPoisson * weigh3 ")

I have managed to get this running and to write out a new ntuple, which seems to have “testBR” with size 100.

Nevertheless, the ROOT prompt tells me that this branch is:

 testBR          = (vector<double,ROOT::Detail::VecOps::RAdoptAllocator<double> >*)0x338c310

Therefore, my question: how does the RDataFrame Define operation deal with expressions like

(vector / vector) * constant ?

Does it treat vector-like branches the way Python treats arrays?
I.e., is it doing

v1[0]/v2[0] * constant ? 

I hope my question is clear enough, and thanks for any feedback.
Renato

Here is a test script that aims to validate that it works:

void test(){

    ROOT::RDataFrame df("DecayTuple","/eos/lhcb/wg/RD/RKstar/tuples/v9/RK/TupleProcess_EE_L0_BDT-DTF_PIDMeerkat_nTracks_sPlotCut_BS/Bu2KJPsEE/MC12MD/0/TupleProcess.root");
    auto dd = df.Range(10000);

    auto customCalc = [] ( vector<double> & a , vector<double> & b , vector<int> & c, double d){
        vector<double> cc; cc.resize(100);
        for( int i=0; i < 100 ; ++i){
            cc[i] = (a[i] / b[i] ) * c[i] * d;
        }
        return cc;
    };
    auto c1c = dd.Define( "testBR",  "(Bp_wfL0I_incl_Bp_effCL_BS/Bp_wfL0I_incl_Bp_effMC_BS) * RndPoisson *  wPIDCalib" )
                 .Define( "testBR2",customCalc, {"Bp_wfL0I_incl_Bp_effCL_BS","Bp_wfL0I_incl_Bp_effMC_BS", "RndPoisson", "wPIDCalib"} ); 
    auto cc  = c1c.Take< ROOT::VecOps::RVec<double> > ("testBR");
    auto cc2 = c1c.Take< vector<double> > ("testBR2");

    // Per-bootstrap-slot sums of the two columns, initialised to zero
    vector<double> sumW(100, 0.), sumW_valid(100, 0.);

    // Sum the jitted column over all entries, slot by slot
    for( size_t i = 0 ; i < cc->size(); ++i){
        for( int bsIDX = 0; bsIDX<100; ++bsIDX){
            sumW[bsIDX] += cc->at(i)[bsIDX];
        }
    }
    // Same for the column computed by the custom lambda
    for( size_t i = 0 ; i < cc2->size(); ++i){
        for( int bsIDX = 0; bsIDX<100; ++bsIDX){
            sumW_valid[bsIDX] += cc2->at(i)[bsIDX];
        }
    }
    TFile * ff = new TFile("test.root","RECREATE");
    auto tt = TNtuple("test","test","test:valid");
    for( int i = 0 ; i < 100; i++){
        cout<< "sumW["<<i<<"] = " << sumW[i] << endl;
        cout<< "sumWValid["<<i<<"] = " << sumW_valid[i] << endl;
        tt.Fill( sumW[i], sumW_valid[i]);
    }
    tt.Write();
    ff->Close();

    // GetWeight() --> GetWeightBS(idx = 0);
    // dd.Define("(Bp_wfL0I_incl_B0_effCL_BS/Bp_wfL0I_incl_B0_effMC_BS)*wPIDCalib")

}

I do get the same outcome as with the custom operation that reads the vector columns and multiplies (or divides) them. I am very happy to see this work out of the box. I am therefore just asking whether there is any limit in using this, or whether I should expect all operations (*, /, -, +) to work fine when dividing, multiplying, subtracting, or adding vectors with vectors and/or vectors with scalar columns.

Hi,
RDF implicitly transforms vectors to RVecs when it reads them. In those Define expressions, you are operating on RVecs. More docs here: https://root.cern/doc/master/classROOT_1_1VecOps_1_1RVec.html .

Most arithmetic operations that you expect to be there are defined, plus a few useful helpers.
Cheers,
Enrico

Hi @eguiraud, thanks a lot for the link. Nevertheless, I failed to find where
RVec<double>::operator (*, /, -, +)
is defined and what happens when doing

RVec<double> *(or any operation) RVec<double>

vs

RVec<double> *(or any operation) (double)

I.e., I don’t find the basic operator definitions.

Better said,
the only things I find in the code are these:

 #if (_VECOPS_USE_EXTERN_TEMPLATES)
 
 #define RVEC_EXTERN_UNARY_OPERATOR(T, OP) \
    extern template RVec<T> operator OP<T>(const RVec<T> &);
 
 #define RVEC_EXTERN_BINARY_OPERATOR(T, OP)                                     \
    extern template auto operator OP<T, T>(const T &x, const RVec<T> &v)        \
       -> RVec<decltype(x OP v[0])>;                                            \
    extern template auto operator OP<T, T>(const RVec<T> &v, const T &y)        \
       -> RVec<decltype(v[0] OP y)>;                                            \
    extern template auto operator OP<T, T>(const RVec<T> &v0, const RVec<T> &v1)\
       -> RVec<decltype(v0[0] OP v1[0])>;
 
 #define RVEC_EXTERN_ASSIGN_OPERATOR(T, OP)                           \
    extern template RVec<T> &operator OP<T, T>(RVec<T> &, const T &); \
    extern template RVec<T> &operator OP<T, T>(RVec<T> &, const RVec<T> &);
 
 #define RVEC_EXTERN_LOGICAL_OPERATOR(T, OP)                                 \
    extern template RVec<int> operator OP<T, T>(const RVec<T> &, const T &); \
    extern template RVec<int> operator OP<T, T>(const T &, const RVec<T> &); \
    extern template RVec<int> operator OP<T, T>(const RVec<T> &, const RVec<T> &);

Nevertheless, I am failing to understand whether the jitted definition I wrote does what I want on RVec<double> objects.

I.e., for example

1) OUT1 RVec<double>(myV1) * RVec<double>(myV2) / double(myD)
2) OUT2 RVec<double>(myV1) * RVec<double>(myV2)  -  double(myD)

Is this making a

RVec<double> OUT 
//using the logic of : 
OUT1[i] = myV1[i] * myV2[i] / myD 
OUT2[i] = myV1[i] * myV2[i] - myD 

?

“I failed to find where the RVec<double>::operator (*, /, -, +) is defined”

Ah that’s true, Doxygen does not pick up the operator definitions for RVecs (this is now ROOT-10865).

Note, however, that when the docs do not help (and we should fix the docs nonetheless), you can quickly clarify these small things by simply trying them out at the ROOT prompt.

RVec/RVec operations are applied element-by-element and throw an exception if the sizes of the two RVecs are different. RVec/scalar operations “broadcast” the scalar to the size of the RVec, and then act like RVec/RVec operations.

myV1 * myV2 / scalar, as per the usual operator precedence rules, is equivalent to (myV1 * myV2) / scalar.
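
To make the rules above concrete, here is a minimal plain C++ sketch of the same semantics. It uses std::vector stand-ins and hypothetical helper names (Multiply, Divide, Subtract); it is not ROOT's RVec implementation, only an illustration of element-by-element operations, scalar broadcasting, and the size check.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for RVec<double> arithmetic, written with std::vector.
// They mimic the behaviour described in the post; they are not ROOT code.

// vector * vector: applied element by element, sizes must match.
std::vector<double> Multiply(const std::vector<double> &a, const std::vector<double> &b)
{
   if (a.size() != b.size())
      throw std::runtime_error("Multiply: vectors have different sizes");
   std::vector<double> out(a.size());
   for (std::size_t i = 0; i < a.size(); ++i)
      out[i] = a[i] * b[i];
   return out;
}

// vector / scalar: the scalar is applied to every element ("broadcast").
std::vector<double> Divide(const std::vector<double> &a, double d)
{
   std::vector<double> out(a.size());
   for (std::size_t i = 0; i < a.size(); ++i)
      out[i] = a[i] / d;
   return out;
}

// vector - scalar: same broadcasting rule as Divide.
std::vector<double> Subtract(const std::vector<double> &a, double d)
{
   std::vector<double> out(a.size());
   for (std::size_t i = 0; i < a.size(); ++i)
      out[i] = a[i] - d;
   return out;
}
```

With v1 = {1, 2, 3} and v2 = {4, 5, 6}, Divide(Multiply(v1, v2), 2.) gives {2, 5, 9}, which is the OUT1[i] = myV1[i] * myV2[i] / myD logic from the question, evaluated as (myV1 * myV2) / myD.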

Cheers,
Enrico


Thanks a lot for this very useful information. It does exactly what I need.
Without going off-topic, I am now checking whether RDataFrame already implements a calculation of the covariance(column1, column2). I have not found this, but maybe I can simply write a functor for that.
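
As a rough idea of what such a functor could look like, here is a sketch of a running covariance accumulator in plain C++. The class name and its methods are hypothetical and not part of ROOT. It assumes single-threaded filling, one (x, y) pair per entry, and the unbiased N - 1 normalisation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a ROOT class): accumulates the sample covariance
// of two quantities one (x, y) pair at a time, without storing the values.
class CovarianceAccumulator {
   std::size_t fN = 0;            // number of pairs seen so far
   double fMeanX = 0., fMeanY = 0.;
   double fCoMoment = 0.;         // running sum of (x - meanX) * (y - meanY)

public:
   void Fill(double x, double y)
   {
      ++fN;
      const double dx = x - fMeanX;   // deviation from the previous mean of x
      fMeanX += dx / fN;
      fMeanY += (y - fMeanY) / fN;
      fCoMoment += dx * (y - fMeanY); // uses the already updated mean of y
   }

   // Unbiased estimate (division by N - 1); returns 0 for fewer than two pairs.
   double Covariance() const { return fN > 1 ? fCoMoment / (fN - 1) : 0.; }

   std::size_t Count() const { return fN; }
};
```

One possible way to use it would be to call Fill from a callable that receives the two column values for each entry, and to read Covariance() once the event loop has finished. With implicit multi-threading, one accumulator per slot plus a merge step would be needed, which this sketch does not cover.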

Uhm no I don’t think we have it. If you think it’s generally useful I’d be glad to accept a PR that adds it (with a corresponding test :smiley: ).

Cheers,
Enrico


Let me explain why I need this, and you can judge how useful it would be.
Basically, I am bootstrapping the simulation sample and the corrections to simulation.
Say I made 100 bootstrap replicas, and on the same sample I compute efficiencies for one trigger category and for another one.
When I run the final fit, I use the efficiencies to extract the final measurement directly.
In order to account for correlations among efficiencies, I need to evaluate the “efficiency and its error” plus the covariance between the efficiencies.

So my technical solution is now basically to compute on the sample the n efficiencies I need in the 100 bootstrapped slots, and save them to a TTree with 100 entries whose branches are the measured efficiency values. Then I use the “columns” to compute the covariance matrix of (eff-i, eff-j).
I will see what I can do and how easy it would be to implement such a function in RDataFrame. Maybe my use case is too specific and might be of little interest to others.
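
For illustration, the last step (columns in, covariance matrix out) could be sketched in plain C++ as follows. The function name and the input layout (one std::vector per efficiency, one element per bootstrap slot) are assumptions made for this example, and the N - 1 normalisation is one possible choice.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: columns[i][k] is efficiency i measured in bootstrap slot k.
// Returns the n x n sample covariance matrix (division by N - 1).
std::vector<std::vector<double>> CovarianceMatrix(const std::vector<std::vector<double>> &columns)
{
   const std::size_t n = columns.size();
   std::vector<std::vector<double>> cov(n, std::vector<double>(n, 0.));
   if (n == 0)
      return cov;

   const std::size_t nSlots = columns[0].size();
   for (const auto &col : columns)
      if (col.size() != nSlots)
         throw std::invalid_argument("CovarianceMatrix: columns have different lengths");
   if (nSlots < 2)
      return cov; // covariance is undefined for fewer than two slots; return zeros

   // Mean of each efficiency over the bootstrap slots
   std::vector<double> mean(n, 0.);
   for (std::size_t i = 0; i < n; ++i) {
      for (double x : columns[i])
         mean[i] += x;
      mean[i] /= nSlots;
   }

   // Fill the upper triangle and mirror it, since the matrix is symmetric
   for (std::size_t i = 0; i < n; ++i) {
      for (std::size_t j = i; j < n; ++j) {
         double sum = 0.;
         for (std::size_t k = 0; k < nSlots; ++k)
            sum += (columns[i][k] - mean[i]) * (columns[j][k] - mean[j]);
         cov[i][j] = cov[j][i] = sum / (nSlots - 1);
      }
   }
   return cov;
}
```

The diagonal elements are the variances of the individual efficiencies across the bootstrap slots, and the off-diagonal elements are the covariances between pairs of efficiencies.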
