Could I generate several outputs during one calculation in RDataFrame?

Hi experts!

I’m starting to use RDataFrame to do tree analysis.

Is it possible to produce several outputs from a single Define?

For example, suppose I want to calculate both v2 (related to mean(cos 2phi)) and mean(pT) using a function like:

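    // Needs <cmath> and <ROOT/RVec.hxx>.
    // m selects which quantity is returned: 0 -> mean pT, 1 -> v2, 2 -> v4.
    auto calculation = [&m](const ROOT::RVecF &phi, const ROOT::RVecF &eta, const ROOT::RVecF &pT, int npart, int RefMult) {
        double pt = 0.;
        double v2 = 0.;
        double v4 = 0.;
        double cos2phi = 0.;
        double sin2phi = 0.;
        double cos4phi = 0.;
        double sin4phi = 0.;
        for (int i = 0; i < npart; i++) {
            if (std::isnan(phi[i])) continue;
            // (accumulation of pt, cos2phi, sin2phi, ... not shown in the post)
        }
        if (RefMult == 0) pt = 0.;
        else pt /= RefMult;
        if (RefMult == 0 || RefMult == 1) {
            v2 = 0.;
            v4 = 0.;
        }
        // (computation of v2 and v4 for RefMult > 1 not shown in the post)
        if (m == 0) return pt;
        if (m == 1) return v2;
        return v4; // m == 2
    };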

By passing a different integer m I can get a different output, but is there any way to use one loop and one calculation to return several outputs, as we would with SetBranchAddress and GetEntry?

Thanks in advance.


Hi @zainingwang,

welcome to the ROOT forum and thank you for your question.

I would guess you would then want separate columns in your RDF for pt, v2 and v4?

We don’t support Define for multiple columns in one call, so you need a separate Define per column. If you want all your quantities in one column, that could be possible.
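As an illustration of the "all quantities in one column" idea, here is a minimal sketch, not a definitive implementation. The struct `FlowResult`, the function `computeFlow`, and the column names in the comment are hypothetical. It also assumes that v2 and v4 are plain averages of cos(2 phi) and cos(4 phi) over the valid particles, since the formula from the question is not shown. The kernel is plain C++ so it can be checked without ROOT; the RDataFrame wiring appears only as a comment.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: one object per event holding all three quantities.
struct FlowResult {
    double meanPt = 0.;
    double v2 = 0.;
    double v4 = 0.;
};

// One pass over the particles of a single event.
// ASSUMPTION: v2 and v4 are simple averages of cos(2*phi) and cos(4*phi),
// and the averages run over particles with a valid phi. Replace with your own formula.
// Vec can be std::vector<float> or ROOT::RVecF (anything indexable with size()).
template <typename Vec>
FlowResult computeFlow(const Vec &phi, const Vec &pT)
{
    FlowResult r;
    std::size_t n = 0;
    const std::size_t size = phi.size() < pT.size() ? phi.size() : pT.size();
    for (std::size_t i = 0; i < size; ++i) {
        if (std::isnan(phi[i])) continue; // skip particles without a valid angle
        r.meanPt += pT[i];
        r.v2 += std::cos(2. * phi[i]);
        r.v4 += std::cos(4. * phi[i]);
        ++n;
    }
    if (n > 0) {
        r.meanPt /= n;
        r.v2 /= n;
        r.v4 /= n;
    }
    return r;
}

// Possible wiring with RDataFrame (column names "phi" and "pT" are assumed):
//   auto df2 = df.Define("flow",
//                        [](const ROOT::RVecF &phi, const ROOT::RVecF &pT) { return computeFlow(phi, pT); },
//                        {"phi", "pT"})
//                .Define("meanPt", [](const FlowResult &r) { return r.meanPt; }, {"flow"})
//                .Define("v2",     [](const FlowResult &r) { return r.v2; },     {"flow"})
//                .Define("v4",     [](const FlowResult &r) { return r.v4; },     {"flow"});
```

With this layout the particle loop runs once per event in the "flow" column, and the three follow-up Defines only read a member of the already computed result.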

What exactly is m here? Are you perhaps dealing with several different samples? In that case you could use a DefinePerSample call: ROOT: ROOT::RDF::RInterface< Proxied, DataSource > Class Template Reference


Thanks for your help, Marta.

Yes, that’s true. I was trying to obtain separate columns. So it cannot return multiple columns with just one Define.

What worries me is that with several Defines, each one has to loop over the particles again to compute its mean value. Will that be slower to run? If it doesn’t affect the efficiency, I will use separate Defines.

(Actually, m just selects which quantity is returned, which is equivalent to using separate Defines.)

Thanks for your reply,

Hi @zainingwang,

Then in your case you will need to use 3 separate Defines. What I would suggest is that you prepare the full macro with those three Defines, including how you call them within RDF, and then we can check whether anything could be done to improve efficiency. I’m sure you’ve already had a look at the tutorials, but this is our newest RDF tutorial, which has many Defines and still performs well: ROOT: tutorials/dataframe/df106_HiggsToFourLeptons.C File Reference

Also, if you need the mean of one of the columns, you can check the RDF Mean function: ROOT: ROOT::RDF::RInterface< Proxied, DataSource > Class Template Reference
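The three-Define layout could be sketched with one small function per column. This is only a sketch under assumptions: the helper names `meanValue` and `meanCos` and the column names in the comment are hypothetical, and v2 and v4 are again taken as plain averages of cos(2 phi) and cos(4 phi), which may differ from the formula in the macro. Each helper loops only over the particles of the current event.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: plain average of the values of one event (0 for an empty event).
template <typename Vec>
double meanValue(const Vec &values)
{
    double sum = 0.;
    std::size_t n = 0;
    for (auto v : values) {
        sum += v;
        ++n;
    }
    return n > 0 ? sum / n : 0.;
}

// Hypothetical helper: average of cos(k*phi) over particles with a valid phi.
// ASSUMPTION: this stands in for the v2 (k = 2) and v4 (k = 4) formulas of the post.
template <typename Vec>
double meanCos(const Vec &phi, int k)
{
    double sum = 0.;
    std::size_t n = 0;
    for (auto p : phi) {
        if (std::isnan(p)) continue; // skip particles without a valid angle
        sum += std::cos(k * static_cast<double>(p));
        ++n;
    }
    return n > 0 ? sum / n : 0.;
}

// Possible wiring with RDataFrame (column names "phi" and "pT" are assumed):
//   auto df2 = df.Define("meanPt", [](const ROOT::RVecF &pT)  { return meanValue(pT); },  {"pT"})
//                .Define("v2",     [](const ROOT::RVecF &phi) { return meanCos(phi, 2); }, {"phi"})
//                .Define("v4",     [](const ROOT::RVecF &phi) { return meanCos(phi, 4); }, {"phi"});
```

Note that `meanValue` here does not skip particles with an invalid phi, unlike the loop in the question, so adapt it if the selection must be the same for all three quantities.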


Thanks a lot! I will try it.