Hello,
I am not too experienced with dataframes and was hoping to check my understanding. I am currently using ROOT 6.14. I was wondering what to do when I don’t simply want to filter data, but rather divide it into two or more sets according to a cut. As an example, say we have a dataframe with columns “p” and “x”, and we want to divide it into 3 categories in p and find the mean of “x” in each, i.e.
float cut_edges[]={0.0,2.0,3.0, 99.0}; //cuts defining 3 categories, 99 used for infinity for simplicity
ROOT::RDataFrame df(inputTree, inputFile, {"x"}); //default columns are passed as a list of names
float x_Mean[3];
Now, one could put this in a for loop e.g.
for(int i=0; i<3; i++){
   double lowbound=cut_edges[i];
   double highbound=cut_edges[i+1];
   auto dfCut=df.Filter([lowbound,highbound](float p){return (p<highbound && p>lowbound);},{"p"});
   x_Mean[i]=*dfCut.Mean("x");
}
However, that would require looping over the tree several times (assuming I understand correctly).
On the other hand, if I unroll the loop and write it out line by line, could it be evaluated lazily? E.g.
auto dfCut1=df.Filter([cut_edges](float p){return (p<cut_edges[1] && p>cut_edges[0]);},{"p"});
auto dfCut2=df.Filter([cut_edges](float p){return (p<cut_edges[2] && p>cut_edges[1]);},{"p"});
auto dfCut3=df.Filter([cut_edges](float p){return (p<cut_edges[3] && p>cut_edges[2]);},{"p"});
x_Mean[0]=*dfCut1.Mean("x");
x_Mean[1]=*dfCut2.Mean("x");
x_Mean[2]=*dfCut3.Mean("x");
Would this require only one loop through the data?
Now if I wanted an arbitrary number of divisions, this method becomes impractical, and I would think there is a more elegant way to divide up a dataframe. Is there an efficient way of writing this sort of code? I didn’t see anything quite like it in the tutorials.
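To make concrete what I mean by dividing the data in a single pass, here is a minimal plain C++ sketch with no RDataFrame calls in it. The BinnedMean class and its Fill and Mean methods are names I made up for illustration, not ROOT API. It assumes the edges are sorted and, like the filters above, treats a value of p lying exactly on an edge as belonging to no category.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of ROOT): accumulates the sum and count of x
// per category in p, so that all category means come out of one pass.
class BinnedMean {
public:
   explicit BinnedMean(std::vector<double> edges) : fEdges(std::move(edges))
   {
      // Assumption: at least two edges, sorted in increasing order.
      assert(fEdges.size() >= 2);
      assert(std::is_sorted(fEdges.begin(), fEdges.end()));
      fSum.assign(fEdges.size() - 1, 0.0);
      fCount.assign(fEdges.size() - 1, 0);
   }

   // Called once per entry with that entry's p and x.
   void Fill(double p, double x)
   {
      // First edge strictly greater than p.
      auto upper = std::upper_bound(fEdges.begin(), fEdges.end(), p);
      if (upper == fEdges.begin() || upper == fEdges.end())
         return; // p is below the first edge, or at or above the last one
      const std::size_t bin = static_cast<std::size_t>(upper - fEdges.begin()) - 1;
      if (p == fEdges[bin])
         return; // exactly on a lower edge: excluded, matching p>low && p<high
      fSum[bin] += x;
      ++fCount[bin];
   }

   // Mean of x in the given category; 0 is returned for an empty category.
   double Mean(std::size_t bin) const
   {
      return fCount[bin] > 0 ? fSum[bin] / static_cast<double>(fCount[bin]) : 0.0;
   }

   std::size_t NBins() const { return fSum.size(); }

private:
   std::vector<double> fEdges;
   std::vector<double> fSum;
   std::vector<std::size_t> fCount;
};
```

I imagine something like Fill could be called once per entry with the values of p and x, so that any number of categories only needs a longer list of edges, but I do not know whether that is the intended way to do this with a dataframe.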