RDataFrame, pass the result of Filter() as the argument of a function

andrea.celentano · January 23, 2023, 9:49am

Dear colleagues,
I am trying to write the code for an analysis using RDataFrames. The input of my analysis is a ROOT File with a ROOT Tree containing multiple branches.

My goal is the following. I’d like to apply a series of Filters on the RDataFrame, and after each filter create a set of histograms. The histograms are always the same. Therefore, I thought I could use a function to create them, passing as argument the result of the Filter operation.

The function would look like:

std::vector<ROOT::RDF::RResultPtr<TH1>> createHisto(int idx, some_type_that_I_do_not_know df){
    std::vector<ROOT::RDF::RResultPtr<TH1>> ret;
    ret.push_back(df.Histo1D("name_of_a_column"));
    ret.push_back(df.Histo2D("name_of_a_column","name_of_another_column"));
    [...]
    return ret;
}

The problem that I am stuck on is the following. To filter the RDataFrame, I use different functions, with different arguments, depending on which columns they operate on. For example:

bool filterFun1(double a,double b);
bool filterFun2(int i);

In the code, I’d like to do something like:

ROOT::RDataFrame df("tout","inputFile.root");

auto f1=df.Filter(filterFun1,{"column_a","column_b"});
auto h1=createHisto(1,f1);

auto f2=f1.Filter(filterFun2,{"column_i"});
auto h2=createHisto(2,f2);

However, this does not work because f1 and f2 are two different types (pardon me if the word is not correct) - i.e. two different “flavours” of the template RInterface. In other words, the problem I see is associated with the second argument of the createHisto function.

Is there any way to solve this problem? One workaround would be to give all my filter functions the same argument list, taking every column of the TTree, but this is probably not the best choice…

Thanks,
Bests,
Andrea

ROOT Version: 6.24.06
Platform: CentOS 7
Compiler: gcc 11.2.0

couet · January 23, 2023, 9:52am

Maybe @vpadulan can help with this.

vpadulan · January 23, 2023, 10:47am

Hi @andrea.celentano ,

Thanks for reaching out. What you are looking for is explained in the docs.

Something like

auto createHisto(ROOT::RDF::RNode df){
    std::vector<ROOT::RDF::RResultPtr<TH1D>> ret;
    ret.reserve(NHISTOS);
    ret.emplace_back(df.Histo1D("name_of_a_column"));
    [...]
    return ret;
}
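
For context on why this works: as far as I understand, ROOT::RDF::RNode is a type-erased alias (RInterface<RNodeBase>) to which any RDataFrame node converts implicitly, so a single non-template function can accept df, f1 and f2 alike. The library-free sketch below only mimics that idea with toy types. Source, Filtered, filter, AnyNode, countPassing, countPassingT and demo are all hypothetical names made up for this illustration, not ROOT API.

```cpp
#include <functional>
#include <type_traits>
#include <utility>

// Toy stand-ins for illustration only; none of these are ROOT classes.

// Root of the chain: accepts every entry.
struct Source {
    bool operator()(int) const { return true; }
};

// Each filter step wraps the previous node, so its type encodes the whole
// chain (loosely like every Filter() call yielding a new RInterface type).
template <typename Prev, typename Pred>
struct Filtered {
    Prev prev;
    Pred pred;
    bool operator()(int entry) const { return prev(entry) && pred(entry); }
};

template <typename Prev, typename Pred>
Filtered<Prev, Pred> filter(Prev prev, Pred pred) {
    return Filtered<Prev, Pred>{std::move(prev), std::move(pred)};
}

// Hypothetical type-erased handle playing the role of RNode: any node
// converts to it implicitly, at the cost of an indirect call per entry.
class AnyNode {
public:
    template <typename Node>
    AnyNode(Node node) : pass_(std::move(node)) {}
    bool operator()(int entry) const { return pass_(entry); }
private:
    std::function<bool(int)> pass_;
};

// Option 1: an ordinary function taking the common, type-erased node.
int countPassing(AnyNode node, int nEntries) {
    int n = 0;
    for (int e = 0; e < nEntries; ++e) n += node(e) ? 1 : 0;
    return n;
}

// Option 2: a function template, instantiated once per concrete node type.
template <typename Node>
int countPassingT(const Node& node, int nEntries) {
    int n = 0;
    for (int e = 0; e < nEntries; ++e) n += node(e) ? 1 : 0;
    return n;
}

// Counts the entries in 0..9 that survive the first and the second cut.
std::pair<int, int> demo() {
    auto f1 = filter(Source{}, [](int e) { return e % 2 == 0; });
    auto f2 = filter(f1, [](int e) { return e >= 4; });
    static_assert(!std::is_same<decltype(f1), decltype(f2)>::value,
                  "each filter step has its own type");
    // Both calls go through the same non-template function.
    return {countPassing(f1, 10), countPassing(f2, 10)};
}
```

In ROOT terms, option 1 corresponds to taking ROOT::RDF::RNode as above, and option 2 to making createHisto a function template (template <typename RDF> with an RDF df parameter). If I recall the docs correctly, converting to RNode adds a small runtime overhead from the extra indirection, which the template form avoids.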

Cheers,
Vincenzo

andrea.celentano · January 23, 2023, 11:29am

Dear @vpadulan ,
thanks for your quick and clear reply.

This works exactly as expected!

Cheers,
Andrea
