RDataFrame's Count is unstable

ROOT Version: 6.24.02
Platform: Fedora 34
Compiler: gcc-11.1


code:

    auto dataFrame = new ROOT::RDataFrame("tree", "event.root");
    auto entryN=static_cast<int>(*(dataFrame->Count()));
    cout<<"entryN : "<<entryN<<endl;
    
    auto filterFun = [&](float incidentEnergy)
    {
        auto iBin=h1Source->FindBin(incidentEnergy);
        double possibility = h1Source->GetBinContent(iBin);
        double random = gRandom->Rndm();
        // Keep the event with probability equal to the bin content.
        return random < possibility;
    };

    auto dfAfterFilter = dataFrame->Filter(filterFun, {"incidentEnergy"});
    cout<< "counts: "<<dfAfterFilter.Count().GetValue()<<endl;
    cout<< "counts: "<<dfAfterFilter.Count().GetValue()<<endl;
    cout<< "counts: "<<dfAfterFilter.Count().GetValue()<<endl;

result:

entryN: 200000
counts: 93497
counts: 88923
counts: 89011

How can I get a stable count? Thanks.

Hi @lishuwei ,
in your snippet above, every call to Count() books a new result, and calling GetValue() on a result that has not been computed yet forces RDataFrame to run a new event loop to produce it. Each event loop draws a different sequence of random numbers from gRandom, so the Filter passes a different number of events every time.
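
To make the mechanism concrete, here is a toy model in plain C++. It does not use ROOT, and `ToyFrame`, `Result` and `LoopsRun` are made-up names for illustration only. It just mimics the idea that results are booked lazily, and that asking for a result that is not ready yet triggers one loop that fills every result booked so far.

```cpp
#include <cstddef>
#include <memory>
#include <random>
#include <utility>
#include <vector>

// Toy stand-in for a lazy data frame with a random filter. This is NOT ROOT code.
class ToyFrame {
    struct Slot {
        std::size_t value = 0;
        bool ready = false;
    };

public:
    // Handle to a booked result; the value is computed on first access.
    class Result {
    public:
        std::size_t GetValue() {
            if (!fSlot->ready)
                fFrame->RunLoop(); // not computed yet: run a loop now
            return fSlot->value;
        }

    private:
        friend class ToyFrame;
        Result(ToyFrame *frame, std::shared_ptr<Slot> slot)
            : fFrame(frame), fSlot(std::move(slot)) {}
        ToyFrame *fFrame;
        std::shared_ptr<Slot> fSlot;
    };

    ToyFrame(std::size_t nEntries, double keepProbability, unsigned seed)
        : fEntries(nEntries), fProb(keepProbability), fRng(seed) {}

    // Book a count of the entries passing the random filter. Nothing runs here.
    Result Count() {
        auto slot = std::make_shared<Slot>();
        fPending.push_back(slot);
        return Result(this, slot);
    }

    int LoopsRun() const { return fLoops; }

private:
    // One loop over all entries; fills every result booked so far.
    void RunLoop() {
        std::uniform_real_distribution<double> uniform(0.0, 1.0);
        std::size_t passed = 0;
        for (std::size_t i = 0; i < fEntries; ++i)
            if (uniform(fRng) < fProb)
                ++passed;
        for (auto &slot : fPending) {
            slot->value = passed;
            slot->ready = true;
        }
        fPending.clear();
        ++fLoops;
    }

    std::size_t fEntries;
    double fProb;
    std::mt19937 fRng; // keeps its state between loops, like a global generator
    std::vector<std::shared_ptr<Slot>> fPending;
    int fLoops = 0;
};
```

With this toy, booking three counts first and reading them afterwards runs a single loop and gives three identical values, while calling `frame.Count().GetValue()` three times in a row runs three loops, each consuming fresh random numbers.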

For that particular snippet you would get the same Count() value if you wrote it like this (but of course this is a bit contrived):

    auto dfAfterFilter = dataFrame->Filter(filterFun, {"incidentEnergy"});
    auto r1 = dfAfterFilter.Count();
    auto r2 = dfAfterFilter.Count();
    auto r3 = dfAfterFilter.Count(); 
    cout<< "counts: "<< r1.GetValue()<<endl;
    cout<< "counts: "<< r2.GetValue()<<endl;
    cout<< "counts: "<< r3.GetValue()<<endl;
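
If the count has to be reproducible no matter how many event loops run, one option is to make the random decision depend only on the entry number instead of on the global state of gRandom. The sketch below shows the idea in plain C++ without ROOT; `KeepEntry` and `CountKept` are hypothetical helpers, not part of any library. In RDataFrame the entry number should be available through the special `rdfentry_` column, which you could pass to the Filter next to `incidentEnergy`, but I have not tested this with your tree, and in multi-thread runs `rdfentry_` is not guaranteed to match the global tree entry number.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>

// Decide whether to keep an entry using a generator seeded from the entry
// number, so the decision depends only on (entry, probability) and not on
// how many loops ran before or in which order entries are visited.
bool KeepEntry(std::uint64_t entry, double probability) {
    std::mt19937_64 rng(entry); // per-entry seed (illustrative choice)
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    return uniform(rng) < probability;
}

// Count the kept entries; calling this repeatedly gives the same number.
std::size_t CountKept(std::uint64_t nEntries, double probability) {
    std::size_t kept = 0;
    for (std::uint64_t entry = 0; entry < nEntries; ++entry)
        if (KeepEntry(entry, probability))
            ++kept;
    return kept;
}
```

Constructing a generator for every entry is slow, and the statistical quality of seeding with consecutive integers should be checked for your use case, so take this as an illustration of the idea rather than a recommendation.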

Cheers,
Enrico

Thank you! It works.

But I found that this approach fails for the Mean() calculation. For example,

    auto dfAfterFilter = dataFrame->Filter(filterFun, {"incidentEnergy"});
    auto r1 = dfAfterFilter.Mean("incidentEnergy");
    cout<<r1.GetValue()<<endl;
    cout<<r1.GetValue()<<endl;

How can I stop the Filter from running in every calculation?
Thank you.

Hi @lishuwei ,
I am not sure I understand this latest question. Do you want to skip the filter when you evaluate the mean? In that case you can just call dataFrame->Mean instead of dfAfterFilter.Mean.
