Using a predefined TH2D in RDataframe to compute new quantiy

Hello,

I have a probability as a function of two variable x,y. Typically stored in a 2d-histogram.
How can I make this available in RDataFrame ?

e.g for each track in the detector (RVec) I would like to get the probability as function of x,y.

I would know how to implement this in a C++ function that I call from RDataFrame, e.g.

RVec Probability( RVec x, RVec y) (

static TH2D* hprob=0;
static bool firstevent=false;
if (firstevent) {
read histogram hprob from a file

firstevent=false
}
int ibin=hprob->FindBin(x,y);
return hprob->GetBinContent(ibin);

)
and the in RDataFrame
d.Define(“prob”,“Probability(x,y)”)
Is this multi-tread safe ? Is there a more clever way to implement this ?

Maybe this can be helpful
There are shorter and clever tricks than the one posted here, for example you can simply make a struct implementing a member being your 2D histo to read and one operator() implementation, or even better maybe and simple to use lambda capturing the histogram as argument

 auto reweight=[& inputHist]( double X, double y){
....
};
node = node.Define( "w", reweight,{ "col1","col2"})

( Personally I prefer to declare a functor class to do this since then I can compile and make dictionary for it and use in pyROOT for other cases I need it)

. The only thing to be careful is that you use only const methods on the TH2D object in the operator() implementation.
(E.g FindFixBin, GetBinContent, check out the doxygen if the methods you use are marked const for your Root version.

1 Like

Hi @tancredi ,

I would also suggest to create a little helper struct, that’s probably the cleanest:

struct ProbEvaluator {
   TH2D *hprob = nullptr;
   ProbEvaluator(TH2D *h) : hprob(h) {}

   // as Renato mentions, this should be thread-safe as long as we limit
   // ourselves to using const methods of TH2D (i.e. we don't modify the TH2D) 
   float operator()(float x, float y) {
       int ibin=hprob->FindBin(x,y);
      return hprob->GetBinContent(ibin);
   }
};

and then

ProbEvaluator p(yourHisto);
df.Define("prob", p, ["x", "y"]);

It would be great if @moneta could confirm that TH2::FindBin and TH2::GetBinContent are indeed thread-safe (but I would otherwise assume they are since they should be read-only operations).

Cheers,
Enrico

To be clearer, and thread safe, what i usually do is

struct ProbEvaluator {
   const TH2D hprob;
   ProbEvaluator(TH2D *h) : hprob(*h) {}
   ProbEvaluator(ProbEvaluator &other) : hprob(other.hprob) {}

   // as Renato mentions, this should be thread-safe as long as we limit
   // ourselves to using const methods of TH2D (i.e. we don't modify the TH2D) 
   float operator()(float x, float y) {
       int ibin=hprob.FindBin(x,y);
      return hprob.GetBinContent(ibin);
   }
};

The compiler should complaint using non const methods with this in principle (enforcing thread safety protections), that’s why i think FindFixBin should be used (or at least i usually do)

In the latest ROOT version also the Interpolate is marked const and GetBinContent probably always has been so. But yes, a confirmation would be useful

Hello,
thank you very much for the replies. How would I do this in python ?
I guess I can compile the C++ code like usual, but how to implement this part

ProbEvaluator p(yourHisto);
df.Define("prob", p);

in python ?
Regards,
Tancredi

It should be just:

p = ROOT.ProbEvaluator(yourHisto)
df.Define("prob", p)

after you declare that C++ struct to ROOT, e.g. with ROOT.gInterpreter.Declare('#include "ProbEvaluator.hxx"')

Cheers,
Enrico

Hi,

TH1::GetBinContent should be thread safe, if you are not using an histogram with the buffer (automatic binning)… Also it is true better using FindFixBin because FindBin can extend the axis for values outside the range, if the axis can be extended.

Lorenzo

1 Like

Dear @eguiraud
I tried your solution but I get stuck declaring it in the RDataFrame.
I attach a root file and my macro.

prob.py (891 Bytes)
pion_probablitity.root (7.1 KB)

#!/usr/bin/env python
#
import ROOT
import os,sys
#include c++ in python
ROOT.gInterpreter.Declare(
""" 
struct ProbEvaluator {
   const TH2D hprob;
   ProbEvaluator(TH2D *h) : hprob(*h) {}
   
   float operator()(float x, float y) {
       int ibin=hprob.FindFixBin(x,y);
      return hprob.GetBinContent(ibin);
   }
};
""")

#
if __name__ == "__main__":
 
 path = "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-01-22/"
 filename=path+'GamGam/Data/data_A.GamGam.root'
 print ('filename= ',filename)
 d = ROOT.RDataFrame("mini", filename) 
 #
 file = ROOT.TFile.Open('pion_probablitity.root','READ')
 h = file.Get("hcorrected")
 #h.Print("all")
 # this does not work 
 prob = ROOT.ProbEvaluator(h) 
 d = d.Define('prob', prob)

Can you help ?
Tancredi

Hi @tancredi ,

sorry for the late reply, I was off last week.

The error message is doing its best to tell you what’s wrong, in particular:

runtime_error: 2 column names are required but none were provided and the default list has size 0

means that you should call Define('prob', prob, ['column1', 'column2']) instead of just Define('prob', prob). In other words you have to tell RDataFrame what columns to apply your ProbEvaluator to.

Cheers,
Enrico

Thank you. This works !
Tancredi