# Using a predefined TH2D in RDataframe to compute new quantiy

Hello,

I have a probability as a function of two variable x,y. Typically stored in a 2d-histogram.
How can I make this available in RDataFrame ?

e.g for each track in the detector (RVec) I would like to get the probability as function of x,y.

I would know how to implement this in a C++ function that I call from RDataFrame, e.g.

RVec Probability( RVec x, RVec y) (

static TH2D* hprob=0;
static bool firstevent=false;
if (firstevent) {
read histogram hprob from a file

firstevent=false
}
int ibin=hprob->FindBin(x,y);
return hprob->GetBinContent(ibin);

)
and the in RDataFrame
d.Define(“prob”,“Probability(x,y)”)
Is this multi-tread safe ? Is there a more clever way to implement this ?

There are shorter and clever tricks than the one posted here, for example you can simply make a struct implementing a member being your 2D histo to read and one operator() implementation, or even better maybe and simple to use lambda capturing the histogram as argument

`````` auto reweight=[& inputHist]( double X, double y){
....
};
node = node.Define( "w", reweight,{ "col1","col2"})
``````

( Personally I prefer to declare a functor class to do this since then I can compile and make dictionary for it and use in pyROOT for other cases I need it)

. The only thing to be careful is that you use only const methods on the TH2D object in the operator() implementation.
(E.g FindFixBin, GetBinContent, check out the doxygen if the methods you use are marked `const` for your Root version.

1 Like

Hi @tancredi ,

I would also suggest to create a little helper struct, that’s probably the cleanest:

``````struct ProbEvaluator {
TH2D *hprob = nullptr;
ProbEvaluator(TH2D *h) : hprob(h) {}

// as Renato mentions, this should be thread-safe as long as we limit
// ourselves to using const methods of TH2D (i.e. we don't modify the TH2D)
float operator()(float x, float y) {
int ibin=hprob->FindBin(x,y);
return hprob->GetBinContent(ibin);
}
};
``````

and then

``````ProbEvaluator p(yourHisto);
df.Define("prob", p, ["x", "y"]);
``````

It would be great if @moneta could confirm that `TH2::FindBin` and `TH2::GetBinContent` are indeed thread-safe (but I would otherwise assume they are since they should be read-only operations).

Cheers,
Enrico

To be clearer, and thread safe, what i usually do is

``````struct ProbEvaluator {
const TH2D hprob;
ProbEvaluator(TH2D *h) : hprob(*h) {}
ProbEvaluator(ProbEvaluator &other) : hprob(other.hprob) {}

// as Renato mentions, this should be thread-safe as long as we limit
// ourselves to using const methods of TH2D (i.e. we don't modify the TH2D)
float operator()(float x, float y) {
int ibin=hprob.FindBin(x,y);
return hprob.GetBinContent(ibin);
}
};
``````

The compiler should complaint using non `const` methods with this in principle (enforcing thread safety protections), that’s why i think `FindFixBin` should be used (or at least i usually do)

In the latest ROOT version also the `Interpolate` is marked `const` and `GetBinContent` probably always has been so. But yes, a confirmation would be useful

Hello,
thank you very much for the replies. How would I do this in python ?
I guess I can compile the C++ code like usual, but how to implement this part

``````ProbEvaluator p(yourHisto);
df.Define("prob", p);
``````

in python ?
Regards,
Tancredi

It should be just:

``````p = ROOT.ProbEvaluator(yourHisto)
df.Define("prob", p)
``````

after you declare that C++ struct to ROOT, e.g. with `ROOT.gInterpreter.Declare('#include "ProbEvaluator.hxx"')`

Cheers,
Enrico

Hi,

`TH1::GetBinContent` should be thread safe, if you are not using an histogram with the buffer (automatic binning)… Also it is true better using `FindFixBin` because `FindBin` can extend the axis for values outside the range, if the axis can be extended.

Lorenzo

1 Like

Dear @eguiraud
I tried your solution but I get stuck declaring it in the RDataFrame.
I attach a root file and my macro.

prob.py (891 Bytes)
pion_probablitity.root (7.1 KB)

``````#!/usr/bin/env python
#
import ROOT
import os,sys
#include c++ in python
ROOT.gInterpreter.Declare(
"""
struct ProbEvaluator {
const TH2D hprob;
ProbEvaluator(TH2D *h) : hprob(*h) {}

float operator()(float x, float y) {
int ibin=hprob.FindFixBin(x,y);
return hprob.GetBinContent(ibin);
}
};
""")

#
if __name__ == "__main__":

path = "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-01-22/"
filename=path+'GamGam/Data/data_A.GamGam.root'
print ('filename= ',filename)
d = ROOT.RDataFrame("mini", filename)
#
h = file.Get("hcorrected")
#h.Print("all")
# this does not work
prob = ROOT.ProbEvaluator(h)
d = d.Define('prob', prob)

``````

Can you help ?
Tancredi

Hi @tancredi ,

sorry for the late reply, I was off last week.

The error message is doing its best to tell you what’s wrong, in particular:

``````runtime_error: 2 column names are required but none were provided and the default list has size 0
``````

means that you should call `Define('prob', prob, ['column1', 'column2'])` instead of just `Define('prob', prob)`. In other words you have to tell RDataFrame what columns to apply your `ProbEvaluator` to.

Cheers,
Enrico

Thank you. This works !
Tancredi