RooDataSets, weights, createHistogram and RooNDKeysPdf

egholm · June 14, 2013, 12:12pm

Hi.

Basically, I want to load a TTree into a RooDataSet with weighted events,
produce a 2D histogram using the createHistogram function, and create a RooNDKeysPdf which
I can then compare to the histogram.

The code looks like this (using PyROOT):


import ROOT

# Create a RooDataSet from the tree, where 'requirement' has been applied
# (invar_x, invar_y, minrange, maxrange, nbins and intree are defined elsewhere in the script)
x_var  = ROOT.RooRealVar( invar_x, invar_x, minrange, maxrange )
y_var  = ROOT.RooRealVar( invar_y, invar_y, minrange, maxrange )
weight = ROOT.RooRealVar( "weight", "weight", 0.0, 1.0 )

ds = ROOT.RooDataSet("dataset", "dataset", intree, ROOT.RooArgSet( x_var, y_var, weight ), "", "weight" )

# Print() on the RooDataSet indicates that the events are weighted correctly

# Create a 2D histogram of the two variables
hist = ds.createHistogram( x_var, y_var, nbins, nbins )

hist.Sumw2()
hist.Scale(1.0/hist.Integral())

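# Kernel density estimate of the same two variables ('a' = adaptive kernels, 1.0 = bandwidth scale factor)
mvakde = ROOT.RooNDKeysPdf( "mvakde", "mvakde", ROOT.RooArgList( x_var, y_var ), ds, "a", 1.0 )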

kdehist = mvakde.createHistogram( "%s,%s"%( invar_x, invar_y), nbins, nbins )

My questions are:

When I produce the histogram using the createHistogram function, the resulting bin errors are extremely large. Looking at the source code, it seems that Sumw2() is not called for the histogram that is returned; is this correct? Or is there any way I can change that?
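Why the errors blow up can be illustrated without ROOT. The plain-Python sketch below uses a handful of made-up small weights in a single bin: without the per-bin sum of squared weights, a TH1 reports sqrt(bin content) as the error, whereas with Sumw2 active while filling it reports sqrt(sum of w²). As far as I understand the TH1::Sumw2 documentation, calling Sumw2() on an already-filled histogram only initializes the squared weights from the existing bin contents, so a Sumw2() call made after createHistogram returns cannot recover the correct errors.

```python
import math

def bin_content_and_errors(weights):
    """Return (content, error_without_sumw2, error_with_sumw2) for one bin."""
    content = sum(weights)                              # bin content = sum of weights
    err_plain = math.sqrt(content)                      # sqrt(content): no sum of w^2 stored
    err_sumw2 = math.sqrt(sum(w * w for w in weights))  # sqrt(sum of w^2): Sumw2 active while filling
    return content, err_plain, err_sumw2

# Hypothetical small event weights that all fall into the same bin
weights = [0.01, 0.02, 0.015, 0.01, 0.005]
content, err_plain, err_sumw2 = bin_content_and_errors(weights)

print(f"content             = {content:.4f}")
print(f"error without Sumw2 = {err_plain:.4f} (relative {err_plain / content:.2f})")
print(f"error with Sumw2    = {err_sumw2:.4f} (relative {err_sumw2 / content:.2f})")
```

Because Scale(1.0/hist.Integral()) multiplies contents and errors by the same factor, normalizing afterwards leaves these relative errors unchanged.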

The “kdehist” that I create does not in any way resemble the histogram I have produced. How does RooNDKeysPdf handle event weights?
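For reference, this is what weighting usually means for a kernel estimator: each event contributes a kernel scaled by its weight, and the sum is normalized by the total weight. The sketch below is a fixed-bandwidth, one-dimensional Gaussian KDE written from scratch; it is not RooNDKeysPdf's implementation (which is multidimensional and, with the "a" option, uses adaptive bandwidths), and whether RooNDKeysPdf applies the dataset weights this way is exactly the open question in this thread.

```python
import math

def weighted_kde(points, weights, bandwidth):
    """Return f(x) for a fixed-bandwidth Gaussian KDE in which each event's
    kernel is scaled by its weight and the total is normalized by sum(weights)."""
    total = sum(weights)
    norm = 1.0 / (total * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            w * math.exp(-0.5 * ((x - p) / bandwidth) ** 2)
            for p, w in zip(points, weights)
        )

    return density

# Hypothetical 1D sample: two events, the second one carrying most of the weight
f = weighted_kde([0.0, 2.0], [0.1, 0.9], bandwidth=0.5)
print(f(0.0), f(2.0))
```

If the weights were ignored, both events would contribute equally and the two peaks would have the same height.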

If I change the code to ignore the event weights, i.e. change:

ds = ROOT.RooDataSet("dataset", "dataset", intree, ROOT.RooArgSet( x_var, y_var, weight ), "", "weight" )

to:

ds = ROOT.RooDataSet("dataset", "dataset", intree, ROOT.RooArgSet( x_var, y_var ) )

Everything works as it should.

Any comment would be highly appreciated.

Thanks a lot, cheers, Lars

wlav · June 14, 2013, 12:58pm

Hi,

this seems like the usual problem in RooFit, where it likes to take pointers to temporaries. Object lifetimes in C++ differ from those in Python, causing different results. Do:

argset = ROOT.RooArgSet( x_var, y_var, weight )
ds = ROOT.RooDataSet("dataset", "dataset", intree, argset, "", "weight" )

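The lifetime difference can be seen in plain Python. The sketch below uses a hypothetical stand-in class rather than a real RooArgSet and relies on CPython's reference counting: an object created inline as a call argument is destroyed as soon as that call returns, whereas binding it to a name keeps it alive. A C++ object that kept a pointer to such a temporary would be left with a dangling pointer, which is the motivation for binding the RooArgSet to argset first.

```python
import weakref

class ArgSetStandIn:
    """Hypothetical stand-in for a RooArgSet; it only needs to exist."""

def make_temporary_ref():
    # Created inline and never bound to a name: in CPython the object is
    # freed once weakref.ref() returns, leaving only a dead weak reference.
    return weakref.ref(ArgSetStandIn())

def make_named_ref():
    # Bound to a name and returned to the caller, so it stays alive.
    argset = ArgSetStandIn()
    return argset, weakref.ref(argset)

print(make_temporary_ref()())  # the temporary is already gone
```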
Cheers,
Wim

egholm · June 17, 2013, 12:00pm

Hi again,

thanks for the reply. Unfortunately it gives exactly the same result. I also tried to
reproduce it in a simple script, where the C++ version gives the correct results while
the Python version has extremely large statistical errors with small event weights.

Cheers,
Lars

wlav · June 17, 2013, 12:05pm

Lars,

egholm wrote: “I also tried to reproduce it in a simple script where the C++ version gives the correct results while the python version has extremely large stat-errors with small event weights.”

Did that succeed? I.e., do you have such a script? It would be much easier to debug …

Cheers,
Wim

egholm · June 17, 2013, 2:47pm

Hi, yes sure.

I have attached the C++ and Python versions of the scripts.
If you look at the C++ version, I think the problem is around these lines:

  TH1* hh_data = data2->createHistogram(x,y,20,20);
  //  TH1* hh_data = data2->createHistogram("hh_data", x,Binning(10) ,YVar(y,Binning(10)) );

If I use:

TH1* hh_data = data2->createHistogram(x,y,20,20);

I get the same problem as with Python (i.e. large statistical errors and very poor agreement between the raw histogram and the KDE), but if I instead do:

TH1* hh_data = data2->createHistogram("hh_data", x,Binning(10) ,YVar(y,Binning(10)) );

it seems to work just fine. I have tried a couple of things, but I can't seem to do the same thing in Python (I guess the second form of createHistogram comes from RooAbsData rather than RooDataSet).

Cheers, Lars
rf707_kernelestimation.C (2.37 KB)
rf707_kernelestimation.py (1.59 KB)

wlav · June 19, 2013, 11:30am

Lars,

yes, that’s what happens: RooDataSet declares its own createHistogram overloads, which in C++ hide the RooAbsData ones unless they are brought back with a “using” declaration, and CINT does not expose “using”. A possible workaround is to grab the base class createHistogram:

hh_data = super(data2.__class__, data2).createHistogram("hh_data", x, RooFit.Binning(20), RooFit.YVar(y, RooFit.Binning(20)))

or to call it explicitly:

hh_data = RooAbsData.createHistogram(data2, "hh_data", x, RooFit.Binning(20), RooFit.YVar(y, RooFit.Binning(20)))

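Both workarounds are ordinary Python method lookup. The ROOT-free sketch below uses hypothetical stand-in classes (not the real RooFit types or signatures) to show that a subclass method shadows the base-class one under normal lookup, while super() and an explicit Base.method(obj, ...) call both reach the base-class version.

```python
class AbsDataStandIn:
    """Hypothetical stand-in for RooAbsData with a command-style overload."""

    def createHistogram(self, name, *commands):
        return f"base:{name}:{len(commands)} commands"


class DataSetStandIn(AbsDataStandIn):
    """Hypothetical stand-in for RooDataSet: its own createHistogram shadows
    the base-class one, as PyROOT sees it when 'using' is not exposed."""

    def createHistogram(self, var1, var2, nx, ny):
        return f"derived:{var1},{var2}:{nx}x{ny}"


data2 = DataSetStandIn()

# Normal lookup finds only the derived-class method.
derived = data2.createHistogram("x", "y", 20, 20)

# Workaround 1: start the lookup after the object's own class in the MRO.
via_super = super(data2.__class__, data2).createHistogram("hh_data", "Binning", "YVar")

# Workaround 2: call the base-class function explicitly, passing the object as self.
via_base = AbsDataStandIn.createHistogram(data2, "hh_data", "Binning", "YVar")

print(derived, via_super, via_base)
```

The explicit form names the base class directly, so it keeps working even if data2 is an instance of a further subclass, whereas super(data2.__class__, data2) skips only the object's own class.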
HTH,
Wim