RooNLLVar problem

cvarni · September 30, 2014, 10:03pm

Dear experts,
I really need your help.

I’m running several toy MC experiments generated from a bkg pdf (a 4th-degree Bernstein polynomial, B, or an exponential, E) plus a signal (250k events for the bkg and 800 for the signal). The generated data are then fitted with a pdf of the same kind: bkg (B or E) + signal.

I’d like to count how many events my sample contains. To do this I fit, for the time being, the binned toy MC data using RooChi2Var and RooMinimizer (calling migrad, hesse and minos in sequence).

When I switch from RooChi2Var to RooNLLVar I notice that the former gives unbiased results while the latter produces a bias (especially when I fit using Bernstein as bkg model). Due to the high statistics I have in every bin, I was expecting the results to be compatible.
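For reference, the expectation that the two fits should agree at high statistics can be written out as follows. This is a sketch that assumes an extended binned fit with independent Poisson bins, which the thread does not confirm; the symbols $n_i$ (observed bin content), $\nu_i(\theta)$ (model expectation in bin $i$) and $\lambda$ (likelihood ratio to the saturated model) are notation introduced here, not taken from the attached code.

```latex
-\ln L(\theta) = \sum_i \bigl[\nu_i(\theta) - n_i \ln \nu_i(\theta)\bigr] + \text{const},
\qquad
\chi^2_P(\theta) = \sum_i \frac{\bigl(n_i - \nu_i(\theta)\bigr)^2}{\nu_i(\theta)}

-2\ln\lambda(\theta)
  = 2\sum_i \Bigl[\nu_i - n_i + n_i \ln\frac{n_i}{\nu_i}\Bigr]
  = \sum_i \Bigl[\frac{(n_i-\nu_i)^2}{\nu_i} - \frac{(n_i-\nu_i)^3}{3\,\nu_i^2} + \dots\Bigr]
```

The leading term is the Pearson chi-square, and the next term is suppressed by roughly $1/\sqrt{\nu_i}$ relative to it, so under these assumptions the two objective functions differ only by small corrections when every bin is well populated.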

Is this a math problem I’m not noticing, or a problem in the minimizer function, in the pdfs used or the toy-generation algorithm?

This is a simplified model of the situation I’m actually facing. The bias I mention here is not so big, but in my real case, where I have two different signals that are very close together and very small (the bkg has 100k events while the signals have only 100 and 800), this bias increases dramatically.

I attach the program I use, which shows this problem (I also attach all the .h and .root files you’ll need and a short pdf with instructions on how to use it).

Cheers,
Carlo
Instructions.pdf (64 KB)
Ztemplate.root (3.87 KB)
generateSimulation.h (2.05 KB)
Zbias.cpp (10.6 KB)

moneta · October 2, 2014, 9:13am

Hi,

It is normal that the fit of a histogram using the least-squares method is biased. See for example paragraph 2.5.4 (page 61) of the book “Data Analysis in High Energy Physics”,
amazon.com/Data-Analysis-Hig … 3527410589

I think RooFit uses the Pearson chi2, so the bias in the number of events is of the order of chi2_value/2.
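The size of this effect can be checked with a small standalone toy study. The sketch below is not RooFit code and not the model from the attached program: it assumes the simplest possible case, a flat histogram whose bins are independent Poisson counts with the same mean, fitted with a single constant. In that case the three estimators have closed forms, so no minimizer is needed. All function and struct names are made up for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Pearson chi2, sum_i (n_i - nu)^2 / nu: setting the derivative to zero
// gives nu^2 = mean(n_i^2).
double pearsonEstimate(const std::vector<double>& n) {
    double sumSq = 0.0;
    for (double ni : n) sumSq += ni * ni;
    return std::sqrt(sumSq / static_cast<double>(n.size()));
}

// Neyman chi2, sum_i (n_i - nu)^2 / n_i: the minimum is the harmonic mean.
// Precondition: every bin is non-empty.
double neymanEstimate(const std::vector<double>& n) {
    double sumInv = 0.0;
    for (double ni : n) {
        assert(ni > 0.0);
        sumInv += 1.0 / ni;
    }
    return static_cast<double>(n.size()) / sumInv;
}

// Poisson likelihood: the maximum is the arithmetic mean.
double likelihoodEstimate(const std::vector<double>& n) {
    double sum = 0.0;
    for (double ni : n) sum += ni;
    return sum / static_cast<double>(n.size());
}

// Fitted total number of events, averaged over all toys.
struct ToyMeans {
    double pearson;
    double neyman;
    double likelihood;
};

// Generate nToys flat histograms with nBins Poisson(mu) bins each and
// average the fitted totals (nBins * per-bin estimate) for each method.
ToyMeans runToys(int nToys, int nBins, double mu, unsigned seed) {
    std::mt19937 rng(seed);
    std::poisson_distribution<int> poisson(mu);
    std::vector<double> n(static_cast<std::size_t>(nBins));
    ToyMeans sum{0.0, 0.0, 0.0};
    for (int t = 0; t < nToys; ++t) {
        for (double& ni : n) ni = static_cast<double>(poisson(rng));
        sum.pearson    += nBins * pearsonEstimate(n);
        sum.neyman     += nBins * neymanEstimate(n);
        sum.likelihood += nBins * likelihoodEstimate(n);
    }
    return ToyMeans{sum.pearson / nToys, sum.neyman / nToys,
                    sum.likelihood / nToys};
}
```

Calling, for example, `runToys(2000, 100, 1000.0, 12345u)` and comparing the three averages with the true total of 100 × 1000 events should show the Pearson total sitting about half an event per bin above the likelihood total and the Neyman total about one event per bin below it. With a chi2 of roughly one per bin, that corresponds to the chi2_value/2 quoted above. For this flat model the shifts do not shrink as the bin contents grow, although they become small relative to the statistical error on the total.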

Best Regards

Lorenzo

cvarni · October 6, 2014, 2:21pm

Thanks for your reply,

What you say is true, but the bias you mention should be negligible in the case of high statistics in each bin (which is my case). The difference I see between the two results cannot be explained by that alone.

In addition, the chi-square method is the one that gives UNBIASED results: I obtain the same number of signal events I generated. The problem lies in the RooNLLVar method, which strongly underestimates the number of signal events.

Attached you can find the program that shows this. The RooChi2Var results are in RED, the RooNLLVar output is in BLUE. The graph I’m interested in is the bottom one (that is my signal!), while the top canvas shows the contribution from another signal I’m not really interested in.

Thanks,
Carlo
histograms.root (4.25 KB)
generateSimulation.h (2.05 KB)
bias.cpp (7.29 KB)