Excluding fit region

zwe32 · June 21, 2010, 3:40pm

Dear rooters (and especially Wouter)!

I’m still trying to understand the source of problems mentioned in my previous message:
http://root.cern.ch/phpBB3//viewtopic.php?f=15&t=9318
Unfortunately I can't provide a minimal example of my program, which gives a bad fit when I exclude some interval from the chi-2:

t.setRange("R1",lo_ch,excl_lo_ch);
t.setRange("R2",excl_up_ch,up_ch);
RooChi2Var chi2("chi2","chi2",dec_bkg,data,Range("R1,R2"),NumCPU(2));

So I have simplified my test case.

When I run the attached program with chi2 minimized over the full range (no rejected points, Range("full")), I get the following results, case (1):

full range
FCN=336.926 FROM HESSE  STATUS=OK  23 CALLS  284 TOTAL
EDM=2.40462e-07  STRATEGY= 1  ERROR MATRIX ACCURATE
EXT PARAMETER                               INTERNAL      INTERNAL
NO.  NAME      VALUE         ERROR          STEP SIZE     VALUE
 1   bg        3.19504e-02   1.69618e-03    1.28007e-05   -1.21137e+00
 2   c_1       5.90994e+00   5.24812e-02    3.60408e-06   -1.12163e+00
 3   t_shift   1.09368e+02   5.73925e-02    2.20431e-06   -9.48683e-01
 4   tau       1.28628e+01   9.85724e-02    2.71604e-06   -8.37159e-01

When I define two intervals that together cover the full range, case (2), I get output that differs from case (1):
t.setRange("sig1",100,130) ;
t.setRange("sig2",130,200) ;

2 ranges eq to full range
FCN=336.926 FROM HESSE  STATUS=OK  23 CALLS  286 TOTAL
EDM=3.77207e-09  STRATEGY= 1  ERROR MATRIX ACCURATE
EXT PARAMETER                               INTERNAL      INTERNAL
NO.  NAME      VALUE         ERROR          STEP SIZE     VALUE
 1   bg        4.38035e-02   2.33203e-03    1.48954e-05   -1.14909e+00
 2   c_1       5.90993e+00   5.24812e-02    3.59880e-06   -1.12163e+00
 3   t_shift   1.09368e+02   5.73925e-02    2.20167e-06   -9.48684e-01
 4   tau       1.28628e+01   9.85727e-02    2.67833e-06   -8.37159e-01

Note: the numbers change when I set “150” instead of “130” as the middle point.

When I exclude 2 points from the full interval, case (3), all numbers are the same as in (2):
t.setRange("sig1",100,128) ;
t.setRange("sig2",130,200) ;

2 ranges eq to full range - 2 points -- same as above
FCN=336.926 FROM HESSE  STATUS=OK  23 CALLS  286 TOTAL
EDM=3.77207e-09  STRATEGY= 1  ERROR MATRIX ACCURATE
EXT PARAMETER                               INTERNAL      INTERNAL
NO.  NAME      VALUE         ERROR          STEP SIZE     VALUE
 1   bg        4.38035e-02   2.33203e-03    1.48954e-05   -1.14909e+00
 2   c_1       5.90993e+00   5.24812e-02    3.59880e-06   -1.12163e+00
 3   t_shift   1.09368e+02   5.73925e-02    2.20167e-06   -9.48684e-01
 4   tau       1.28628e+01   9.85727e-02    2.67833e-06   -8.37159e-01

Is it possible to understand why the results of (1) and (2) differ even though the fit covers the same range? And why do the results of (2) depend on the point that splits the full interval? I would rather expect a difference between (2) and (3), not between (1) and (2).
Does RooFit use a different technique in (2) and (3) than in (1)?
What other ways are there to exclude points, either from the RooChi2Var or from the data histogram (RooDataHist), or to make their contribution to the chi2 equal to 0?

I stress that in this example the difference is negligibly small, but in my “big” program it is much bigger.
Nevertheless, I can see similar behavior here: introducing Range() in the chi2 affects the rate of the background component (see “bg” in (1) and (2)).

I’m greatly confused about this and would appreciate any help. Thanks.
testdecay.c (2.14 KB)

Wouter_Verkerke · June 28, 2010, 8:05am

Hi Dmitry,

Let me clarify a few points on how things work internally in RooFit in fits with multiple ranges.

RooFit always works with (normalized) pdfs, both for ML fits and chi^2 fits. For the latter
the pdf is multiplied by Ndata before it goes into the chi^2 calculation.
If you do a fit to multiple ranges, what really happens is that you make a simultaneous fit to the two regions, i.e. the NLL or chi^2 function is written for each region, then these are summed and minimized together.

There may still be non-trivial interactions between the regions, as e.g. a change of shape in region 1 has an effect in region 2 as well. An additional issue with this approach is that the information contained in the ratio of the normalizations of the regions is not used, which for certain pdfs may be important.
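To make the construction above concrete, here is a minimal standalone sketch (plain C++, not RooFit code) of a chi^2 that is written per region and then summed. The exponential shape, the helper names (`shapeIntegral`, `chi2InRange`, `chi2MultiRange`, `makeRange`) and the choice to scale each region's pdf by the number of events observed in that region are all assumptions made for illustration; the actual RooFit internals may differ in detail.

```cpp
#include <cmath>
#include <vector>

// One histogram bin: edges and observed content.
struct Bin { double lo, hi, n; };

// Integral of the unnormalized shape exp(-t/tau) over [lo, hi].
double shapeIntegral(double lo, double hi, double tau) {
    return tau * (std::exp(-lo / tau) - std::exp(-hi / tau));
}

// Chi2 of one region: the pdf is normalized on this region alone and
// scaled by the number of events observed in this region (assumption).
double chi2InRange(const std::vector<Bin>& bins, double tau) {
    double nData = 0.0, norm = 0.0;
    for (const Bin& b : bins) {
        nData += b.n;
        norm += shapeIntegral(b.lo, b.hi, tau);
    }
    double chi2 = 0.0;
    for (const Bin& b : bins) {
        if (b.n <= 0.0) continue;  // skip empty bins, their error is undefined
        double expected = nData * shapeIntegral(b.lo, b.hi, tau) / norm;
        chi2 += (b.n - expected) * (b.n - expected) / b.n;
    }
    return chi2;
}

// Multi-range chi2: the per-region terms are simply summed.
double chi2MultiRange(const std::vector<std::vector<Bin>>& ranges, double tau) {
    double total = 0.0;
    for (const auto& r : ranges) total += chi2InRange(r, tau);
    return total;
}

// Hypothetical helper: bins filled with scale * (exact expectation for tauTrue).
std::vector<Bin> makeRange(double lo, double hi, int nBins,
                           double tauTrue, double scale) {
    std::vector<Bin> bins;
    double w = (hi - lo) / nBins;
    for (int i = 0; i < nBins; ++i) {
        double a = lo + i * w, b = a + w;
        bins.push_back({a, b, scale * shapeIntegral(a, b, tauTrue)});
    }
    return bins;
}
```

In this sketch, multiplying all bin contents of one region by an arbitrary factor leaves the summed chi^2 at the true `tau` unchanged, because each region only constrains the shape inside itself. That is the sense in which the ratio of the normalizations carries no weight in such a fit.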

If this is the case, your fit may depend relatively strongly on the range fitted. (Imagine e.g. fitting an exponential decay in two very narrow ranges that are far apart. Each range will constrain the slope very poorly since the interval is narrow, but the ratio of the event rates in the two regions is very sensitive to the slope parameter.)

Perhaps this also affects your fit to some extent.
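The exponential example can be checked numerically with a small standalone sketch (plain C++, no RooFit). The window positions, the lifetimes and all function names below are assumptions chosen for illustration only.

```cpp
#include <cmath>

// Integral of exp(-t/tau) over [lo, hi].
double expIntegral(double lo, double hi, double tau) {
    return tau * (std::exp(-lo / tau) - std::exp(-hi / tau));
}

// Shape information inside one window: the fraction of the window's
// events that fall into its first half.
double firstHalfFraction(double lo, double hi, double tau) {
    double mid = 0.5 * (lo + hi);
    return expIntegral(lo, mid, tau) / expIntegral(lo, hi, tau);
}

// Relative-normalization information: event rate in window B over window A.
double rateRatio(double loA, double hiA, double loB, double hiB, double tau) {
    return expIntegral(loB, hiB, tau) / expIntegral(loA, hiA, tau);
}

// Relative change of a quantity between two parameter values.
double relChange(double a, double b) {
    return std::fabs(b - a) / std::fabs(a);
}
```

With windows [0, 0.1] and [5, 5.1], moving `tau` from 1.0 to 1.1 changes `firstHalfFraction` by well under 1%, while `rateRatio` changes by more than 50%. A fit that normalizes each window separately only sees the first kind of information.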

The upcoming ROOT release 5.27/04 has new code in it that allows fits with multiple disjoint ranges
to be done in an alternate way: it can split the calculation of the normalization of the pdf, e.g. you can define a pdf
as F(x) = f(x) / [ Int[A] f(x) dx + Int[B] f(x) dx ]. This may result in a more stable fit, especially for situations where there is substantial information contained in the relative normalization of the regions.
If you are interested in trying this feature, I will post some instructions once the release is out (this week).
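
As a rough illustration of the formula F(x) = f(x) / [ Int[A] f(x) dx + Int[B] f(x) dx ], the sketch below normalizes an exponential shape over the union of two disjoint intervals. It is standalone C++ and does not use the new RooFit interface; the type `Interval` and the functions `decayIntegral`, `unionNorm` and `probInRange` are hypothetical names introduced here.

```cpp
#include <cmath>
#include <utility>
#include <vector>

using Interval = std::pair<double, double>;  // [lo, hi]

// Integral of f(t) = exp(-t/tau) over [lo, hi].
double decayIntegral(double lo, double hi, double tau) {
    return tau * (std::exp(-lo / tau) - std::exp(-hi / tau));
}

// Denominator of F(x): Int[A] f + Int[B] f + ... over all disjoint ranges.
double unionNorm(const std::vector<Interval>& ranges, double tau) {
    double sum = 0.0;
    for (const Interval& r : ranges) {
        sum += decayIntegral(r.first, r.second, tau);
    }
    return sum;
}

// Probability content of one range under the union-normalized pdf F(x).
double probInRange(const Interval& r, const std::vector<Interval>& ranges,
                   double tau) {
    return decayIntegral(r.first, r.second, tau) / unionNorm(ranges, tau);
}
```

With this normalization the range probabilities sum to one over A and B together, so the predicted fraction of events in each range depends on `tau`. The relative normalization of the regions then enters the fit instead of being absorbed by a separate normalization per range.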

Wouter