Inconsistent results between RooMinuit and RooAbsPdf.fitTo()

mattbellis · September 29, 2011, 6:27pm

Hi all,

I’m finding inconsistent results when I fit to disconnected subranges, depending on whether I use RooAbsPdf.fitTo() or RooMinuit/migrad. The attached code should demonstrate the issue. I was also using this code to test another issue which has been posted at

I’m using ROOT 5.31/01 on a Debian 64-bit machine. The attached code allows you to switch between different subranges, as well as choose to fit with either fitTo() or RooMinuit.

I create a 2D PDF, let’s say in energy and time. There are three components to the PDF.

A Gaussian in energy, exponential decay in time.
A Gaussian in energy, exponential decay in time.
An exponential in energy, no time dependence.

The ranges for the variables are:

energy: 0-12 (arbitrary units)
time: 1-500 (arbitrary units)

I’m going to try three different fits to three different ranges.

The full range of the dataset.
Two disconnected subranges: full range in energy, and (1-200) and (400-500) in time. So for this subrange, I’ve cut out a big swath in time.
Two disconnected subranges: full range in energy and then from (1-400) and (401-500) in time. So this subrange should be almost exactly the same as the full range. I’ve only cut out 0.2% of the data.
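As a quick sanity check on how much each choice of subranges removes, here is a minimal sketch in plain C++ (no ROOT needed). It only covers a component that is flat in time, like the third one, and the function name is my own, not something from the macro or from RooFit.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Fraction of a distribution that is uniform (flat) on [lo, hi] which falls
// inside a set of non-overlapping subranges, each given as (start, end).
double flatFractionKept(double lo, double hi,
                        const std::vector<std::pair<double, double>>& subranges) {
    assert(hi > lo);
    double kept = 0.0;
    for (const auto& r : subranges) {
        // Each subrange must lie inside [lo, hi] and be correctly ordered.
        assert(r.first >= lo && r.second <= hi && r.second >= r.first);
        kept += r.second - r.first;
    }
    return kept / (hi - lo);
}
```

For the time range 1-500, the subranges (1-200) and (400-500) keep 299/499, about 59.9%, of a time-flat component, while (1-400) and (401-500) keep 498/499, about 99.8%, which is where the 0.2% comes from. The two decaying components will lose a different fraction, depending on their decay constants.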

Whenever I fit, I create (I think) a reduced dataset consisting of either the full range or the two disconnected subranges, and I use this reduced dataset in the fit.

After each fit I overlay the PDF on the data, and I have been experimenting with the plotting options Range and NormRange. For each fit, I overlay the PDF on the energy and time projections of the data with 4 different combinations of these options. I know that some of these should definitely not work when I overlay on top of the reduced datasets.

No additional range options.
Range: FULL, NormRange: FULL.
Range: subranges, NormRange: FULL.
Range: subranges, NormRange: subranges.

I use these options for both the energy and time plots. When I run these fits, I set the seed for the random generation, so successive tests should yield the same results. You can run the fits to the three ranges by loading this macro and then passing in 0, 1 or 2 as the first argument.
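To make concrete why the choice of NormRange matters for the overlay: a shape normalized to 1 over the FULL time range integrates to less than 1 over the subranges, so the two normalizations differ by that fraction. Below is a minimal sketch in plain C++ for an exponential decay. The lifetime tau = 100 used in the note after the code is a made-up value (the actual decay constants are in the attached macro), and the function names are mine, not RooFit's.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Integral of exp(-t / tau) over [a, b].
double expIntegral(double tau, double a, double b) {
    return tau * (std::exp(-a / tau) - std::exp(-b / tau));
}

// Fraction of an exponential decay defined on [lo, hi] that lies inside a set
// of non-overlapping subranges, each given as (start, end). This is the ratio
// between "normalized over the subranges" and "normalized over the full range".
double expFractionKept(double tau, double lo, double hi,
                       const std::vector<std::pair<double, double>>& subranges) {
    assert(tau > 0.0 && hi > lo);
    double kept = 0.0;
    for (const auto& r : subranges) {
        kept += expIntegral(tau, r.first, r.second);
    }
    return kept / expIntegral(tau, lo, hi);
}
```

With the assumed tau = 100, `expFractionKept(100, 1, 500, {{1, 200}, {400, 500}})` comes out at about 0.88, so a curve normalized over FULL and one normalized over the subranges would differ by roughly 12% for such a component. This only illustrates the integrals; it says nothing about how RooFit applies Range and NormRange internally.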

I’m fixing all parameters to the values used in the data generation, except for the numbers of events in each of the three components. You can pass in 0, 1, or 2 as the first argument to choose among the three ranges, and 0 (fitTo()) or 1 (RooMinuit) as the second argument to select the minimization method. For example:

[code]root -l
root [0] .L test_two_gaussians_and_exponentials_1_sub_ranges.C
root [1] test_two_gaussians_and_exponentials_1_sub_ranges(0,0) // Full range, fitTo()
root [2] test_two_gaussians_and_exponentials_1_sub_ranges(0,1) // Full range, RooMinuit
root [3] test_two_gaussians_and_exponentials_1_sub_ranges(1,1) // Subrange 1, RooMinuit[/code]

Let’s see what I get:

Full Range:
fitTo()

[code] Floating Parameter InitialValue FinalValue +/- Error GblCorr.

                n0    1.0000e+03    1.0193e+03 +/-  3.58e+01  <none>
                n1    5.0000e+02    4.8585e+02 +/-  2.30e+01  <none>
                n2    1.0000e+03    9.9490e+02 +/-  3.63e+01  <none>

num entries in dataset: 2500
fit results:
n0: 1019.25
n1: 485.85
n2: 994.90
total num events from fit: 2500.00
difference between fit and num entries: -0.00[/code]

RooMinuit

[code] Floating Parameter InitialValue FinalValue +/- Error GblCorr.

                n0    1.0000e+03    1.0193e+03 +/-  3.58e+01  <none>
                n1    5.0000e+02    4.8585e+02 +/-  2.30e+01  <none>
                n2    1.0000e+03    9.9490e+02 +/-  3.63e+01  <none>

num entries in dataset: 2500
fit results:
n0: 1019.25
n1: 485.85
n2: 994.90
total num events from fit: 2500.00
difference between fit and num entries: -0.00[/code]

Awesome! I get the same results!

Subranges with large disconnected region in one variable, but not the other.
fitTo()

[code] Floating Parameter InitialValue FinalValue +/- Error GblCorr.

                n0    1.0000e+03    1.0328e+03 +/-  4.03e+01  <none>
                n1    5.0000e+02    4.9529e+02 +/-  2.31e+01  <none>
                n2    1.0000e+03    2.8242e+02 +/-  1.35e+01  <none>

num entries in dataset: 1820
fit results:
n0: 1032.80
n1: 495.29
n2: 282.42
total num events from fit: 1810.51
difference between fit and num entries: -9.49[/code]

RooMinuit

[code] Floating Parameter InitialValue FinalValue +/- Error GblCorr.

                n0    1.0000e+03    7.8458e+02 +/-  3.02e+01  <none>
                n1    5.0000e+02    4.9252e+02 +/-  2.27e+01  <none>
                n2    1.0000e+03    5.4296e+02 +/-  2.65e+01  <none>

num entries in dataset: 1820
fit results:
n0: 784.58
n1: 492.52
n2: 542.96
total num events from fit: 1820.07
difference between fit and num entries: 0.07[/code]

So there’s a small difference in the total number of events, but the big difference is in the number of events in the exponential component (n2), roughly a factor of 2; n0 also shifts from about 1033 to about 785. RooMinuit seems to return the better fit, judging by the overlaid PDF in the plots, but it also seems to have a normalization problem on the time projection, where the curve is consistently below the data points.

Subranges with a very small (negligible) disconnected region in one variable, but not the other.

fitTo()

[code] Floating Parameter InitialValue FinalValue +/- Error GblCorr.

                n0    1.0000e+03    1.0778e+03 +/-  3.55e+01  <none>
                n1    5.0000e+02    5.0551e+02 +/-  2.32e+01  <none>
                n2    1.0000e+03    4.5671e+02 +/-  1.69e+01  <none>

num entries in dataset: 2496
fit results:
n0: 1077.81
n1: 505.51
n2: 456.71
total num events from fit: 2040.04
difference between fit and num entries: -455.96[/code]

RooMinuit

[code] Floating Parameter InitialValue FinalValue +/- Error GblCorr.

                n0    1.0000e+03    1.0189e+03 +/-  3.58e+01  <none>
                n1    5.0000e+02    4.8591e+02 +/-  2.30e+01  <none>
                n2    1.0000e+03    9.9093e+02 +/-  3.63e+01  <none>

num entries in dataset: 2496
fit results:
n0: 1018.93
n1: 485.91
n2: 990.93
total num events from fit: 2495.77
difference between fit and num entries: -0.23[/code]

Now there's a *big* difference in the fit results, but RooMinuit seems to get it right, judging by the overlaid PDF. Even for RooMinuit, though, the plot looks good with Range(FULL)/NormRange(FULL) but looks off with Range(subrange)/NormRange(FULL), even though there should be negligible difference between the subrange and the FULL range.
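As far as I understand, in an extended fit where every yield floats and the PDF is normalized over the same region as the data, the fitted yields should add up to the number of entries in the dataset. That makes the "difference between fit and num entries" line a useful diagnostic. Here is a minimal sketch of that bookkeeping in plain C++ (no ROOT needed; the function names are my own, not from the macro):

```cpp
#include <numeric>
#include <vector>

// Sum of the fitted component yields (n0 + n1 + n2 + ...).
double yieldSum(const std::vector<double>& yields) {
    return std::accumulate(yields.begin(), yields.end(), 0.0);
}

// Fitted total minus the number of entries in the fitted dataset.
// A value far from zero suggests the likelihood that was minimized is not
// normalized over the same region that the dataset covers.
double yieldDifference(const std::vector<double>& yields, double numEntries) {
    return yieldSum(yields) - numEntries;
}
```

Plugging in the numbers above: the fitTo() result for the large gap gives `yieldDifference({1032.80, 495.29, 282.42}, 1820)`, about -9.5, and for the tiny gap `yieldDifference({1077.81, 505.51, 456.71}, 2496)`, about -456. The RooMinuit results stay within a fraction of an event of the dataset size in both cases.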

So to summarize:

fitTo() and RooMinuit give the same result when I fit to the full range of the data, but different results when I use disconnected subranges. RooMinuit looks more "correct".
I’m still not sure of the proper way to plot the results.

Hypotheses:
I’ve not set these up properly and I’m not actually comparing things consistently between fitTo() and RooMinuit.
Bug?

Thanks in advance for any suggestions anyone can offer.

Matt
test_two_gaussians_and_exponentials_1_sub_ranges.C (8.29 KB)