fitTo gives inconsistent results with disconnected Ranges

mattbellis · September 29, 2011, 6:09pm

Hi all,

I’ve been having concerns about the proper way to use fitTo when fitting to disconnected ranges in a 2D PDF. I’m also unsure of the proper way to plot the results. I’ve put together some sample code (attached) which I hope will illuminate the issues. I’m using ROOT 5.31/01 on a Debian 64-bit machine.

I’m also concerned about disagreement between fitTo and using RooMinuit. This sample macro will also address those issues, but I’ll post that in a different thread.

I create a 2D PDF, let’s say in energy and time. There are three components to the PDF.

A Gaussian in energy, exponential decay in time.
A second Gaussian in energy, exponential decay in time.
An exponential in energy, no time dependence.
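
For reference, a model of this shape could be written as below. This is only a sketch: the symbols (yields $N_j$, means $\mu_j$, widths $\sigma_j$, lifetimes $\tau_j$, energy slope $k$) are placeholders and are not taken from the attached macro. It also assumes that each component factorizes into an energy part and a time part, and that "no time dependence" means a flat shape $U(t)$ in time.

```latex
P(E,t) \;=\; \frac{1}{N_1+N_2+N_3}\Big[
    N_1\, G(E;\mu_1,\sigma_1)\, D(t;\tau_1)
  + N_2\, G(E;\mu_2,\sigma_2)\, D(t;\tau_2)
  + N_3\, X(E;k)\, U(t) \Big],
\qquad
D(t;\tau) \propto e^{-t/\tau},\quad
X(E;k) \propto e^{-kE},\quad
U(t) = \text{const.}
```

Each factor is understood to be normalized to unity over the range of its own variable.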

The ranges for the variables are:

energy: 0-12 (arbitrary units)
time: 1-500 (arbitrary units)

I’m going to try three different fits to three different ranges.

The full range of the dataset.
Two disconnected subranges: full range in energy and then from (1-200) and (400-500) in time. So for this subrange, I’ve cut out a big swath in time.
Two disconnected subranges: full range in energy and then from (1-400) and (401-500) in time. So this subrange should be almost exactly the same as the full range. I’ve only cut out 0.2% of the data.
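
As a rough cross-check of how much is removed by each gap, here is a minimal sketch in plain C++ (no ROOT needed). The function names are hypothetical, and any lifetime passed to the exponential version is an assumption, since the actual decay constants live in the attached macro. For a component that is flat in time, the removed fraction is just the gap width over the full width: about 40% for the 200-400 gap and about 0.2% for the 400-401 gap. For a decaying component the fraction depends on the lifetime.

```cpp
#include <cassert>
#include <cmath>

// Fraction of a flat (time-independent) distribution on [lo, hi]
// that falls inside the excluded gap [a, b].
double flatFractionRemoved(double lo, double hi, double a, double b) {
    assert(hi > lo && a >= lo && b <= hi && a <= b);
    return (b - a) / (hi - lo);
}

// Same quantity for a shape proportional to exp(-t / tau) on [lo, hi].
// tau is a hypothetical lifetime, not a value from the original macro.
double expFractionRemoved(double lo, double hi, double a, double b, double tau) {
    assert(hi > lo && a >= lo && b <= hi && a <= b && tau > 0.0);
    // The integral of exp(-t/tau) between two limits is
    // tau * (exp(-t1/tau) - exp(-t2/tau)); the factor tau cancels in the ratio.
    const double total = std::exp(-lo / tau) - std::exp(-hi / tau);
    const double gap   = std::exp(-a / tau)  - std::exp(-b / tau);
    return gap / total;
}
```

Calling `flatFractionRemoved(1.0, 500.0, 400.0, 401.0)` gives 1/499, which matches the 0.2% quoted above for the time-independent component. With any positive lifetime, `expFractionRemoved` for the same gap comes out smaller than that, because a decaying shape has less of its content at late times.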

Whenever I fit, I create (I think) a reduced dataset consisting of either the full range or the two disconnected subranges. I use this reduced dataset in the fit.

When I plot the PDF after the fit over the data, I was experimenting with the plotting options, Range and NormRange. So for each fit, I overlay the PDF on the energy and time projections of the data with 4 different combinations of these options. I know that some of these should definitely not work when I overlay on top of the reduced datasets.

No additional range options.
Range: FULL, NormRange: FULL.
Range: subranges, NormRange: FULL.
Range: subranges, NormRange: subranges.

I use these options for both the energy and time plots. When I run these fits, I set the seed for the random generation, so successive tests should yield the same results. You can run the fits to the three different ranges by loading this macro and then passing in 0, 1 or 2 as the first argument.

I’m fixing all components to what was used in the data generation except for the numbers of events in each of the three components. I do an extended fit.
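
For context, the standard extended negative log-likelihood for a fit like this, with only the yields floating, can be written as below. This is a generic textbook form and not a description of what fitTo does internally. Here $f_j$ is the fixed shape of component $j$, assumed to be normalized to unity over the fit region $R$, and $n$ is the number of events in the dataset being fitted.

```latex
-\ln L(N_1,N_2,N_3) \;=\; \sum_{j=1}^{3} N_j \;-\; \sum_{i=1}^{n} \ln\!\Big[\sum_{j=1}^{3} N_j\, f_j(E_i,t_i)\Big],
\qquad \int_R f_j(E,t)\, dE\, dt = 1 .

\frac{\partial(-\ln L)}{\partial N_k} = 1 - \sum_{i=1}^{n} \frac{f_k(E_i,t_i)}{\sum_j N_j f_j(E_i,t_i)} = 0
\;\;\Longrightarrow\;\;
\sum_k N_k = \sum_{i=1}^{n} \frac{\sum_k N_k f_k(E_i,t_i)}{\sum_j N_j f_j(E_i,t_i)} = n .
```

The second line multiplies each stationarity condition by $N_k$ and sums over $k$. So, provided no yield is sitting at a limit and the shapes are normalized over the same region that the fitted dataset occupies, the fitted yields should add up to the number of events in that dataset.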

root -l
] .L test_two_gaussians_and_exponentials_1_sub_ranges.C
] test_two_gaussians_and_exponentials_1_sub_ranges(0)
] test_two_gaussians_and_exponentials_1_sub_ranges(1)
] test_two_gaussians_and_exponentials_1_sub_ranges(2)

A summary of these results follows:

Full range

(a) Fit: When I fit to the full range, things look mostly good! The fit converges to pretty much the generated values.

(b) Plots: The plots don’t look good when I don’t specify Range/NormRange. But when I use these options (they’re all the same for the other three plotting options), the PDF looks good overlaid on the data. Huzzah

Subranges with large disconnected region in one variable, but not the other.

(a) Fit: The fit converges such that the total number of events from the fit is very close to the number of events in the reduced dataset (within 0.5%). The qualitative fit overlaid on the data doesn’t look great, but that could be because we’re missing a lot of the data.
(b) Plots: The normalization of the PDF does not agree well with the data when I don’t use any Range options, nor when I tell it to use the subranges for both Range and NormRange. At first glance the agreement looks reasonable when I use Range(subranges) and NormRange(FULL or subranges), but looking closer at the time distribution, the PDF is consistently below the data, more so than the disagreement in the number of entries between the dataset and the fit results would suggest. So I’m concerned.

Subranges with a very small (negligible) disconnected region in one variable, but not the other.

(a) Fit: The fit now returns a 20% difference between the number of entries returned by the fit and the number in the dataset, even though this fit should be almost the same as the first fit over the full range.
(b) Plots: There is now a noticeable difference in the normalization of the PDF between using NormRange(FULL) with Range(subranges) and with Range(FULL). Again, this should be a negligible difference.

So to summarize:

I don’t seem to be able to consistently fit subranges with the fitTo function in RooAbsPdf.
I don’t know the proper way to plot the PDF over the data with these subranges.

Hypotheses:
I’ve just done something wrong. I’ve tried to stay consistent with the examples and tutorials, but it’s possible that I’ve made a mistake.
Two of my components have an explicit time dependence, while the other does not. Does this matter? Do I have to introduce a constant PDF in time for the third component?
When I construct the RooAddPdf, I use 3 RooRealVars for the relative number of events in each component of the PDF. Do I need to use the constructor that uses the fractional amounts?

Anyways, any help would be very welcome. Going to post now about the difference between RooMinuit and fitTo.

Matt
test_two_gaussians_and_exponentials_1_sub_ranges.C (8.29 KB)