Fit on a randomly generated dataset

Lorenzo_Sostero · July 22, 2020, 10:14am

Hello,

I am trying to fit a histogram that I previously filled with random values drawn from a TF1 function (data->Fill(function->GetRandom())). As the fit model, I use the same type of function.

For example, suppose I generated a histogram (“data” in the following code) in the range [1.50;10] using a Gaussian distribution with \mu=2 and \sigma=1 (restricted to the range [1.50;10]). The fit model would then be a TF1 Gaussian with parameters [0] = \mu and [1] = \sigma, fitted in the range [1.50;10].

Soon after the definition of the “model” (a TF1 function with parameters), I set the parameter limits. For example, for the parameter [0] that I want to keep in the range [1;3]:

model -> SetParLimits(0, 1, 3);

Then, I set an initial value for the parameter ([0]=1.5):

model -> SetParameter(0, 1.5);

and finally, I can proceed with the fit using:

data -> Fit ("model", "R");

where “R” is added in order to use the range specified in the function.

Is this the right way to proceed? How can I improve performance of the fit still using Migrad?

Thanks in advance.

yus · July 22, 2020, 12:24pm

Hi Lorenzo,

why do you do both

model -> SetParLimits(0, 1, 3);
and
model -> SetParameter(0, 1.5);

? I don’t think this makes sense, you should try only one of these.

Lorenzo_Sostero · July 22, 2020, 12:39pm

Thanks for answering.

I use (for example):

model -> SetParLimits(0, 1, 3);

to set the domain in which the algorithm searches for solutions, while I use:

model -> SetParameter(0, 1.5);

to define the starting value of the parameter [0].

Why do you think this makes no sense?

yus · July 22, 2020, 1:16pm

I’m a bit confused actually. Your example is probably too “synthetic” for me.

If your initial histogram has mu=2, why do you set
model -> SetParameter(0,1.5)
and not
model -> SetParameter(0,2.)
?
You do use the fact that mu=2 in the
model -> SetParLimits(0,1,3);
by limiting the search to mu +/- 1, right?

My point is that setting the initial value of mu with
model -> SetParameter(0, 1.5);
will not do much in your example. I just tried with and without it (see my code below) and I don’t see much of a difference. We already know the best mu is at 2, so what’s the point of doing this? The fitting procedure will test many values inside the limiting range (1-3) anyway…

        // Generating function: Gaussian with mu = 2, sigma = 1
        TF1 *function = new TF1("myFunction", "TMath::Gaus(x, 2, 1)", -10., 10.);

        TH1F *data = new TH1F("data", ";X;Y", 200, -10., 10.);
        for (int i = 0; i < 1000000; ++i)
                data->Fill(function->GetRandom());
        data->Draw("PE");

        // Model: [0] = amplitude, [1] = mu, [2] = sigma, fitted in [1.5, 10]
        TF1 *model = new TF1("myModel", "[0]*TMath::Exp(-0.5*((x-[1])/[2])**2)", 1.5, 10.);
        model->SetParLimits(1, 1, 3);
        model->SetParLimits(2, 0.5, 1.5);
//      model->SetParameter(1, 1.5); // is this really needed?
        model->SetLineColor(kRed);
        data->Fit(model, "R");

Lorenzo_Sostero · July 22, 2020, 1:37pm

I’m sorry, I didn’t explain very well.

When I set:

model -> SetParameter(0, 1.5);

I’m trying to guess a good value for the parameter from which the fit can start its search. In my example, I already know that \mu=2 is the solution, but I don’t want to give the algorithm the right value.

Setting the limits plays a similar role. If I know, from mathematical considerations or from particular information on the problem, that \mu can vary only in a predefined domain, I use that range as the search domain. The range [1;4] would have worked fine too.

yus · July 22, 2020, 1:41pm

Ah, okay. So in a way you want to know if the fit can find the “best” value given you already know that value, but at the same time you don’t want to give the fitting procedure too many clues on what that value might be.
In that case, I think everything is fine with the way you are doing things.

Lorenzo_Sostero · July 22, 2020, 1:52pm

Thanks.

Can you help me understand how to improve the performance of the fit? On trickier fits I have seen worse performance, and I was wondering whether there are other options that could help the fit. I have tried:

data -> Fit ("model", "R M");

but I would like to try also other methods.

yus · July 22, 2020, 1:56pm

Not off the top of my head. It’d be great if you could show the code reproducing your issue. A lot of things could go wrong with a fit: wrong function, bad ranges etc., so it’s difficult to give any advice without seeing the code.

Lorenzo_Sostero · July 22, 2020, 2:02pm

Other tests have been done exactly in the same way as described before. I only changed the generating function of the histogram and adapted the limits and initial values of the parameters.

yus · July 22, 2020, 2:04pm

You mean it’s not a gaussian anymore? Did you change the fitting function accordingly then?

moneta · July 22, 2020, 2:32pm

Hi,

Migrad, the default algorithm used in the fitting, is probably the best one in my experience. You can try other minimisers, but they do not work better than Migrad.
You can also try Migrad from Minuit2, which in some cases might work slightly better, but in the majority of cases gives the same result. Option M is not really useful.

The best thing to do, if you have issues with your fit, is to provide initial parameter values as close as possible to the solution.

Lorenzo

yus · July 22, 2020, 2:47pm

Another common mistake is to not take into account the errors (or to take into account wrong errors) of the distribution people are trying to fit. I.e. if you have manipulated your histogram in complex ways (e.g., if you have normalized your distribution before fitting it or maybe filled it with weights != 1), extra care should be taken when fitting.
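To see why weights matter, here is a small stand-alone sketch of the arithmetic for a single bin. WeightedBin is a hypothetical stand-in written in plain C++, not a ROOT class. In ROOT itself, TH1::Sumw2() makes the histogram store the sum of squared weights for the bin errors, and the fit option "WL" is documented for likelihood fits of weighted histograms.

```cpp
#include <cmath>

// Hypothetical stand-in for a single histogram bin filled with weights.
struct WeightedBin {
   double sumw = 0.0;   // bin content: sum of weights
   double sumw2 = 0.0;  // sum of squared weights

   void fill(double w) { sumw += w; sumw2 += w * w; }

   // Error that treats the content as a raw Poisson count (wrong for w != 1).
   double naive_error() const { return std::sqrt(sumw); }

   // Error propagated from the individual weights.
   double weighted_error() const { return std::sqrt(sumw2); }
};
```

With four entries of weight 2 the content is 8; the naive error is sqrt(8), about 2.83, while the error propagated from the weights is sqrt(16) = 4. A chi-square fit that uses the wrong one of these misjudges how much each bin should count.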