Different chi2 result in RooFit?

emanuele_cardinali · November 24, 2020, 10:53am

Hi everybody,
I have a problem with the chi2 test for the fit of one of my functions.
I have read in previous topics that there is a lot of ambiguity about which chi2 test to use in RooFit.
I therefore computed both “chi2/ndof” and “chi2” to check whether the two agree.
I calculate the degrees of freedom as: “# of bins in the histogram - # of parameters in the pdf”.

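  // Raw chi2 of the model against the binned data, with Poisson errors
  RooChi2Var chi_2("chi_2", "chi_2", model, dh, DataError(RooAbsData::Poisson));
  cout << chi_2.getVal() << endl;

  // chi2/ndof of the curve drawn on the frame; 7 is the number of fit parameters
  Double_t chi2 = frame->chiSquare(7);
  cout << chi2 << "\n";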
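
For reference, the relation between the two printed numbers can be sketched in plain C++ (no ROOT needed). This is only an illustration of the ndof definition given above; the function names `degreesOfFreedom` and `reducedChi2` are hypothetical, and the sketch assumes every bin is counted, which may not match what RooFit does internally for empty bins.

```cpp
#include <cassert>

// Degrees of freedom as defined in the post:
// number of histogram bins minus number of fitted parameters.
int degreesOfFreedom(int nBins, int nFitParams) {
    assert(nBins > nFitParams);  // otherwise ndof is not positive
    return nBins - nFitParams;
}

// Reduced chi2 (chi2/ndof) obtained from a raw chi2 value.
double reducedChi2(double chi2, int nBins, int nFitParams) {
    return chi2 / degreesOfFreedom(nBins, nFitParams);
}
```

With 50 bins and 7 parameters, `reducedChi2(86.0, 50, 7)` divides by 43 and gives 2.0. Comparing this number with the value returned by frame->chiSquare(7) for each binning shows where the two calculations diverge.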

I decided to verify that the result did not depend strongly on the number of bins, so I varied it in a “for” loop and printed the results to a text file.
This is what appears in the file: new example.txt (416 Bytes)

While the chi2 (in the first column) divided by the ndof (in the third column) gives acceptable results, the chi2/ndof (in the second column) varies a lot, seemingly at random.
What could this be due to?

emanuele_cardinali · November 24, 2020, 11:05am

I have read that the chi2 test in RooFit also takes empty bins into account. Is it possible that the discrepancy is due to this?
Maybe the “chi2/ndof” function is affected by this while the other function behaves like the ROOT one.

system · December 8, 2020, 11:11am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

eguiraud · December 8, 2020, 11:56am

Hi @emanuele_cardinali,
sorry for the late reply! We need @moneta’s help on this one.

Cheers,
Enrico

moneta · December 8, 2020, 1:55pm

Hi,

You probably have empty bins or bins with low statistics. Using DataError(RooAbsData::Poisson) does not make much sense to me, because it computes the errors from the classic 68% Poisson interval. These intervals over-cover the true interval significantly, so your chi2 will certainly not follow a chi-square distribution.
I think the best test statistic for comparing a histogram with a model is the Baker-Cousins chi-square. It is implemented in ROOT and is used when calling TH1::Chisquare with option “L” (see the TH1::Chisquare documentation).

With this method you will get correct handling of Poisson uncertainties.
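
As a rough illustration of the statistic mentioned above, the Baker-Cousins chi-square for observed counts n_i and expected counts mu_i is commonly written as chi2 = 2 * sum_i [ mu_i - n_i + n_i * ln(n_i / mu_i) ], where the logarithmic term is dropped for empty bins. The plain C++ sketch below evaluates that sum, assuming all expected counts are positive. It does not use ROOT, the function name `bakerCousinsChi2` is hypothetical, and it is not the code behind TH1::Chisquare.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Poisson-likelihood (Baker-Cousins) chi-square:
//   chi2 = 2 * sum_i [ mu_i - n_i + n_i * ln(n_i / mu_i) ]
// observed[i] = n_i (bin content), expected[i] = mu_i (model prediction, > 0).
double bakerCousinsChi2(const std::vector<double>& observed,
                        const std::vector<double>& expected) {
    assert(observed.size() == expected.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < observed.size(); ++i) {
        const double n = observed[i];
        const double mu = expected[i];
        assert(mu > 0.0);  // the model must predict a positive count
        sum += mu - n;
        if (n > 0.0) {
            // The log term vanishes for empty bins, so it is skipped there.
            sum += n * std::log(n / mu);
        }
    }
    return 2.0 * sum;
}
```

An empty bin with mu = 1.5 contributes 2 × 1.5 = 3 to the total, so empty bins enter the sum without any per-bin error estimate being needed.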

Best regards

Lorenzo

emanuele_cardinali · December 8, 2020, 3:00pm

The class of the function that I fit is: “RooFFTConvPdf” …
I guess this method is for plain ROOT only. How can I get a reliable test with RooFit?

moneta · December 8, 2020, 3:50pm

You can always convert a RooFit pdf into a TF1 using the function RooAbsReal::asTF(), and RooFit data into a histogram with
RooDataHist::createHistogram(), and then use the function above.

Lorenzo

emanuele_cardinali · December 8, 2020, 8:50pm

Sorry @moneta, I tried to do what you wrote:

#include <TH1.h>

TF1 *hello = signal.asTF(RooArgList(x));
Double_t chi2= Chisquare(hello,"L");

but I get the following error:

error: use of undeclared identifier 'Chisquare'

although the function should be declared in “TH1.h”.
Another thing that I did not understand is whether I have to put all the parameters of the function in the RooArgList().

moneta · December 9, 2020, 7:48am

You also need to convert the RooDataSet to a TH1:

TH1 * h1 = data.createHistogram("x");
Double_t chi2= h1->Chisquare(hello,"L");

When creating the TF1 you have to pass the observables first, then the parameters. If you don’t need a TF1 that depends on parameters, you don’t need to pass them, and the values stored in the RooAbsPdf will be used. Otherwise, if you pass them, the TF1 will have its own parameters, copied from the values stored in the RooAbsPdf. As a third argument you can pass the normalisation parameters; in that case the returned TF1 will be normalised.

Lorenzo

emanuele_cardinali · December 9, 2020, 10:43am

Thanks @moneta
In my case the function is given by the convolution of two known functions.
In total I have 4 fit parameters and no normalization parameters.
Could a solution like this work?

RooArgSet *model_params = signal.getParameters(dh);

TF1 *model = signal.asTF(RooArgList(x), *model_params);
Double_t chi2= hma1->Chisquare(model,"L");

I ask because what I get is a chi2 of 6.3 × 10^6, although the function fits well.

moneta · December 9, 2020, 11:35am

Yes, but it is trickier, because you need to normalise the fitted function to the data by scaling it by the number of entries × bin width.

I think it is easier if I post an example here.
exampleChisquare.C (1 KB)
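
The attached macro is not reproduced here. As a minimal sketch of the scaling step described above, in plain C++ without ROOT: a pdf normalised to unit area is turned into expected bin contents by multiplying its value at the bin centre by the number of entries and the bin width. The function name `expectedCounts` and the bin-centre approximation are assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Expected content of each bin for a unit-normalised pdf:
//   mu_i = pdf(x_i) * nEntries * binWidth,  x_i = centre of bin i.
std::vector<double> expectedCounts(const std::function<double(double)>& pdf,
                                   double xMin, double xMax, int nBins,
                                   double nEntries) {
    assert(nBins > 0 && xMax > xMin);
    const double binWidth = (xMax - xMin) / nBins;
    std::vector<double> mu(nBins);
    for (int i = 0; i < nBins; ++i) {
        const double centre = xMin + (i + 0.5) * binWidth;
        mu[i] = pdf(centre) * nEntries * binWidth;
    }
    return mu;
}
```

For a flat pdf of height 0.1 on [0, 10], 1000 entries and 10 bins, each bin is expected to hold 100 entries. Without this factor the function is compared to the histogram on the wrong scale, which would be consistent with a very large chi2 even when the shape fits well.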

Lorenzo

emanuele_cardinali · December 9, 2020, 6:51pm

I think I was able to calculate it correctly, thank you @moneta for the help.
One thing that I cannot explain, however, is that, within the same ROOT program, the probability associated with the chi-square calculated by TMath’s function is 70%, while that from the Baker-Cousins method is less than 10^(-6)%.
Is it normal that it is so “restrictive”?
