Hello,
I wanted to fit a weighted dataset and obtain the normalization factor and its error using SumW2Error(kTRUE).
However, the option behaves in a way I did not expect, and after some tests with a simple Gaussian I am quite confused about it.
Here is the code used for the tests.
The tests were run on ROOT v5.34.18.
using namespace RooFit;
void gaussianSumW2() {
RooRealVar x("x","x",-10,10);
RooRealVar mu("mu","mu",0,-1,1);
RooRealVar sig("sig","sig",1,0,10);
RooGaussian gauss("gauss","gauss",x,mu,sig);
RooRealVar w("w","w",0.25); // Weight
RooDataSet* data = gauss.generate(x,10000);
RooDataSet* dataW = new RooDataSet(data->GetName(),data->GetTitle(),data,RooArgList(x,w),0,"w");
RooRealVar N("N","N",0,100000);
RooExtendPdf extend("extend","extend",gauss,N);
// Fit the unweighted dataset
RooFitResult* res1 = extend.fitTo(*data, Save());
// Fit the weighted dataset
RooFitResult* res2 = extend.fitTo(*dataW, SumW2Error(kTRUE), Save());
}
A Gaussian is fitted twice: once to the events with weight = 1, and once to the same events with weight = 0.25 applied to every event.
With 10000 generated events, one would expect
N = 10000 +/- 100 for weight = 1 ( SumW2Error(kFALSE) )
N = 2500 +/- 25 for weight = 0.25 ( SumW2Error(kTRUE) )
since the relative uncertainty should not change.
However, things did not go as I expected.
Here is what I got.
First, the weight = 1, SumW2Error(kFALSE) case:
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=-67940.6 FROM HESSE STATUS=OK 16 CALLS 85 TOTAL
EDM=8.92966e-08 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER INTERNAL INTERNAL
NO. NAME VALUE ERROR STEP SIZE VALUE
1 N 1.00000e+04 9.99999e+01 2.39979e-05 -9.27295e-01
2 mu 6.64894e-03 9.97333e-03 3.59087e-04 6.64899e-03
3 sig 9.97349e-01 7.05231e-03 1.69492e-05 -9.28179e-01
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 3 ERR DEF=0.5
1.000e+04 5.592e-08 -1.979e-07
5.592e-08 9.947e-05 1.814e-08
-1.979e-07 1.814e-08 4.974e-05
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2 3
1 0.00000 1.000 0.000 -0.000
2 0.00026 0.000 1.000 0.000
3 0.00026 -0.000 0.000 1.000
Second, weight = 0.25, SumW2Error(kTRUE) case.
**********
** 18 **HESSE 1500
**********
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=-13519.4 FROM HESSE STATUS=OK 16 CALLS 66 TOTAL
EDM=2.79806e-06 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER INTERNAL INTERNAL
NO. NAME VALUE ERROR STEP SIZE VALUE
1 N 2.49992e+03 4.99987e+01 5.07817e-05 -1.25324e+00
2 mu 6.64681e-03 1.99457e-02 6.40648e-05 6.64686e-03
3 sig 9.97351e-01 1.41046e-02 1.51177e-05 -9.28179e-01
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 3 ERR DEF=0.5
2.500e+03 0.000e+00 0.000e+00
0.000e+00 3.979e-04 1.280e-08
0.000e+00 1.280e-08 1.989e-04
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2 3
1 0.00000 1.000 0.000 0.000
2 0.00005 0.000 1.000 0.000
3 0.00005 0.000 0.000 1.000
[#1] INFO:Fitting -- RooAbsPdf::fitTo(extend) Calculating sum-of-weights-squared correction matrix for covariance matrix
**********
** 23 **HESSE 1500
**********
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=-81218.2 FROM HESSE STATUS=OK 26 CALLS 92 TOTAL
EDM=1.11879e-05 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER INTERNAL INTERNAL
NO. NAME VALUE ERROR STEP SIZE VALUE
1 N 2.49992e+03 2.49993e+01 1.03595e-03 -1.25324e+00
2 mu 6.64681e-03 3.98835e-02 1.30692e-03 6.64686e-03
3 sig 9.97351e-01 2.82089e-02 3.08401e-04 -9.28179e-01
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 3 ERR DEF=0.5
6.250e+02 -6.339e-13 -4.842e-10
-6.339e-13 1.592e-03 1.042e-06
-4.842e-10 1.042e-06 7.958e-04
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2 3
1 0.00000 1.000 -0.000 -0.000
2 0.00093 -0.000 1.000 0.001
3 0.00093 -0.000 0.001 1.000
setting parameter 0 error to 99.9973
setting parameter 1 error to 0.00997482
setting parameter 2 error to 0.00705239 <-- these three errors are the same as those of the weight = 1 case!
Clearly, the last HESSE call gives the expected error for N (about 25), but RooFit then resets the errors to those of the unweighted fit. The same happens for weights > 1: for example, with weight = 25 the last HESSE gives an error of 2500, yet the final error is 100.
I guess RooAbsPdf::fitTo applies some kind of inverse transformation.
I also tested the same code with ROOT v5.34.06 (RooFit v3.56).
The weight = 1 case gives the same result as v5.34.18, but the weighted case gives a quite different result.
Weight = 0.25 case // ROOT v5.34.06
**********
** 18 **HESSE 1500
**********
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=-13519.4 FROM HESSE STATUS=OK 16 CALLS 66 TOTAL
EDM=2.79805e-06 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER INTERNAL INTERNAL
NO. NAME VALUE ERROR STEP SIZE VALUE
1 N 2.49992e+03 4.99986e+01 5.07817e-05 -1.25324e+00
2 mu 6.64681e-03 1.99457e-02 6.40648e-05 6.64686e-03
3 sig 9.97351e-01 1.41046e-02 1.51177e-05 -9.28179e-01
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 3 ERR DEF=0.5
2.500e+03 -1.690e-12 -2.519e-08
-1.690e-12 3.979e-04 1.335e-08
-2.519e-08 1.335e-08 1.989e-04
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2 3
1 0.00000 1.000 -0.000 -0.000
2 0.00005 -0.000 1.000 0.000
3 0.00005 -0.000 0.000 1.000
[#1] INFO:Fitting -- RooAbsPdf::fitTo(extend) Calculating sum-of-weights-squared correction matrix for covariance matrix
**********
** 23 **HESSE 1500
**********
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=-1504.91 FROM HESSE STATUS=OK 24 CALLS 90 TOTAL
EDM=2285.01 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER INTERNAL INTERNAL
NO. NAME VALUE ERROR STEP SIZE VALUE
1 N 2.49992e+03 6.37362e+01 1.03595e-03 -1.25324e+00
2 mu 6.64681e-03 3.98835e-02 1.30692e-03 6.64686e-03
3 sig 9.97351e-01 2.82089e-02 3.08401e-04 -9.28179e-01
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 3 ERR DEF=0.5
4.062e+03 6.961e-11 4.922e-11
6.961e-11 1.592e-03 1.042e-06
4.922e-11 1.042e-06 7.958e-04
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2 3
1 0.00000 1.000 0.000 0.000
2 0.00093 0.000 1.000 0.001
3 0.00093 0.000 0.001 1.000
setting parameter 0 error to 39.222 -->> Overestimated
setting parameter 1 error to 0.00997486
setting parameter 2 error to 0.00705239
Here both the last HESSE result and the final value set by RooFit look strange.
I am not sure what differs between the two versions at this point. Was there a change even in HESSE?
Please let me know if there’s anything I did wrong or misunderstood.
Thanks
-Hanjin
Added >>
I checked the covariance matrix at each step of the HESSE calculation.
The weight^1 and weight^2 datasets were built and fitted separately, and I then extracted the covariance matrices. The code is shown below.
// Extended NLL for each dataset: data (weight 1), dataW (weight w),
// dataW2 (weight w^2; its construction is not shown here)
RooAbsReal* nll0 = extend.createNLL(*data,Extended(kTRUE));
RooAbsReal* nll = extend.createNLL(*dataW,Extended(kTRUE));
RooAbsReal* nll2 = extend.createNLL(*dataW2,Extended(kTRUE));
// Weight = 1 fit
RooMinimizer m0(*nll0);
m0.minimize("Minuit","minuit");
m0.hesse();
RooFitResult* rw0 = m0.save();
const TMatrixDSym& mat0 = rw0->covarianceMatrix(); // WEIGHT = 1
// Weighted fit
RooMinimizer m(*nll);
m.minimize("Minuit","minuit");
m.hesse();
RooFitResult* rw = m.save();
const TMatrixDSym& matV = rw->covarianceMatrix(); // WEIGHT
// Weight-squared fit
RooMinimizer m2(*nll2);
m2.minimize("Minuit","minuit");
m2.hesse();
RooFitResult* rw2 = m2.save();
const TMatrixDSym& matC = rw2->covarianceMatrix(); // WEIGHT SQUARED
With weight = 0.25 and 10000 events:
Par 0: N, Par 1: mu, Par 2: sig
1. From HESSE before w^2 is applied: V
| 0 | 1 | 2 |
--------------------------------------------
0 | 2500 -3.846e-08 3.518e-11
1 | -3.846e-08 0.0004925 -4.505e-07
2 | 3.518e-11 -4.505e-07 0.0004532
2. From HESSE after w^2 is applied: C
| 0 | 1 | 2 |
--------------------------------------------
0 | 624.9 -4.967e-08 4.61e-11
1 | -4.967e-08 0.00197 -1.828e-06
2 | 4.61e-11 -1.828e-06 0.001813
3. C inverted: C^-1
| 0 | 1 | 2 |
--------------------------------------------
0 | 0.0016 4.035e-08 -1.392e-26
1 | 4.035e-08 507.6 0.5119
2 | -1.292e-26 0.5119 551.6
4. C^-1 V, calculated first
| 0 | 1 | 2 |
--------------------------------------------
0 | 4.001 -4.168e-11 3.813e-14
1 | 8.134e-05 0.25 3.36e-06
2 | -2.851e-10 3.651e-06 0.25
5. Final result: V C^-1 V
| 0 | 1 | 2 |
--------------------------------------------
0 | 1e+04 -1.138e-07 1.04e-10
1 | -1.138e-07 0.0001231 -1.11e-07
2 | 1.04e-10 -1.11e-07 0.0001133
6. Weight = 1 covariance matrix
| 0 | 1 | 2 |
--------------------------------------------
0 | 1e+04 0 0
1 | 0 0.0001231 -1.109e-07
2 | 0 -1.109e-07 0.0001133
This manual calculation gave the same result for v5.34.18 and v5.34.06, even though the fitTo results differed between the two versions.
Steps 5 and 6 show (approximately) the same matrix.
Is there a fundamental difference between the normalization N and the other parameters in how the error is calculated?
HESSE seems to treat N as a normalization and gives it a Poisson error according to its central value. I am not sure whether the V C^-1 V transformation makes sense here, because I am a newcomer to statistics…