SumW2Error with Extended ML

hjkim · April 4, 2014, 9:38pm

Hello,

I wanted to fit a weighted dataset and obtain the normalization factor and its error with SumW2Error(kTRUE).
But I found somewhat strange behavior with this option, and after some tests with a simple Gaussian I am quite confused about it.

Here’s the code used for the tests.
The tests were run with ROOT v5.34.18.

#include "RooRealVar.h"
#include "RooGaussian.h"
#include "RooExtendPdf.h"
#include "RooDataSet.h"
#include "RooFitResult.h"
#include "RooGlobalFunc.h"

using namespace RooFit;

void gaussianSumW2() {

  RooRealVar x("x","x",-10,10);
  RooRealVar mu("mu","mu",0,-1,1);
  RooRealVar sig("sig","sig",1,0,10);

  RooGaussian gauss("gauss","gauss",x,mu,sig);

  RooRealVar w("w","w",0.25); // Constant event weight

  // Generate 10000 unweighted events, then build a copy that uses "w" as the weight variable
  RooDataSet* data = gauss.generate(x,10000);
  RooDataSet* dataW = new RooDataSet(data->GetName(),data->GetTitle(),data,RooArgList(x,w),0,"w");

  RooRealVar N("N","N",0,100000);
  RooExtendPdf extend("extend","extend",gauss,N);

  // Fit the unweighted dataset
  RooFitResult* res1 = extend.fitTo(*data, Save());

  // Fit the weighted dataset
  RooFitResult* res2 = extend.fitTo(*dataW, SumW2Error(kTRUE), Save());

}

The same Gaussian is fitted once to the events with weight = 1 and once with weight = 0.25 applied to every event.
With 10000 generated events, one would expect

N = 10000 +/- 100 for weight = 1 ( SumW2Error(kFALSE) )
N = 2500 +/- 25 for weight = 0.25 ( SumW2Error(kTRUE) )

since the relative uncertainty should not change.
However, things didn’t go as I expected.

Here’s what I got.
First, the weight = 1, SumW2Error(kFALSE) case:

 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-67940.6 FROM HESSE     STATUS=OK             16 CALLS          85 TOTAL
                     EDM=8.92966e-08    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  N            1.00000e+04   9.99999e+01   2.39979e-05  -9.27295e-01
   2  mu           6.64894e-03   9.97333e-03   3.59087e-04   6.64899e-03
   3  sig          9.97349e-01   7.05231e-03   1.69492e-05  -9.28179e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  3    ERR DEF=0.5
  1.000e+04  5.592e-08 -1.979e-07 
  5.592e-08  9.947e-05  1.814e-08 
 -1.979e-07  1.814e-08  4.974e-05 
 PARAMETER  CORRELATION COEFFICIENTS  
       NO.  GLOBAL      1      2      3
        1  0.00000   1.000  0.000 -0.000
        2  0.00026   0.000  1.000  0.000
        3  0.00026  -0.000  0.000  1.000

Second, the weight = 0.25, SumW2Error(kTRUE) case:

 **********
 **   18 **HESSE        1500
 **********
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-13519.4 FROM HESSE     STATUS=OK             16 CALLS          66 TOTAL
                     EDM=2.79806e-06    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  N            2.49992e+03   4.99987e+01   5.07817e-05  -1.25324e+00
   2  mu           6.64681e-03   1.99457e-02   6.40648e-05   6.64686e-03
   3  sig          9.97351e-01   1.41046e-02   1.51177e-05  -9.28179e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  3    ERR DEF=0.5
  2.500e+03  0.000e+00  0.000e+00 
  0.000e+00  3.979e-04  1.280e-08 
  0.000e+00  1.280e-08  1.989e-04 
 PARAMETER  CORRELATION COEFFICIENTS  
       NO.  GLOBAL      1      2      3
        1  0.00000   1.000  0.000  0.000
        2  0.00005   0.000  1.000  0.000
        3  0.00005   0.000  0.000  1.000
[#1] INFO:Fitting -- RooAbsPdf::fitTo(extend) Calculating sum-of-weights-squared correction matrix for covariance matrix
 **********
 **   23 **HESSE        1500
 **********
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-81218.2 FROM HESSE     STATUS=OK             26 CALLS          92 TOTAL
                     EDM=1.11879e-05    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  N            2.49992e+03   2.49993e+01   1.03595e-03  -1.25324e+00
   2  mu           6.64681e-03   3.98835e-02   1.30692e-03   6.64686e-03
   3  sig          9.97351e-01   2.82089e-02   3.08401e-04  -9.28179e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  3    ERR DEF=0.5
  6.250e+02 -6.339e-13 -4.842e-10 
 -6.339e-13  1.592e-03  1.042e-06 
 -4.842e-10  1.042e-06  7.958e-04 
 PARAMETER  CORRELATION COEFFICIENTS  
       NO.  GLOBAL      1      2      3
        1  0.00000   1.000 -0.000 -0.000
        2  0.00093  -0.000  1.000  0.001
        3  0.00093  -0.000  0.001  1.000
setting parameter 0 error to 99.9973 
setting parameter 1 error to 0.00997482
setting parameter 2 error to 0.00705239 <<-- these three errors are the same as those of the weight=1 case!

Obviously, the last HESSE call obtained the expected error for N (25). But RooFit then resets the errors to the unweighted values. For weights > 1 the behavior is exactly the same: with weight = 25, for example, the last HESSE gives an error of 2500 on N, but the final error is 100.

I guess RooAbsPdf::fitTo applies some kind of inverse transformation.

I also tested the same code with ROOT v5.34.06 (RooFit v3.56).
The weight=1 case gives the same result as v5.34.18, but the weighted case gives a quite different result.
Weight = 0.25 case // ROOT v5.34.06

 **********
 **   18 **HESSE        1500
 **********
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-13519.4 FROM HESSE     STATUS=OK             16 CALLS          66 TOTAL
                     EDM=2.79805e-06    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  N            2.49992e+03   4.99986e+01   5.07817e-05  -1.25324e+00
   2  mu           6.64681e-03   1.99457e-02   6.40648e-05   6.64686e-03
   3  sig          9.97351e-01   1.41046e-02   1.51177e-05  -9.28179e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  3    ERR DEF=0.5
  2.500e+03 -1.690e-12 -2.519e-08 
 -1.690e-12  3.979e-04  1.335e-08 
 -2.519e-08  1.335e-08  1.989e-04 
 PARAMETER  CORRELATION COEFFICIENTS  
       NO.  GLOBAL      1      2      3
        1  0.00000   1.000 -0.000 -0.000
        2  0.00005  -0.000  1.000  0.000
        3  0.00005  -0.000  0.000  1.000
[#1] INFO:Fitting -- RooAbsPdf::fitTo(extend) Calculating sum-of-weights-squared correction matrix for covariance matrix
 **********
 **   23 **HESSE        1500
 **********
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-1504.91 FROM HESSE     STATUS=OK             24 CALLS          90 TOTAL
                     EDM=2285.01    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  N            2.49992e+03   6.37362e+01   1.03595e-03  -1.25324e+00
   2  mu           6.64681e-03   3.98835e-02   1.30692e-03   6.64686e-03
   3  sig          9.97351e-01   2.82089e-02   3.08401e-04  -9.28179e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  3    ERR DEF=0.5
  4.062e+03  6.961e-11  4.922e-11 
  6.961e-11  1.592e-03  1.042e-06 
  4.922e-11  1.042e-06  7.958e-04 
 PARAMETER  CORRELATION COEFFICIENTS  
       NO.  GLOBAL      1      2      3
        1  0.00000   1.000  0.000  0.000
        2  0.00093   0.000  1.000  0.001
        3  0.00093   0.000  0.001  1.000
setting parameter 0 error to 39.222 -->> Overestimated
setting parameter 1 error to 0.00997486
setting parameter 2 error to 0.00705239

Here both the last HESSE result and the final value set by RooFit look strange.
I’m not sure what changed between the two versions. Was there any change in HESSE itself?

Please let me know if there’s anything I did wrong or misunderstood.

Thanks

-Hanjin

Added >>

I checked the covariance matrix at each step of the HESSE calculation.
Datasets with weight w and weight w^2 were built and fitted separately, and then I extracted the covariance matrices. The code is shown below.

  // data: weight = 1, dataW: weight = w, dataW2: weight = w^2 (dataW2 is built separately)
  RooAbsReal* nll0 = extend.createNLL(*data,Extended(kTRUE));
  RooAbsReal* nll = extend.createNLL(*dataW,Extended(kTRUE));
  RooAbsReal* nll2 = extend.createNLL(*dataW2,Extended(kTRUE));

  RooMinimizer m0(*nll0);
  m0.minimize("Minuit","minuit");
  m0.hesse();
  RooFitResult* rw0 = m0.save();
  // covarianceMatrix() returns a const reference, so bind it to a const reference
  const TMatrixDSym& mat0 = rw0->covarianceMatrix(); // WEIGHT = 1

  RooMinimizer m(*nll);
  m.minimize("Minuit","minuit");
  m.hesse();
  RooFitResult* rw = m.save();
  const TMatrixDSym& matV = rw->covarianceMatrix(); // WEIGHT

  RooMinimizer m2(*nll2);
  m2.minimize("Minuit","minuit");
  m2.hesse();
  RooFitResult* rw2 = m2.save();
  const TMatrixDSym& matC = rw2->covarianceMatrix(); // WEIGHT SQUARED

With weight = 0.25 and 10000 events:

Par 0: N, Par 1: mu, Par 2: sig

1. From HESSE, before w^2 is applied: V
     |      0    |      1    |      2    |
--------------------------------------------
   0 |       2500  -3.846e-08   3.518e-11 
   1 | -3.846e-08   0.0004925  -4.505e-07 
   2 |  3.518e-11  -4.505e-07   0.0004532 

2. From HESSE, after w^2 is applied: C
     |      0    |      1    |      2    |
--------------------------------------------
   0 |      624.9  -4.967e-08    4.61e-11 
   1 | -4.967e-08     0.00197  -1.828e-06 
   2 |   4.61e-11  -1.828e-06    0.001813 

3. Inverse of C: C^-1
     |      0    |      1    |      2    |
--------------------------------------------
   0 |     0.0016   4.035e-08  -1.392e-26 
   1 |  4.035e-08       507.6      0.5119 
   2 | -1.292e-26      0.5119       551.6 

4. Intermediate product: C^-1 V
     |      0    |      1    |      2    |
--------------------------------------------
   0 |      4.001  -4.168e-11   3.813e-14 
   1 |  8.134e-05        0.25    3.36e-06 
   2 | -2.851e-10   3.651e-06        0.25 

5. Final result: V C^-1 V
     |      0    |      1    |      2    |
--------------------------------------------
   0 |      1e+04  -1.138e-07    1.04e-10 
   1 | -1.138e-07   0.0001231   -1.11e-07 
   2 |   1.04e-10   -1.11e-07   0.0001133 

6. Weight = 1 covariance matrix 
     |      0    |      1    |      2    |
--------------------------------------------
   0 |      1e+04           0           0 
   1 |          0   0.0001231  -1.109e-07 
   2 |          0  -1.109e-07   0.0001133

This manual calculation gives the same result for v5.34.18 and v5.34.06, even though the fitTo results differ between the two versions.

Steps 5 and 6 show (approximately) the same matrix.
Is there a fundamental difference between the normalization N and the other parameters when calculating the error?
HESSE seems to treat N as a normalization and give it a Poisson error according to its central value. I am not sure if the transformation V C^-1 V makes sense here, because I’m a newbie in statistics…

moneta · April 8, 2014, 7:20am

The error treatment in weighted extended likelihood fits was fixed last week.
See RooFit: SumW2Error.

Please update to the HEAD of the 5.34 patches or wait for the new release, 5.34.19.

Best Regards

Lorenzo

hjkim · April 8, 2014, 11:59pm

Great! I just checked out the patch, and the values are fine now!

Thanks a lot!

Hanjin