Range fit ROOT vs RooFit

ixiotidi · December 12, 2019, 2:43pm

Hi,

I am trying to implement an unbinned extended likelihood fit in ROOT using ROOT::Fit::Fitter. My model is a TF1 built from a C++ function and normalized in the fit range, and the data are stored in an unbinned data set. To check the procedure I perform the same fit on the same data in RooFit. The model is a Chebychev + exponential PDF, parametrized the same way in ROOT and RooFit. With the two fitters in their default configuration I get slightly different results, so I configured the ROOT fitter like the RooFit one: same error values, parameter limits, maximum tolerance and so on. Fitting the full range then gives the same result (~10^-4 difference in the parameters). However, when I blind the model (fit in two ranges) I get a discrepancy in all the parameters. The output is not very different: when the two models are plotted on top of each other, the Y values at given X values differ by 10^-2 to 10^-3. In addition, the two fitters converge to different ML points. I don’t understand where this difference comes from, or whether there is a way to make ROOT and RooFit produce exactly the same result. Can anybody explain what exactly RooFit does when fitting two ranges, and whether the same can be done in ROOT?

Cheers
–John

StephanH · December 12, 2019, 4:48pm

Hello John,

With multiple ranges, RooFit creates one negative log-likelihood for each range and adds them. The sum is handed over to the fitter for minimisation.
Each part is integrated and normalised separately. I don’t know what ROOT’s fitter is doing.
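
To make the description above concrete, here is a minimal sketch in plain C++ (no ROOT dependency) of a "sum of per-range NLLs" for an unnormalised shape exp(a*x). It only illustrates the idea stated in the post and is not RooFit's actual implementation; the function names are hypothetical.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Analytic integral of exp(a*x) over [lo, hi]; assumes a != 0.
double expoIntegral(double a, double lo, double hi) {
    return (std::exp(a * hi) - std::exp(a * lo)) / a;
}

// NLL of the points that fall in [lo, hi], with the pdf normalised on [lo, hi] only.
double nllInRange(const std::vector<double>& xs, double a, double lo, double hi) {
    const double norm = expoIntegral(a, lo, hi);
    double nll = 0.0;
    for (double x : xs) {
        if (x < lo || x > hi) continue;          // points outside this range are ignored
        nll -= std::log(std::exp(a * x) / norm);
    }
    return nll;
}

// One NLL per range, each normalised separately, then added.
double nllSumOfRanges(const std::vector<double>& xs, double a,
                      const std::vector<std::pair<double, double>>& ranges) {
    double total = 0.0;
    for (const auto& r : ranges) total += nllInRange(xs, a, r.first, r.second);
    return total;
}
```

Note that this is in general not the same number as a single NLL whose pdf is normalised once over the union of the ranges, which is one way two fitters can end up at different minima.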

ixiotidi · December 12, 2019, 10:56pm

Hi Stephan,

Thank you for your reply and the information! Would it be possible to reproduce this RooFit behaviour in ROOT, or is it impossible? Also, regarding what the ROOT fitter is doing: do I have to post the question somewhere else in the forum, or is it possible to get an answer here?

Cheers
–John

moneta · December 13, 2019, 7:47am

Hi,

Using the ROOT::Fit::Fitter class, you can fit multiple ranges by filling the input data class ROOT::Fit::UnbinData only with the points that lie within the ranges.
You also need to normalise the TF1 correctly within the provided ranges. Unfortunately this is not done automatically, because the TF1 class supports only a single range, so you would need to define the normalized function yourself.
Afterwards you should get the same result, within some numerical error. If not, please post your example; I will be interested to look into it.
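
The two steps described above (keep only the points inside the ranges, and normalise the function over those ranges by hand) could look roughly like this. It is a plain C++ sketch for an exponential shape; the `Range` struct and function names are hypothetical, and the ROOT-specific parts (adding the selected points to a ROOT::Fit::UnbinData and wrapping the function in a TF1) are left out.

```cpp
#include <cmath>
#include <vector>

struct Range { double lo, hi; };   // hypothetical helper type

bool inAnyRange(double x, const std::vector<Range>& ranges) {
    for (const auto& r : ranges)
        if (x >= r.lo && x <= r.hi) return true;
    return false;
}

// Keep only the points inside the fit ranges; these are the points
// that would be added to the unbinned data object.
std::vector<double> selectInRanges(const std::vector<double>& xs,
                                   const std::vector<Range>& ranges) {
    std::vector<double> kept;
    for (double x : xs)
        if (inAnyRange(x, ranges)) kept.push_back(x);
    return kept;
}

// exp(a*x) normalised to unit integral over the union of the ranges (a != 0).
// Returns 0 outside the ranges.
double expoPdfInRanges(double x, double a, const std::vector<Range>& ranges) {
    double norm = 0.0;
    for (const auto& r : ranges)
        norm += (std::exp(a * r.hi) - std::exp(a * r.lo)) / a;
    return inAnyRange(x, ranges) ? std::exp(a * x) / norm : 0.0;
}
```

The normalisation is recomputed from `a` on every call, which matters in a fit because the integral changes whenever the minimiser changes the shape parameter.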

Best regards

Lorenzo

ixiotidi · December 17, 2019, 12:40am

Hi Lorenzo,

I am attaching the two macros (RooFit and ROOT) that fit an unbinned toy data set generated from the RooFit model. Both fits converge properly, but the value of the exponential's shape parameter is not exactly the same, and the number of events in the extended term seems to be off by a lot. Could you please let me know whether this is the method you had in mind, and whether this is the expected behavior? Thanks a lot again.

Cheers
–John
RooFitFit.C (1.8 KB) RootFitFit.C (8.1 KB)

moneta · December 17, 2019, 10:51am

Hi,

Thank you for your examples. I will look at them and let you know. I am almost sure it is an issue of how the pdf is normalized in the multiple ranges, and therefore of how the number of events is defined.

Lorenzo

ixiotidi · December 17, 2019, 11:03am

Hi Lorenzo,

Thanks a lot. Let me know when you have something. I am normalizing my PDF manually, using the analytical expression for the normalization: I integrated the exponential over the given range in Mathematica and implemented the result by hand, as you can see. In any case, let me know. Thanks again.

Cheers
–John

moneta · December 17, 2019, 11:30am

Hi,
I am missing the root file to run your example, ToyExponentialTree.root

I have also attached a macro from the RooFit author showing the different possibilities one has for multiple-range fits in RooFit. This may help clarify the issue.

Lorenzo

ExampleRangeFit.C (7.6 KB)

ixiotidi · December 17, 2019, 11:38am

Hi Lorenzo,

Sorry for that. I generate it from ToyExponential.root using a macro I wrote; in any case, this is the file. I am also attaching the macro that generates it, in case you want to generate another file for a different set.
Thank you for the macro, I will take a look now to see how it’s done.
DataSet2TreeConv.C (979 Bytes) ToyExponentialTree.root (71.6 KB)

Cheers
–John

moneta · December 17, 2019, 2:16pm

Hi,

Thank you for the files. I see that in the ROOT fit you normalize the pdf in the two ranges. To get this behavior in RooFit you need to pass the range name as the optional parameter of RooExtendPdf.
By doing this, you will get exactly the same result:

fitRange = "lowerSB,upperSB";
RooExtendPdf *ExtExpo = new RooExtendPdf("ExtExpo","ExtExpo",*Exponential,*nExpo,fitRange);

Lorenzo
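
As far as I understand the RooExtendPdf documentation, when a range name is given the yield parameter counts the expected events inside that range, and the full-range expectation is that yield divided by the fraction of the pdf lying in the range. The plain C++ sketch below shows this conversion for an exponential shape; the function names and the range boundaries in the usage note are assumptions for illustration, not values from the attached macros.

```cpp
#include <cmath>

// Analytic integral of exp(a*x) over [lo, hi]; assumes a != 0.
double expoIntegral(double a, double lo, double hi) {
    return (std::exp(a * hi) - std::exp(a * lo)) / a;
}

// Fraction of an exponential pdf defined on [xmin, xmax] that lies
// inside the two sidebands [lo1, hi1] and [lo2, hi2].
double sidebandFraction(double a, double xmin, double xmax,
                        double lo1, double hi1, double lo2, double hi2) {
    return (expoIntegral(a, lo1, hi1) + expoIntegral(a, lo2, hi2))
           / expoIntegral(a, xmin, xmax);
}

// Yield in the full range corresponding to a yield counted in the sidebands only.
double fullRangeYield(double nInSidebands, double fraction) {
    return nInSidebands / fraction;
}
```

For example, with a hypothetical full range [0, 10] and sidebands [0, 3] and [7, 10], `fullRangeYield(n, sidebandFraction(a, 0, 10, 0, 3, 7, 10))` converts a sideband yield `n` into the full-range one. Comparing yields from two fits only makes sense once both refer to the same range.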

ixiotidi · December 17, 2019, 4:17pm

Hi Lorenzo,

So the problem was in the RooFit code and not in the ROOT code. I have another question: in my more complicated model (Chebychev + Exponential) I am not using RooExtendPdf but RooAddPdf, to which I provide nCheb and nExpo. Should I use fitRange as an optional parameter there as well? Could you also elaborate a bit on how ROOT handles the extended term in the case of two models? I am normalizing both models in the two ranges and then writing f(x) = Ntot*((Ncheb/Ntot)*Chebychev + (1-Ncheb/Ntot)*Exponential). Is that equivalent to the RooFit parametrization? Thank you very much for your immediate feedback!

Cheers
–John
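
For reference, the parametrization in the post above can be compared with the standard extended unbinned likelihood for a two-component model. In the sketch below, P_cheb and P_expo denote the two shapes normalised to unit integral over the same fit region, and N_tot and f are shorthand introduced here; whether the fitted yields agree between two programs still depends on which region each one uses for that normalisation.

```latex
-\ln L = \left(N_{\text{cheb}} + N_{\text{expo}}\right)
         - \sum_{i=1}^{n} \ln\!\left[ N_{\text{cheb}}\, P_{\text{cheb}}(x_i)
                                    + N_{\text{expo}}\, P_{\text{expo}}(x_i) \right]

% With N_tot = N_cheb + N_expo and f = N_cheb / N_tot, the bracket becomes
N_{\text{cheb}}\, P_{\text{cheb}}(x) + N_{\text{expo}}\, P_{\text{expo}}(x)
  = N_{\text{tot}} \left[ f\, P_{\text{cheb}}(x) + (1 - f)\, P_{\text{expo}}(x) \right]
```

So writing the model with two yields or with a total yield and a fraction gives the same likelihood, provided both shapes are normalised over the region to which the yields refer.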

moneta · December 17, 2019, 9:57pm

Hi,

In the case of a RooAddPdf, you can fix the definition of the coefficients using the function RooAddPdf::fixCoefRange. I think that if you use the default behaviour of RooAddPdf (that is, fixCoefRange set to the full range), it does what you expect.
If not, please let me know.

Lorenzo

ixiotidi · December 17, 2019, 11:14pm

Hi Lorenzo,

I am terribly sorry for all the messages and questions. I tried to run the fit with the combined model, but I am getting a different result, and here the difference is significant. I am attaching the files in case you have time to take a look. Thanks again in advance.

–John

BkgCombModel.root (73.0 KB) BkgCombModelTree.root (70.8 KB) BkgFit.C (1.9 KB) BkgModelRoot.C (8.3 KB)

moneta · December 18, 2019, 11:45am

Hi,
Thank you for the new macros. I will look into that, but it will require a bit of time.

Lorenzo

ixiotidi · December 18, 2019, 12:03pm

Hi Lorenzo,

yes no worries, thank you for looking at those.

–John

moneta · December 19, 2019, 5:54pm

Hi,

The correct solution in RooFit is to use RooAddPdf without passing extended pdfs to it, i.e. you should do:

RooAddPdf *CombModel = new RooAddPdf("CombModel","CombModel",RooArgList(*Cheb,*Expo),RooArgList(nCheb,nExpo));

This fixes the definition of the normalisation coefficients to the full range, i.e. the returned values of nCheb and nExpo are the expected numbers of events for the functions defined on the full range.
I think the results should be the same as in ROOT. They are very similar for the shape parameters but, once we re-scale them, not for the coefficients: their ratio is not the same. I am not sure whether this effect is numerical, since the two coefficients are highly correlated, or due to something else. One should try a fit with less correlation between the two. I will try to understand this better.

Lorenzo

ixiotidi · December 19, 2019, 11:12pm

Hi Lorenzo,

Thank you for the reply and all the help! I ran the fit with RooAddPdf as you suggested, but the result still looks different, also for the shape parameters of the two models:

RooFit result:
   1  aCheb        4.38648e-02   2.78146e-01   2.96547e-05   4.38650e-03
   2  aExpo       -2.45271e-03   1.29808e-03   3.42047e-08  -2.45271e-04
   3  nCheb        6.54120e+03   2.31296e+03   1.71822e-04   7.13018e-01
   4  nExpo        2.54479e+03   2.30520e+03   1.08203e-04   2.57309e-01

ROOT result:
   1  aCheb        1.31494e-01   3.97287e-01   3.36477e-05   1.31498e-02
   2  aExpo       -2.14973e-03   1.12848e-03   1.44335e-07  -2.14973e-04
   3  nCheb        4.08193e+03   2.06629e+03   9.77409e-05   4.20473e-01
   4  nExpo        2.44904e+03   2.06857e+03   8.56630e-05   2.47421e-01

What do you mean by very similar? In the example with only the simple Exponential model we retrieved exactly the same result. Here the situation looks a bit different for the blinded case, whereas for the non-blinded case the result is indeed very similar.
Regarding the ratio: shouldn’t the quantity to compare be the ratio (nCheb*NormCheb)/(nExpo*NormExpo) between a model function normalized in the full range and a model function normalized in the blinded range? If I create, in ROOT, a full-range model with the RooFit fit result and a blinded model with the RooFit shape parameters, and require the quantity above to be equal, then I can retrieve nCheb and nExpo values that make the two models the same.
Thank you again so much for your help. Please let me know if you have anything new.

Cheers
–John

moneta · December 20, 2019, 12:09pm

Hi,

It is strange that you are getting different results. I get this in ROOT:

****************************************
Minimizer is Minuit2 / Migrad
MinFCN                    =     -7044.65
NDf                       =         6532
Edm                       =  0.000131789
NCalls                    =          246
aCheb                     =    0.0406287   +/-   0.262688     	 (limited)
aExpo                     =  -0.00246205   +/-   0.00124124   	 (limited)
nCheb                     =      4599.01   +/-   1531.8       	 (limited)
nExpo                     =      1937.64   +/-   1532.42      	 (limited)

and this in RooFit

 FCN=-7044.65 FROM HESSE     STATUS=OK             23 CALLS         254 TOTAL
                     EDM=4.93389e-06    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  aCheb        4.38650e-02   2.78103e-01   2.96546e-05   4.38652e-03
   2  aExpo       -2.45271e-03   1.29787e-03   3.42047e-08  -2.45271e-04
   3  nCheb        6.54120e+03   2.31259e+03   1.71822e-04   7.13018e-01
   4  nExpo        2.54480e+03   2.30483e+03   1.08203e-04   2.57310e-01

I noticed, however, that the errors are quite unstable. Running Hesse after the fit, one gets rather different errors. This is certainly due to the large existing correlations.

Which ROOT version are you using ?

Lorenzo

ixiotidi · December 20, 2019, 12:55pm

Hi Lorenzo,

I am running ROOT v6.14/04. That is indeed a very strange result: I am running the macro I provided with the Blinding #define enabled and no further changes. I also just checked that lxplus has the same ROOT version, which is why I used this one. Just to be sure, this is the ROOT result:

Minimizer is Minuit / Migrad
MinFCN = -7039.49
NDf = 6532
Edm = 0.000233078
NCalls = 277
aCheb = 0.131494 +/- 0.397287 (limited)
aExpo = -0.00214973 +/- 0.00112848 (limited)
nCheb = 4081.93 +/- 2066.29 (limited)
nExpo = 2449.04 +/- 2068.57 (limited)

For RooFit I am getting exactly the same result as yours. Strange. Thank you very much for your help.

–John

moneta · December 20, 2019, 3:54pm

Hi,

This is very strange. I have tried with 6.14 and obtain the same result. Could you please attach the full log of the ROOT fit, obtained using PrintLevel=3?

Cheers

Lorenzo