RooFFTConvPdf not working in product with RooAddPdf even in new root versions

Kovy64 · September 2, 2022, 5:14pm

Dear all,

I’ve got a problem potentially related to this:

and this:
https://sft.its.cern.ch/jira/browse/ROOT-5380

In the attached example, you’ll find a model consisting of double gaussian (RooAddPdf) in variable “x” and gaussian-convolved exponential*erf (RooFFTConvPdf) in variable “y”. There is no cross-talk between x- and y- dependent components, moreover, parameters of the x-dependent double gaussian are fixed. I form a RooProdPdf from the x- and y- dependent components, and generate a toy sample in (x, y). Then I fit it with (1) only the y-dependent RooFFTConvPdf and get sensible result. Next, I restore the parameters and fit the same toy with (2) the product of RooAddPdf and RooFFTConvPdf. This leads to failure. In third step, I form RooProdPdf from only one gaussian (RooGaussian) and the RooFFTConvPdf, generate another toy and fit it with this product (3), which also seems to work fine.

I’m therefore wondering, if there could still be some hostility between RooAddPdf and RooFFTConvPdf when combined in RooProdPdf? I’ve tried this in various versions of root including newest 6.26, so any fixes introduced after the JIRA ticket linked above should be included. Or maybe I’m blaming wrong suspect, and the problem is somewhere else?

On a separate note: in the last fitting step, I try to remove RooFit::Range(“fullRange”) from the fitTo call in the 1-D fit performed in step (1), and the fit still converges but to very different result than (1), even though “fullRange” simply covers the entire range of variables x and y. Could you please explain why that happens? I would expect identical result in this case.
example_RooFFTConvPdf_product.cxx (3.1 KB)

Many thanks!

Ondra

etejedor · September 5, 2022, 3:27pm

Hello,

@moneta or @jonas can you help here? Thank you

jonas · September 6, 2022, 3:53pm

Hi! I see several problems with your script, maybe fixing them resolved your problems:

Your RooAddPdf is undefined, as you are using the constructor without coefficients. This constructor skips all non-extendible PDFs, which the Gaussians are, so your RooAddPdf has no summands and that’s why I’m not too surprised there are problems. You already have a coefficient variable defined in the previous line, why not use it to make the PDF valid again:
```
RooAddPdf g1_g2("g1_g2", "g1_g2", RooArgList(g1, g2), f);
```
I think you have an accidental double-negative in your exponential slope parameter. Your formula goes TMath::Exp(-@0/@3) with a minus sign, so the parameter has to be positive. However, when you reset the fit parameters, you set it to a negative value. Probably the fit has a hard time figuring out the sign is wrong, hence the bad convergence.
You should restrict the ranges of your parameters more. For example, your sigma parameters should only be defined in the >= 0.0 regime.
Your observable range for y is too restrictive. Your exp_erf_raw has mean 0.12 and sigma 1.2, so it is nonzero also below zero which is outside your observable definition range of 0 to 12. This is not good for the convolution, because it needs to integrate over the full PDF support range which doesn’t work if it extends beyond the observable range defined in RooFit. I would try to set the y range to -4 to 12. Probably this fixes also the problem you get when fixing the range as a fit parameter, because then the RooFFTConvPDF doesn’t have to sample the PDF outside the official definition range itself, which is probably going wrong in different ways when you have custom fit ranges.

Let me know if things look better after making these changes, if not feel free to re-attach the updated script again so we can investigate!

Cheers,
Jonas

Kovy64 · September 7, 2022, 1:57pm

Hi Jonas,

Thank you for your answer!

Regarding 1: You are absolutely right. I tried to make this example by stripping off unnecessary parts of much larger code, and made this mistake. I fixed it in what follows. Now the behaviour is actually closer to what I’ve observed in my larger code: the fit using RooProdPdf converges, but to rather significantly different values than when fitting the RooFFTConvPdf alone, although the likelihoods differ only by a multiplicative constant.

Regarding 2: True, but that only affected the first fit (of RooFFTConvPdf only), which converged properly in the end. The resets already set positive value. I fixed the initialization though, so in what follows, even the first fit will start from positive value.

Regarding 3: I cannot 100% agree here, for several reasons. Ideally, there would be no bounds to any parameters whatsoever, I therefore prefer to keep my variable ranges as wide as possible, including unphysical values. On one hand, that serves as indication that my likelihood and fit procedure is healthy and robust, on the other, I’m avoiding bias introduced by smaller range. Also, fit results with unphysical values can still be statistically relevant (but of course you’re right this is not the case with sigmas < 0). Also, this cannot explain why fits with essentially identical likelihoods give different results, so I didn’t change this in the attached sript, just so it doesn’t outweigh the effect I’m seeing by chance.

Regarding 4: This sounds like a very good point - I could build the convolution in larger range to have it constructed correctly and then only fit it in the desired 0-12 range (thanks for the tip!). But this cannot explain the problem I’m having (as described in point 1): the fit converging to different result when I build the likelihood with RooProdPdf instead of RooFFTConvPdf alone.

In summary: the fit result is different based on whether I fit the RooFFTConvPdf alone, or in product with completely frozen pdf.

Thanks again for looking into this!

Ondra

example_RooFFTConvPdf_product_2.cxx (3.2 KB)

jonas · September 8, 2022, 8:46am

Hi, thanks for your answer!

Regarding 3: I cannot 100% agree here, for several reasons. Ideally, there would be no bounds to any parameters whatsoever, I therefore prefer to keep my variable ranges as wide as possible, including unphysical values. On one hand, that serves as indication that my likelihood and fit procedure is healthy and robust, on the other, I’m avoiding bias introduced by smaller range.

It’s unreasonable to expect that the fit is robust for any parameter range, especially not the phase space where the PDF in your fit region is zero. Then also the gradient of the likelihood is zero and it should not be expected that the minimization works. So the rule of thumb is: the parameter ranges should be chosen at least as restrictive as necessary for the PDF to not become zero in the whole fit range.

The parameter range thing is however not the only issue that keeps your fit from converging. And it needs to converge before one can claim that there is a RooFit class that is not working, because in a non-convergent fit the results are arbitrary anyway (or, from a different angle, it’s not a good time investment to fix bugs that only affect non-convergent fits). The other problem in the model is that your parameters erf_mean and res_mean are completely correlated: both are shifting the PDF on the x-axis. The fit result of models with 100 % correlated parameters is not deterministic, so this needs to be fixed too. As the name “resolution” suggests that the Gaussian res is for smearing with a sigma that corresponds to the resolution, I would fix the mean to zero and absorb the former res_mean in the erf_mean parameter (i.e. adding its former value to the initial value of erf_mean).

Finally, there was another problem that I forgot to mention in my first post. When you reset the fit parameters to get the same initial state for the fits, you also have to reset the parameter errors to zero because the errors are used for the initial step size in the minimization.

In summary, when I do these 3 things with your script, the results result_1 and result_2 are identical:

Restrict parameters such that PDF can’t become zero in the fit region
Remove redundant res_mean parameter
Also reset parameter errors before fitting

Here is the updated version of the script:
example_RooFFTConvPdf_product_3.cxx (3.2 KB)

Let me know if this works for you or if you have further questions!

Cheers,
Jonas

Kovy64 · September 15, 2022, 7:52am

Hi Jonas,

thank you for your answer. You’re right about the correlated parameters.
However, when I try to run the updated version 3 of the script you’ve enclosed, I’m still getting very different behaviour in the two cases: the product fit does not converge (and the fit results are thus very different). The only thing I’ve added to your version was #include RooConstVar.h, so that I can run it compiled (however, both compiled and interpreted way gives the same behaviour). I enclose screenshot of the two RooFitResult printouts.

I’ve tried this in both my favourtie root 6.18 and the currently recommended 6.26, both on lxplus, and the behaviour I see is the same in both versions.

In what root version did you get identical behaviour, please?

Many thanks and sorry for the ongoing trouble,

Ondra

jonas · September 15, 2022, 9:46am

Hi!

That’s surprising that it doesn’t work with ROOT 6.26 for you. Here is the output I get with exactly the script I uploaded as example_RooFFTConvPdf_product_3.cxx:

  RooFitResult: minimized FCN value: 17516.9, estimated distance to minimum: 2.46967e-05
                covariance matrix quality: Full matrix, but forced positive-definite
                Status : MINIMIZE=0 HESSE=0 

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
              erf_mean    1.5750e+00    1.5838e+00 +/-  2.03e-01  <none>
             erf_sigma    1.2000e+00    1.1959e+00 +/-  2.53e-01  <none>
             exp_slope    1.5000e+00    1.4979e+00 +/-  2.77e-02  <none>
             res_sigma    2.3000e-01    3.1020e-01 +/-  4.87e-01  <none>


  RooFitResult: minimized FCN value: 33866.4, estimated distance to minimum: 2.82363e-05
                covariance matrix quality: Full matrix, but forced positive-definite
                Status : MINIMIZE=0 HESSE=0 

    Constant Parameter    Value     
  --------------------  ------------
                     f    2.0000e-01
               g1_mean    5.0000e+00
              g1_sigma    1.5000e+00
               g2_mean    5.5000e+00
              g2_sigma    1.2000e+00

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
              erf_mean    1.5750e+00    1.5839e+00 +/-  2.03e-01  <none>
             erf_sigma    1.2000e+00    1.1959e+00 +/-  2.53e-01  <none>
             exp_slope    1.5000e+00    1.4979e+00 +/-  2.77e-02  <none>
             res_sigma    2.3000e-01    3.1002e-01 +/-  4.87e-01  <none>

They are identical, except for a fifth-digit difference in erf_mean, but I guess this is only a numeric instability because the models are defined a bit differently in code.

It works for me with any patch release of 6.26 and 6.24, and also in ROOT master. Only if I use a version older than 6.24, I get a mismatch like this:

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
              erf_mean    1.5750e+00    1.5838e+00 +/-  2.03e-01  <none>
             erf_sigma    1.2000e+00    1.1959e+00 +/-  2.53e-01  <none>
             exp_slope    1.5000e+00    1.4979e+00 +/-  2.77e-02  <none>
             res_sigma    2.3000e-01    3.1020e-01 +/-  4.87e-01  <none>


  RooFitResult: minimized FCN value: -14844, estimated distance to minimum: 1.50052e-06
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Constant Parameter    Value     
  --------------------  ------------
                     f    2.0000e-01
               g1_mean    5.0000e+00
              g1_sigma    1.5000e+00
               g2_mean    5.5000e+00
              g2_sigma    1.2000e+00

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
              erf_mean    1.5750e+00    2.8284e+00 +/-  2.56e-04  <none>
             erf_sigma    1.2000e+00    1.0000e-02 +/-  6.58e-06  <none>
             exp_slope    1.5000e+00    1.0000e-02 +/-  2.83e-06  <none>
             res_sigma    2.3000e-01    1.0000e-02 +/-  1.99e-06  <none>

I don’t do anything except for connecting to lxplus, sourcing a given ROOT release and run the script I updated. Maybe there are some other things in your environment that could have an influence on the fit? Did you customize your .rootrc or .root_logon files, and you’re sure that 6.26 is sourced correctly? I hope we can still figure this out!

Cheers,
Jonas

Kovy64 · September 16, 2022, 8:59am

Hi Jonas,

I don’t have any sort of .rootrc or .root_logon files in my home directory, whatsoever… However, I tried around, and found the following pattern.

Case 1:
I ssh to lxplus, and setup root in this way:

source /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.24.04/x86_64-centos7-gcc48-opt/bin/thisroot.sh

and your version of the script works!

Case 2:
I setup the recommended version of root with

setupATLAS
lsetup "root 6.26.04-x86_64-centos7-gcc11-opt""

In this case, the second fit (with RooProdPdf) fails.

Case 3:
Same as case 2, but with root version

lsetup "root 6.24.06-x86_64-centos7-gcc8-opt"

and it works again!

I’ve also tried ssh to lxplus8 and sourcing the 6.26.04 from cvmfs (similarly to case 1), as I only found it compiled for centos8 there, but again, the script fails…

I can’t quite make sense of this, looks a lot more delicate than I originally thought - I’m sure though you’ll have some ideas!

Cheers,

Ondra

jonas · September 16, 2022, 9:36am

Ah, actually I get the same fit failure when I source 6.26.04 from cvmfs!

So here is what I observer:

pre 6.24.00: fit failure with wrong fit result
6.24.00 to 6.26.02: fit succeeds
6.26.04: fit failure, but this time a different failure where also the NLL is zero for some reason
6.26.06: works again

The second problem that you observe in 6.26.04 is related to the RooAddPdf, because I fixed another bug related to it but the fix was wrong, hence the new bug in 6.26.04. Someone else already reported it then, so I could fix it in 6.26.06 again. So the bug you see is actually not related to the RoFFTConvPdf, but to the RooAddPdf.

So you should be able to use all ROOT releases including and after 6.24.00, except for 6.26.04.

Sorry for the inconvenience! I hope it’s not too much of a problem for you that you can’t use 6.26.04.

Cheers,
Jonas

system · September 30, 2022, 9:36am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

jonas · October 4, 2022, 3:06pm

I marked my previous post as the solution, because I think the situation with the RooAddPdf in the different ROOT versions is clear now. If you have further questions, feel free to open a new thread!