RooFit convolution + constraints Likelihood fit

woywoy123 · January 11, 2021, 10:12am

Hello Everyone!

I am trying to convolve a PDF (some input data distribution) with a RooGaussian but with the sigma and mean being constrained. I have managed to create a simple PDF x Gaussian fit which works well but I cant seem to find a way to constrain the Gaussian parameters. This is a small segment of the code I am trying to work with:

std::vector<std::pair<TH1F*, std::vector<float>>> FitDeconvolutionConstraints(TH1F* Data, std::vector<TH1F*> PDF_H, std::map<TString, std::vector<float>> Params, int fft_cache, int cache)
{
  // First we get the domain of the Data histogram we are fitting 
  float x_min = Data -> GetXaxis() -> GetXmin(); 
  float x_max = Data -> GetXaxis() -> GetXmax(); 
  int bins = Data -> GetNbinsX(); 
  int n_vars = PDF_H.size(); 
 
  // Input variables;  
  // Standard Deviation 
  std::vector<float> s_s = Params["s_s"]; // s_s = standarddev_start
  std::vector<float> s_e = Params["s_e"];  // s_e = standarddev_end
  std::vector<TString> s_N = NameGenerator(n_vars, "_s"); 
  
  // Mean 
  std::vector<float> m_s = Params["m_s"]; // m_s = mean_start
  std::vector<float> m_e = Params["m_e"]; // m_e = mean_end
  std::vector<TString> m_N = NameGenerator(n_vars, "_m"); 

  // Declare the domain using RooVariables 
  RooRealVar* x =  new RooRealVar("x", "x", x_min, x_max);
  x -> setRange("fit", Params["x_range"][0], Params["x_range"][1]); 
  if (fft_cache != 0){x -> setBins(fft_cache, "fft");}
  if (cache != 0){ x -> setBins(cache, "cache");}

  // Create the unconstrained variables
  std::vector<RooRealVar*> mean = RooVariables(m_N, m_s, m_e); 
  std::vector<RooRealVar*> stdev = RooVariables(s_N, s_s, s_e); 
  
  // Create the Gaussian Models 
  std::vector<TString> g_N = NameGenerator(s_N.size(), "_GxT"); 
  std::vector<RooGaussian*> gaus = RooGaussianVariables(g_N, x, mean, stdev); 

  // Constrain these variables under Gaussian conditions 
  std::vector<RooGaussian*> mean_constr = RooGaussianConstraint(m_N, mean, Params["m_init"], Params["m_res"]); 
  std::vector<RooGaussian*> stdev_constr = RooGaussianConstraint(s_N, stdev, Params["s_init"], Params["s_res"]); 
  
  // Convert the PDFs to RooPDFs
  std::vector<TString> pdf_N = NameGenerator(s_N.size(), "");   
  std::vector<RooHistPdf*> pdf_vars = RooPDFVariables(pdf_N, x, PDF_H); 
    
  // Create the Convolution PDFs 
  std::vector<RooFFTConvPdf*> fft_vars = RooFFTVariables(pdf_N, x, pdf_vars, gaus);
  
  // Create the Luminosity variables for the fit
  std::vector<TString> l_N = NameGenerator(s_N.size(), "_L");  // <--- simply creates names for the variables
  std::vector<float> l_s(s_N.size(), 0); 
  std::vector<float> l_e(s_N.size(), 1. * Data -> Integral());  
  std::vector<RooRealVar*> l_vars = RooVariables(l_N, l_s, l_e); 

  // Combine the variables into a single ArgSet and create the model
  RooArgSet Conv; 
  RooArgSet N; 
  for (int i(0); i < s_N.size(); i++)
  {
    Conv.add(*fft_vars[i]); 
    N.add(*l_vars[i]); 
    N.add(*mean_constr[i]);
    N.add(*stdev_constr[i]);
  }
  RooAddPdf model("model", "model", Conv, N); // < ---- this gives an error because I have only 4 PDFs but I am trying to fit; the Mean, Stdev and Normalization (4+4+4 = 12 parameters)
}

// ===================== Associated RooFit functions =========================== //

std::vector<RooGaussian*> RooGaussianConstraint(std::vector<TString> Names, std::vector<RooRealVar*> Variables, std::vector<float> InitialPred, std::vector<float> Resolution)
{
  std::vector<RooGaussian*> Out; 
  for (int i(0); i < Variables.size(); i++)
  {
    RooGaussian* v_constrain = new RooGaussian(Names[i] + "_Con", Names[i] + "_Con", *Variables[i], RooFit::RooConst(InitialPred[i]), RooFit::RooConst(Resolution[i]));
    Out.push_back(v_constrain); 
  }
  return Out; 
}

std::vector<RooFFTConvPdf*> RooFFTVariables(std::vector<TString> Names, RooRealVar* domain, std::vector<RooHistPdf*> PDFs, std::vector<RooGaussian*> Gaus) 
{
  std::vector<RooFFTConvPdf*> Out; 
  for (int i(0); i < Names.size(); i++)
  {
    RooFFTConvPdf* p = new RooFFTConvPdf(Names[i] + "pxg", Names[i] + "pxg", *domain, *PDFs[i], *Gaus[i]); 
    Out.push_back(p); 
  }
  return Out; 
}

std::vector<RooHistPdf*> RooPDFVariables(std::vector<TString> names, RooRealVar* domain, std::vector<TH1F*> Hist)
{
  std::vector<RooHistPdf*> Out; 
  for (int i(0); i < names.size(); i++)
  {
    RooDataHist* H = RooDataVariable(names[i], domain, Hist[i]); 
    RooHistPdf* P = new RooHistPdf(names[i]+"p", names[i] + "p", *domain, *H); 
    Out.push_back(P);  
  }
  return Out; 
}

std::vector<RooRealVar*> RooVariables(std::vector<TString> Names, std::vector<float> Begin, std::vector<float> End)
{
  std::vector<RooRealVar*> Variables; 
  for (int i(0); i < Names.size(); i++)
  {
    RooRealVar* r = new RooRealVar(Names[i], Names[i], Begin[i], End[i]); 
    Variables.push_back(r); 
  }
  return Variables; 
}

std::vector<RooGaussian*> RooGaussianVariables(std::vector<TString> Names, RooRealVar* domain, std::vector<RooRealVar*> mean, std::vector<RooRealVar*> stdev)
{
  std::vector<RooGaussian*> Variables; 
  for (int i(0); i < Names.size(); i++)
  {
    RooGaussian* g = new RooGaussian(Names[i], Names[i], *domain, *mean[i], *stdev[i]);  
    Variables.push_back(g); 
  }
  return Variables; 
}

RooDataHist* RooDataVariable(TString Name, RooRealVar* domain, TH1F* Hist)
{
  RooDataHist* H = new RooDataHist(Name, Name, *domain, RooFit::Import(*Hist)); 
  return H;
}

For a bit more context, I am deconvolving a particular data distribution with a Gaussian and reconvolving it with a Gaussian that I want RooFit to find the ideal parameters for. Also for some reference I am following this tutorial https://root.cern/doc/v610/rf604__constraints_8C_source.html.

Any help would be greatly appreciated.

jblomer · January 12, 2021, 1:05pm

@moneta Perhaps you can give a suggestion?

moneta · January 13, 2021, 4:24pm

Hi,

Can you please make from the extract of the code, a macro that can be run and tell us exactly what the problem is ? It is not fully clear to me looking at your code

Lorenzo

woywoy123 · January 19, 2021, 1:43pm

Debug.C (12.7 KB)

Hello Lorenzo,

Sorry for taking so long to reply. I have taken out pieces out of my main code and compiled them into this debug code. So basically what I am trying to achieve is to deconvolve a Landau with a Gaussian and then get RooFit to revert this deconvolution by giving me the appropriate Gaussian parameters.

This does work in this ideal case, but I was wondering if the parameters (mean and standard deviation for instance) could be constrained with a Gaussian penalty weighting. So what I mean by that is, if I guess the mean to be 0 and I ask RooFit to find the mean any deviation away from my guess would have a penalty. I hope this is somewhat clearer.

Thomas

moneta · January 28, 2021, 5:55pm

Hi,

Sorry for my late reply. You would like to do a deconvolution (inverse convolution) of
LandauXGaussian. Deconvolution is an inverse operation and it can be problematic.
There you might need to put constrained on the parameters. The direct convolution, that you can do then in RooFit should be fine.

Lorenzo

woywoy123 · January 30, 2021, 8:47am

Hello Lorenzo,

The part about constraining is where I am moderately stuck on. I am guessing in the case for a convolution fit, I would need to use an external constraint for the fit since, my the Landau does not explicitly depend on the Gaussian parameters. Is that correct?

If so, then does that mean I simply create constrained variables like below:

std::vector<RooGaussian*> RooGaussianConstraint(std::vector<TString> Names, std::vector<RooRealVar*> Variables, std::vector<float> InitialPred, std::vector<float> Resolution)
{
  std::vector<RooGaussian*> Out; 
  for (int i(0); i < Variables.size(); i++)
  {
    RooGaussian* v_constrain = new RooGaussian(Names[i] + "_Con", Names[i] + "_Con", *Variables[i], RooFit::RooConst(InitialPred[i]), RooFit::RooConst(Resolution[i]));
    Out.push_back(v_constrain); 
  }
  return Out; 
}

And then call the external constraint command during model.fitTo?

Thanks,
Thomas

system · February 13, 2021, 8:47am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

moneta · February 17, 2021, 6:07pm

Hi,

Sorry for the late reply. What you show seems to me correct for applying constraints in a fit. You can also just multiply the model pdf with the constraint pdf’s.
See the tutorial ROOT: tutorials/roofit/rf604_constraints.C File Reference on the various possibilities to apply constraint in a fit.

Cheers

Lorenzo

woywoy123 · February 18, 2021, 1:15pm

Hello Lorenzo,

Thank you for your response. The tutorial you have referenced is what I have been using to guide my implementation in my project. I have attached two standalone macros where, one uses the Numerical Convolution engine and the other the FFT approach. When running either of these scripts, you will notice from the output below that, for an internal constraint the parameters diverge towards the limits of the specified ranges. For the case with and without the external constraint, the fit converges normally. I am not sure if I understand the reasoning behind this.

Thanks,
Thomas

Using ROOT 6.20/06
Output of rf604_constraints.C;

fit result without constraint (data generated at mean=0.5)

  RooFitResult: minimized FCN value: 27900.4, estimated distance to minimum: 1.8432e-08
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Constant Parameter    Value     
  --------------------  ------------
                     f    5.0000e-01
                     s    2.0000e+00

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
                     m    5.0000e-01    4.5119e-01 +/-  4.02e-02  <none>

fit result without constraint FFT (data generated at mean=0.5)

  RooFitResult: minimized FCN value: 28037.9, estimated distance to minimum: 3.32899
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=-1 HESSE=4 

    Constant Parameter    Value     
  --------------------  ------------
                     f    5.0000e-01
              mean_FFT    0.0000e+00
                     s    2.0000e+00
             sigma_FFT    1.0000e-02

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
                     m    4.5119e-01    1.3612e+00 +/-  2.33e-03  <none>

fit result with internal constraint (data generated at mean=0.5, constraint is mean=0.5+/-0.1)

  RooFitResult: minimized FCN value: 27899.1, estimated distance to minimum: 2.88501e-07
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Constant Parameter    Value     
  --------------------  ------------
                     f    5.0000e-01
                     s    2.0000e+00

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
                     m    1.3612e+00    4.5801e-01 +/-  3.73e-02  <none>

fit result with internal constraint  FFT (data generated at mean=0.5, constraint is mean=0.5+/-0.1)

  RooFitResult: minimized FCN value: 36887.4, estimated distance to minimum: 9.60673e-06
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Constant Parameter    Value     
  --------------------  ------------
                     f    5.0000e-01
              mean_FFT    0.0000e+00
                     s    2.0000e+00
             sigma_FFT    1.0000e-02

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
                     m    4.5801e-01    4.9969e-01 +/-  1.00e-01  <none>

fit result with (another) external constraint (data generated at mean=0.5, constraint is mean=0.5+/-0.1)

  RooFitResult: minimized FCN value: 27899.1, estimated distance to minimum: 1.02286e-08
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Constant Parameter    Value     
  --------------------  ------------
                     f    5.0000e-01
                     s    2.0000e+00

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
                     m    4.9969e-01    4.5799e-01 +/-  3.73e-02  <none>

fit result with (another) external constraint FFT (data generated at mean=0.5, constraint is mean=0.5+/-0.1)

  RooFitResult: minimized FCN value: 36887.4, estimated distance to minimum: 9.62142e-06
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Constant Parameter    Value     
  --------------------  ------------
                     f    5.0000e-01
              mean_FFT    0.0000e+00
                     s    2.0000e+00
             sigma_FFT    1.0000e-02

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
                     m    4.5799e-01    4.9969e-01 +/-  1.00e-01  <none>

Output of rf604_constraints_fft.C

fit result without constraint (data generated at mean=0.5)

  RooFitResult: minimized FCN value: 27900.4, estimated distance to minimum: 1.8432e-08
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Constant Parameter    Value     
  --------------------  ------------
                     f    5.0000e-01
                     s    2.0000e+00

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
                     m    5.0000e-01    4.5119e-01 +/-  4.02e-02  <none>

fit result without constraint FFT (data generated at mean=0.5)

  RooFitResult: minimized FCN value: 27900.4, estimated distance to minimum: 9.35064e-10
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Constant Parameter    Value     
  --------------------  ------------
                     f    5.0000e-01
              mean_FFT    0.0000e+00
                     s    2.0000e+00
             sigma_FFT    1.0000e-02

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
                     m    4.5119e-01    4.5128e-01 +/-  4.02e-02  <none>

fit result with internal constraint (data generated at mean=0.5, constraint is mean=0.5+/-0.1)

  RooFitResult: minimized FCN value: 27899.1, estimated distance to minimum: 3.05208e-08
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Constant Parameter    Value     
  --------------------  ------------
                     f    5.0000e-01
                     s    2.0000e+00

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
                     m    4.5128e-01    4.5798e-01 +/-  3.73e-02  <none>

fit result with internal constraint  FFT (data generated at mean=0.5, constraint is mean=0.5+/-0.1)

  RooFitResult: minimized FCN value: 23621.5, estimated distance to minimum: 1.17705e-11
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Constant Parameter    Value     
  --------------------  ------------
                     f    5.0000e-01
              mean_FFT    0.0000e+00
                     s    2.0000e+00
             sigma_FFT    1.0000e-02

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
                     m    4.5798e-01    2.0000e+00 +/-  1.70e-04  <none>

fit result with (another) external constraint (data generated at mean=0.5, constraint is mean=0.5+/-0.1)

  RooFitResult: minimized FCN value: 27899.1, estimated distance to minimum: 7.50741e-06
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Constant Parameter    Value     
  --------------------  ------------
                     f    5.0000e-01
                     s    2.0000e+00

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
                     m    2.0000e+00    4.5788e-01 +/-  3.73e-02  <none>

fit result with (another) external constraint FFT (data generated at mean=0.5, constraint is mean=0.5+/-0.1)

  RooFitResult: minimized FCN value: 27899.1, estimated distance to minimum: 8.40599e-10
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Constant Parameter    Value     
  --------------------  ------------
                     f    5.0000e-01
              mean_FFT    0.0000e+00
                     s    2.0000e+00
             sigma_FFT    1.0000e-02

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
                     m    4.5788e-01    4.5807e-01 +/-  3.73e-02  <none>

rf604_constraints.C (4.3 KB) rf604_constraints_fft.C (4.3 KB)

DmitryYuGolubkov · February 24, 2021, 8:20pm

Hi,

it seems that to fix the numeric convolution fits it is enough to change the setConvolutionWindow() parameter so that they correspond
to the ones used by the resolution function:

    // PDFxG.setConvolutionWindow(m, s, 5);
    PDFxG.setConvolutionWindow(mean_FFT, sigma_FFT, 5);

regarding the problematic FFT convolution with internally-constrained fit, I’ve ran into a similar issue recently, and AFAIU it occurs due some numerical problems
related to the fact that the FFT convolution returns scaled pdf, with ‘raw’ (un-normalized) values easily reaching the range 10^8-10^12 or higher
(the order of magnitude strongly depends on the binning in x).
(This PDF scaling is mentioned in a comment in the FFT tutorial https://root.cern/doc/master/FFT_8C.html
and also in the FAQ of the underlying FFTW library http://www.fftw.org/faq/section3.html#whyscaled ).

In my case the fits of convoluted PDF with internal constraints were very unstable, while fits of the same PDF with external constraints behaved well.
As far as I could trace it down, there exists some difference at least in the order of calculation of the log-likelihood
between the externally-constrained and internally-constrained fits. In case of the external constraints,
the logarithm of the PDF and the logs of the constraint terms are calculated separately and then their sum is maximized.
Whereas in case of using internal constraints, the log is taken of the total RooProdPdf containing both the signal PDF and the constraint terms.
Normally RooFit automagically normalizes the PDFs when it decides to do so.
Possibly (but here I’m not at all sure - one might have to study the dependency tree of the resulting RooProdPdf) in case of the
internal constraints the automatic normalization of the PDFs is done after the multiplication of the signal pdf term by the constraints,
and that somehow leads to loss of precision.

My solution for this problem was to introduce a compiled ‘proxy’ pdf which just returns the normalized convoluted signal PFD, e.g.:

RooAbsPdf *PDFxG_norm = RooClassFactory::makePdfInstance("qq", "PDFxG.arg().getVal( RooArgSet( x.arg() ) )", RooArgSet(PDFxG, x), "x:1");

rf604_constraints_fft_norm.C (5.4 KB)
See the attached script.
( I’m using v6.20.02 ROOT version - in the newer versions the syntax may not require calling varname.arg() to access getVal(). )

By the way, apparently, similar problem has been reported earlier in the following thread:
https://root-forum.cern.ch/t/problem-using-convolution-with-roofftconvpdf/21517/2

moneta · February 25, 2021, 8:21am

Hi,
One thing noting on this. There is an open bug affecting the RooFFTConvPdf, see https://sft.its.cern.ch/jira/projects/ROOT/issues/ROOT-8497
A patch exists as a PR, [RF] Fix in RooAbsArg::optimizeCache for nodes with same name by lmoneta · Pull Request #7280 · root-project/root · GitHub.
This bug could probably explain some of the differences observed when using the RooFFTConvPdf

Lorenzo

eguiraud · March 1, 2021, 5:00pm

This topic was automatically closed after 14 days. New replies are no longer allowed.