RooFit: PDFs with different domains in the same model (e.g. Gaussian + lognormal)

Hi all,

I would like to ask how to make (and then fit to data) a RooFit model that combines PDFs with different domains for the independent variable: for example, a simple RooAddPdf between a RooLognormal, which is defined only for positive x, and a Gaussian that is defined on the whole real axis. Of course I am interested in the case where the Gaussian sits basically around x=0 and extends to negative x:

As you see, I have no problem making gauss + gauss, but as soon as I try to model the red component with the RooFit log-normal PDF (which in theory should better describe the physical process), I start to get complaints that this component is being evaluated below zero, even if I restrict the fit to positive values (notice in the picture that the blue line starts from 1.0 also in the two-Gaussian example).

I have no problem plotting, generating and fitting a RooLognormal alone (well, after some learning to figure out reasonable choices for the median and scale parameters); the trouble really starts when I mix it with other distributions.

Looking at the code ( ROOT: roofit/roofit/src/RooLognormal.cxx Source File ), for the PDF evaluation RooFit relies on ROOT::Math::log_normal_pdf, which sets the PDF to zero for x<0 ( ROOT: math/mathcore/inc/Math/PdfFuncMathCore.h Source File ), but maybe RooFit intercepts this, because I understood that for the logL calculation it does not want zero probability.

So how do you deal with such cases? It seems fairly common to me.

I read that for RooFFTConvPdf RooFit implements a shift of one PDF w.r.t. the other. Can I try something along the same lines? Maybe the Gaussian could depend on a different variable that is just shifted, or something similar?

Any suggestion is really appreciated!

Hi @malfonsi79 , thanks for reporting your issue on the forum! Maybe @jonas can help?

Hi all,

Meanwhile I discovered that what is causing the problem is the provided analytical integration method, which does not support a negative domain even though the evaluate method does.

In fact, if I modify it with this workaround:

#include <cmath>

#include "RooLognormal.h"
#include "RooMath.h"

class AdHocLognormal : public RooLognormal
{
public:
  AdHocLognormal( const char *name, const char *title, 
                  RooAbsReal &_x, RooAbsReal &_m0, RooAbsReal &_k)
  : RooLognormal{name, title, _x, _m0, _k}
  { }
  AdHocLognormal(const AdHocLognormal &other, const char *name = nullptr)
  : RooLognormal{other, name}
  { }
  TObject *clone(const char *newname) const override { return new AdHocLognormal(*this, newname); }

  // Correct this calculation to allow for negative range extension
  double analyticalIntegral(Int_t code, const char *rangeName = nullptr) const override
  {
    static const double root2 = std::sqrt(2.);
        
    double ln_k = std::abs(_useStandardParametrization ? k : std::log(k));
    double scaledMax = _useStandardParametrization ? std::log(x.max(rangeName)) - m0 : std::log(x.max(rangeName) / m0);
    if ( x.min(rangeName) <= 0. )  // <--- here is the modification
      return 0.5 * (RooMath::erf(scaledMax / (root2 * ln_k)) + 1.);
    double scaledMin = _useStandardParametrization ? std::log(x.min(rangeName)) - m0 : std::log(x.min(rangeName) / m0);
    return 0.5 * (RooMath::erf(scaledMax / (root2 * ln_k)) - RooMath::erf(scaledMin / (root2 * ln_k)));
  }
};

I can have a functional lognormal distribution valid on the whole real axis: the probability density is just zero on the negative half-axis. Likewise, as soon as the range is at least partially positive, the normalization can still be calculated analytically; the negative fraction just adds 0 to the integral.

I verified that the code can (see picture below) plot the lognormal over a partially negative range (dashed blue), convolve it with the Gaussian (dashed green) giving the blue curve, convolve it with a clone of itself (dashed violet) and then also with the Gaussian (violet). I can also combine all of them in a RooAddPdf (not in the plot, but I can add it).

So now the question is: was there an oversight in the analytical integral code, or is there a specific theoretical/statistical reason to forbid the definition over the full real axis? (You know, you get a warning when you define the PDF that the “safe” range is ]0;+inf[.)

Thanks again for any feedback,
Matteo

Thanks a lot for the investigation! No, there is no good reason for that. In fact, it is inconsistent with other PDFs where the situation is similar, like RooLandau.

I’m fixing it and will make sure that the fix also lands in the 6.38.02 patch release:

Cheers,
Jonas

Ok, great, I am happy to have been useful.

Strictly related: RooGamma also complains about negative x, although it returns 0 anyway. The source is TMath::GammaDist, which does that.

However, I would like to ask for more help, because a similar problem seems to occur again during the fitting procedure. This is a long post summarizing my attempts and investigations of the last weeks, but coming just before the Christmas holiday it is meant to be discussed (hopefully) only next year, when I am also back from vacation (I will make sure that it does not get closed in the first week of January).

I have built up a model composing either a set of log-normal distributions or a set of gamma distributions (convolved with a resolution Gaussian, etc.):

// Model for the case of log-normal peaks model
RooRealVar nbkg("nbkg","background events",300000,1.,1000000.) ;
RooRealVar nsig("nsig","signal events",15000,1.,100000.) ;
RooRealVar fdouble("fdouble","fraction of double to single events",0.02,0.001,0.1) ;
RooFormulaVar ndouble("ndouble","nsig*fdouble",{nsig,fdouble}) ;
RooRealVar fd1miss("fd1miss","fraction of d1 miss to single events",0.3,0.001,0.8) ;
RooFormulaVar nd1miss("nd1miss","nsig*fd1miss",{nsig,fd1miss}) ;
RooAddPdf model("model","model",
                RooArgList(gauss,lnxg,lnxlnxg,lognormal_u),
                RooArgList(nbkg,nsig,ndouble,nd1miss)) ;

From the “old 2009” RooFit manual, I understand that when the number of coefficients is the same as the number of component PDFs, the model is assumed to be an extended PDF model, with the coefficients used as event counts rather than probability fractions. Therefore an extended likelihood is built for fitting, which is fine: it is what I want.

I believe that I have set reasonable starting values:

and I have set small errors on the parameters to ensure that Minuit does not immediately jump away from this point.

However any fit attempt

std::unique_ptr<RooFitResult> roofitres {
    model.fitTo(spe, RooFit::Save() , RooFit::Range( -0.5,5.5) )
};

fails with a long list of:

Returning maximum FCN so far (-inf) to force MIGRAD to back out of this region. Error log follows.
Parameter values: 	fd1miss=0.7	fdouble=0.05	medianln=1.35	nbkg=300000	nsig=50000	ratio_expectval_ln=0.5	sg=0.03	shapeln=1.3	shapeln_u=1.8
RooAbsPdf::lognormal_u_over_lognormal_u_Int[x][ numerator=lognormal_u denominator=lognormal_u_Int[x] ]
     p.d.f value is Not-a-Number @ numerator=lognormal_u=7.61465e-06, denominator=lognormal_u_Int[x]=0.999996   
     [ ... same lines ... ]
RooAddPdf::model[ nbkg * gauss_over_gauss_Int[x] + nsig * lnxg_over_lnxg_Int[x] + ndouble * lnxlnxg_over_lnxlnxg_Int[x] + nd1miss * lognormal_u_over_lognormal_u_Int[x] ]
     getLogVal() top-level p.d.f evaluates to NaN @ !refCoefNorm=(x = 7.5), !pdfs=(gauss_over_gauss_Int[x] = 0,lnxg_over_lnxg_Int[x] = 1.07732e-10,lnxlnxg_over_lnxlnxg_Int[x] = 1.8821e-07,lognormal_u_over_lognormal_u_Int[x] = 7.61469e-06), !coefficients=(nbkg = 300000 +/- 1000,nsig = 50000 +/- 1000,ndouble = 2500,nd1miss = 35000)
    [ ... same lines ... ]

My understanding of this log is that already at the starting parameter values the NLL cannot be calculated, or maybe it is log(0). Maybe lines 5-8 tell that lognormal_u is the culprit, but I do not understand what it means with the numerator and denominator values (they seem valid numbers, so the ratio should be valid too). Also !refCoefNorm=(x = 7.5) seems to suggest that the calculations are made over the full range, despite the requested fit range being up to 5.5 …

To understand what is happening, I build and inspect the constructed NLL directly:

auto model_nll = model.createNLL(spe, RooFit::Range(-0.5,5.5), RooFit::Extended(true));
//I tried various ranges and the Extended option true and false, although from the manual I got that the default is Extended because the composite pdf is extended
model_nll->getVal()

The last line, which should give me the NLL at the current (starting) values of the parameters, produces the same errors.

However, the model has valid and non-zero values over the full range used:

for (double xval = -1. ; xval < 8. ; xval += 0.05) {
    x = xval;
    RooArgSet norm_set{x}; // not sure how to indicate a different x range here
    cout << model.getVal({x}) << "\t" << model.getLogVal(&norm_set) << endl;
}

So I have to understand how this binned log likelihood is calculated (continued in the next post).

(continued from the previous post)

It is difficult to track down the piece of code that actually calculates the NLL for my case because, in the end, the classes delegate the effective calculation to free functions in some implementation namespace, and Doxygen is not so good at creating the cross-links. There are classes named RooNLLVar and RooNLLVarNew, but in the end they were not the ones used (they seem to serve some other purpose).

My understanding is that createNLL() calls the specific createNLLImpl() of the PDF class. RooAddPdf does not override the basic behavior of RooAbsPdf, which calls the free function RooFit::FitHelpers::createNLL(). [Note: older ROOT versions - like the one on my old laptop, which was driving me crazy - were doing this differently, but let's focus on the latest version.]

Inspecting the FitHelpers.cxx file (here is where Doxygen fails at cross-linking), you find that this call builds a RooFit::TestStatistics::RooRealL object, which embeds (and whose evaluate() uses) a RooFit::TestStatistics::RooBinnedL instance. This is, finally, the function where the logs of the probabilities are accumulated, see ROOT: roofit/roofitcore/src/TestStatistics/RooBinnedL.cxx Source File

Here the only special error message is NOT the message that I receive, so the error should really come from the evaluation of the PDF - note that the PDF is not evaluated in log space. But this contradicts my last check in the previous post, where I verified that the composite PDF is positive over the full range (I repeated the test with a step smaller than the histogram bin width).

After the holidays I will also investigate RooAddPdf on my own; I am not sure I have fully understood how these coefficients are managed.

Merry Christmas and Happy discussion on Roofit in the New Year :wink:

Matteo