Fit multiple peaks with shared parameters

MarieD · January 3, 2024, 2:31pm

I have a spectrum, with 2 (or more) distant peaks. I want to fit all of them with a gaussian, where the sigma is constrained by a*energy + b, here a and b should be optimized simultaneously. And I want each of the peaks to have a separate constant background (i.e. a constant that is only nonzero in the vicinity of the peakregion).

For the moment, I use:

RooRealVar a_sig("a_sig","a of sigma relation", 0.00084, 0, 0.1,"keV");
RooRealVar b_sig("b_sig","b of sigma relation", 0.1, 0, 1,"keV");`

And then I make multiple ranges:

E.setRange(range_name.str().c_str(),downenergy, upenergy);

Then I make the background:

constback[ii] =  new RooRealVar(constback_name.str().c_str()," ",
                                15, 0, 1000,"keV");
constback_touse[ii] = 
   new RooFormulaVar(constback_touse_name.str().c_str(),
                     "(@0 > (@2-20))*(@0 < (@3+20)) *@1",
                     {*mx_const[ii],*constback[ii], *Roodown[ii], *Rooup[ii]});

where Rooup and down is just boundaries for the where I want the background.

And I define my model for each of the peaks as:

Sig[ii] =
new RooGenericPdf(sig_name.str().c_str(),
                  "(exp(-((@0 - @1)/(@2)) * ((@0 - @1)/@2)/2) /(@2*2.5) + @3)",
                  {E,*mx[ii],*sigma[ii], *constback_touse[ii]});
eSig[ii] = new RooExtendPdf(eSig_name.str().c_str()," ",*Sig[ii],*nSig[ii]);

Then the fit is performed as:

RooArgList pdfList{"pdfList"};
for(int i = 0; i < numPeaks; i++) {      
   pdfList.add(*(eSig[i]));     
}
RooAddPdf finalPDF("finalPDF","finalPDF",pdfList);
RooNLLVar nll("nll", "nll", finalPDF, data,
              Range(all_ranges.str().c_str()), Extended(kTRUE), NumCPU(100));
RooMinimizer m(nll) ;
m.migrad() ;
m.hesse() ;
RooFitResult* r_sig = m.save() ;

I have 3 problems, the first one is, that when I use:

finalPDF.plotOn(frame, RooFit::NormRange(all_ranges.str().c_str()));

the background is clearly not used the way I want, I would expect a jump at Rooup[ii] +20, with Rooup = 855, but the background is just straight.

Moreover, I noticed that, the larger I make the interval for E (i.e. increasing the 1185 or decreasing the 830 in this definition: RooRealVar E("E","Energy1",830, 1185,"keV");), the worse the fit becomes. Even if I only fit one peak, it looks like less datapoints are taken into account?

Lastly, making this fit is taking >30min, while if I do not add the clipping, it is done in about a minute, which is rather weird to me.

moneta · January 9, 2024, 3:44pm

Hello,

Can you please post your full macro and the data to reproduce the problem?

Lorenzo

jonas · January 9, 2024, 4:11pm

Hi @MarieD,

while a full reproducer would be nice, I have some idea how to approach this based on the RooParametricStepFunc. Your approach doesn’t work for several reasons, which is why it was simpler in this answer to just create an example from scratch that you can adopt if you want.

Here is a code example:

using namespace RooFit;

// Define some constant parameters
const double xMin = 100.0;
const double xMax = 1000.0;
const std::size_t nEvents = 1000000;

// Fill the bin boundaries. Some bins I used for the gaps between the fit
// regions.
std::vector<double> binBoundaries{120, 400, 500, 900};

// The RooParametricStepFunction needs a TArrayD
TArrayD binBoundariesTArr{int(binBoundaries.size()),
                          binBoundaries.data()};

// The observable with the ranges around the peak
RooRealVar x{"x", "x", xMin, xMax};
x.setBins(200);
x.setRange("r1", 120, 400);
x.setRange("r2", 500, 900);

// Total number of background
RooRealVar nbkg{"nbkg", "bkg", 1000000, 0, 1e9};

// resolution parameters
RooRealVar a{"a", "a", 0.07, 0.01, 1.0};
RooRealVar b{"b", "b", 2, 0.0, 10.0};
RooFormulaVar res{"res", "a * x + b", {a, x, b}};

// Build the model
RooRealVar mu1{"mu1", "mu1", 200, 100, 1000};
RooRealVar mu2{"mu2", "mu2", 700, 100, 1000};
RooGaussian sig1{"sig1", "sig1", x, mu1, res};
RooGaussian sig2{"sig2", "sig2", x, mu2, res};

// Since the sigma is correlated with the observable,
// there is not simple analytical integral of the
// Gaussian anymore. Because of a bug, RooFit doesn't
// notice it itself, so we need to force the numeric
// integration of the Gaussians.
sig1.forceNumInt(true);
sig2.forceNumInt(true);

RooRealVar nsig1{"nsig1", "nsig1", 100000, 0, 1e9};
RooRealVar nsig2{"nsig2", "nsig2", 100000, 0, 1e9};

// Scaling coeff for the RooParametricStepFunction
RooRealVar bkg1{"bkg1", "bkg1", 0.001, 0, 20};
// The second bin in the step function is a gap,
// so constant zero.
// We only need n - 1 coefficients, because the final
// one is determined by the noamlization condition.
RooArgList coefList{bkg1, RooConst(0.0)};
RooParametricStepFunction bkg("bkg",
                              "bkg",
                              x,
                              coefList,
                              binBoundariesTArr,
                              binBoundaries.size() - 1);

RooAddPdf model{
    "model", "model", {sig1, sig2, bkg}, {nsig1, nsig2, nbkg}};

std::unique_ptr<RooAbsData> data{model.generateBinned(x)};

std::unique_ptr<RooFitResult> result{
    model.fitTo(*data, PrintLevel(-1), Save(), Range("r1,r2"))};

result->Print();

auto c1 = new TCanvas();

RooPlot *xframe = x.frame();
data->plotOn(xframe);
model.plotOn(xframe);
xframe->Draw();

c1->SaveAs("plot.png");

I know it’s quite complicated, so if you have any follow up questions, feel free!

The output plot looks like this:

And about your question with the fitting time: don’t use NumCPU(100). The overhead of communicating between all the processes will be very high and the fit will be very slow. Use 4 or 8 cores at a maximum, and even then it is not certain that it will be faster than without the NumCPU() option, so you should also try that.

Another way to speed things up is to avoid the Range() option, because this makes things quite slow because many normalization integrals need to be figured out by RooFit. You would also just fit the full range, and prepare your dataset and pdfs maybe in a way that they are always zero in the “gaps”.

I hope these ideas help you further!

Cheers,
Jonas

MarieD · January 10, 2024, 8:04am

Dear @jonas , thanks a lot for the help, I really appreciate it.
However, I am a bit confused how this code allows for two different backgrounds, i.e in the range [120,400] the background is ~4600, while in [500, 900], the background is ~8000. However, for the background, there is only bkg1 and nbkg as parameters, so I do not see how these different backgrounds are generated and what the corresponding parameters are.
In other words, why has coefList only 2 elements and not 3 or 4, namely {bkg1,0,bkg2,0}?

Thanks in advance,
Cheers Marie

jonas · January 10, 2024, 9:12am

Hi @MarieD,

that’s a good question, I didn’t go into much detail in the original answer because I didn’t know if you actually care about the background parameters and uncertainties. So you want to know the number of background +/- uncertainty in each of these regions?

The reason why in my setup the RooParametricStepFunction has only two parameters is the following:
it’s bin boundaries are defined excluding the empty regions on the left and the right, so we have 3 bins in my example: the first peak, the “gap”, and the second peak. The RooParametricStepFunction is implementing a RooAbsPdf, so it normalizes itself. Because of this normalization condition, you only need 2 coefficients to fully define it for 3 bins.

Then, this RooParametricStepFunction is used in a RooAddPdf, where each component needs a coefficient that corresponds to a number of events, such that the total number of events can be used in an extended fit. So the nbkg parameter is the easiest to interpret: it’s the total number of background events. The coefficients of the step function are a bit more complicated to interpret, because they aren’t number of events but probability densities. To get the total number of events, you would have to multiply them by the range width and then by the total number of events. So for example, in the first region the number of background events in the example is bkg1 * (400 - 120) * nbkg. The number of events in the last region is just nbkg minus the number of events in all the other regions.

Does that make things a little bit more clear?

MarieD · January 10, 2024, 9:34am

Dear @jonas , it does, thanks a lot for the help!

MarieD · January 10, 2024, 9:47am

Dear @jonas , would this code also work for a non constant background, e.g. a linear background? Could I just redefine bkg1 as a RooFormulaVar? because then, for the background of the second peak, it would just be a constant if I understand well?

jonas · January 10, 2024, 9:59am

Yes, you can also use other pdfs for the background, but not a RooFormulaVar because it it not a pdf. You’d have to use a RooGenericPdf, which is the pdf equivalent of a RooFormulaVar. Maybe that’s better than my approach with the step function, because it’s more flexible.

MarieD · January 10, 2024, 10:06am

Do I then add this RooGenericPdf to the coefList?

jonas · January 10, 2024, 10:15am

Nono, the RooGenericPdf would replace the RooParametricStepFunction in the RooAddPdf. The coefList would cease to exist as such, and instead you’d have a list of variables that get used in the formula of your RooGenericPdf. You can find examples here:

https://root.cern.ch/doc/master/rf103__interprfuncs_8C.html

MarieD · January 10, 2024, 10:16am

But how does it force it to be zero then, outside the range?

jonas · January 10, 2024, 10:26am

Well, the formula in your RooGenericPdf will have some more complicated logic. You can do conditionals with the C++ ternary operator inline in the formula.

But maybe it’s better to write your background shape as a C++ function, declare this to ROOT and then use it in the RooGenericPdf. That’s what I would do, and I recently wrote about this approach here:

github.com/cms-analysis/HiggsAnalysis-CombinedLimit

Comment by guitargeek to PDFs for B2G-23-008 analysis

cms-analysis:main ← anmalara:dev

Hello @anmalara! I have tome questions and suggestions about this: why not us…e `RooGenericPdf` for simple expression pdfs like this? Adding simple custom pdfs that only implement `evaluate()` anyway has several disadvantages: * It's a big commitment for backwards compatibility! With the `ClassDef(ExpGaussExp,1)`, you make your class serializable, meaning people can workspaces with it in a file. That puts a burden on the combine maintainers, because in principle these classes can never be changed or removed without causing possible problems because people can't open their workspace * By only implementing the `evaluate` function, and not other RooFit interfaces like for the vectorized/GPU evaluation or [C++ code generation](https://root.cern.ch/doc/master/group__roofit__dev__docs__ad.html) from the model, you are not benefiting from the new stuff that RooFit has to offer * It's just a lot of duplicated code Maybe an alternative approach to this could be to simply define a header with free inlined functions to use in RooGenericPdfs? That would be a slimmer and more future-proof solution, and it could even be inspected by compiler plugins for automatic differentiation like Clad because it's in the headers. E.g.: ```c++ #ifndef customPdfs_h #define customPdfs_h inline double expGaussExp(double x, double p0, double p1, double p2, double p3) { Double_t std=(x-p0)/p1; Double_t result=0; if (std<-p2){ result=exp(p2*p2/2.+p2*std); } else if (-p2<=std && std<p3){ result=exp(-0.5*pow(std, 2)); } else { result=exp(p3*p3/2.-p3*std); } return result; } #endif // customPdfs_h ``` Then it could be used like this C++: ```c++ #include <TInterpreter.h> #include <RooGenericPdf.h> #include <RooRealVar.h> #include <RooWorkspace.h> void initCustomPdfs() { static bool hasInit = false; if(hasInit) return; gInterpreter->ProcessLine("#include \"customPdfs.h\""); hasInit = true; } auto expGaussExpFormula = "expGaussExp(x[0], x[1], x[2], x[3], x[4])"; void example() { initCustomPdfs(); RooRealVar x{"x", "x", 1.0, 0.0, 10.0}; RooRealVar p0{"p0", "p0", 1.0, 0.0, 10.0}; RooRealVar p1{"p1", "p1", 1.0, 0.0, 10.0}; RooRealVar p2{"p2", "p2", 1.0, 0.0, 10.0}; RooRealVar p3{"p3", "p3", 1.0, 0.0, 10.0}; RooGenericPdf expGaussExp{"expGaussExp", expGaussExpFormula, {x, p0, p1, p2, p3}}; expGaussExp.Print(); } ``` That would at least be my suggestion from my role as the RooFit maintainer :) In ROOT, we put quite a bit an emphasis on the interfaces where users pass string formulas, because that just very flexible (see also RDataFrame). Sorry for chiming in as a non-CMS member, but I think it's important to know there are superior and more maintainable alternatives. I'm also very much welcoming other ideas and suggestions on how to make the `RooGenericPdf/RooFormulaVar` and friends work better!

MarieD · January 10, 2024, 10:28am

ok, thanks a lot for all of the help!

system · January 24, 2024, 10:28am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.