Fitting to a multidimensional dataset over different regions

Hi all,

I’m currently working on fitting to a multidimensional dataset and I’m running into some issues.

My dataset is essentially a series of Gaussians, each of which decays exponentially in time. I am interested in fitting each of these Gaussians separately. I’ve attached a script that produces an example dataset and then attempts to fit it in a similar way to my real script. I am particularly interested in doing this with the “data.reduce()” method, as it is similar to the way a simultaneous fit would be done, and I do intend to extend this to simultaneous fitting later. I’m not sure what the issue is, but the fit not only takes a long time, it also gives a poor result, when it returns one at all. My best guess is that, although I am restricting the fit region on the x-axis, RooFit is still normalizing the pdf across the entire x range rather than over my restricted region.
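A quick way to see why the normalization range matters is to integrate a Gaussian over a sub-range. This is a minimal, stdlib-only sketch with hypothetical numbers (mean 2.5, width 1.0, sub-range [1, 4]); it is not RooFit code, only the arithmetic behind the guess above. A density normalized over the whole axis integrates to noticeably less than 1 inside the sub-range, so a likelihood built from it is not a proper likelihood for data restricted to that sub-range. Because the missing probability depends on the mean and width, the wrong normalization changes the shape of the likelihood, not just a constant offset.

```python
import math


def gauss_pdf(x, mu, sigma):
    """Gaussian density normalized over the whole real line."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def gauss_cdf(x, mu, sigma):
    """Cumulative distribution function of the same Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def pdf_in_range(x, mu, sigma, lo, hi):
    """Gaussian density renormalized so it integrates to 1 over [lo, hi] only."""
    if not lo <= x <= hi:
        return 0.0
    return gauss_pdf(x, mu, sigma) / (gauss_cdf(hi, mu, sigma) - gauss_cdf(lo, mu, sigma))


def integrate(f, lo, hi, n=10_000):
    """Simple trapezoidal rule on n equal steps."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


full_norm = integrate(lambda x: gauss_pdf(x, 2.5, 1.0), 1.0, 4.0)
range_norm = integrate(lambda x: pdf_in_range(x, 2.5, 1.0, 1.0, 4.0), 1.0, 4.0)
print(full_norm, range_norm)
```

The first number comes out at about 0.87 (the probability within 1.5 standard deviations of the mean), the second at 1.0.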

Any help would be incredible as I have not been able to fix this problem no matter what I do!

Thanks

macro.py (4.4 KB)

Hi @abrac, happy to help with your fit!

There are several problems besides the one with the different regions.

  1. Maybe there is a problem with the mathematical model definition. You define your model as a RooProdPdf of the RooAddPdf(Gaussian, polynomial) and the exponential decay. But doesn’t the exponential decay only apply to your Gaussians, and not to the background? That’s at least how you define the toy dataset. So I think you should swap the RooProdPdf with the RooAddPdf. In general, if you have multiple independent components like signal and background, the RooAddPdf has to be at the top, like in this tutorial.

  2. The exponential decay constant needs to be negative if you want a decay and not an increase:

    decay_constant = ROOT.RooRealVar("decay_constant", "decay_constant", -1.0, -2.0, 0.0)
    
  3. Why use the Minuit minimizer, and not the default Minuit2? Is there a particular reason? Minuit2 should be better in general. If there is a particular fit that doesn’t converge with Minuit2, please open a GitHub issue about it.

  4. Your background in the toy data histogram and in the model becomes negative at high energy. You need to lower the absolute value of the slope constant a bit to avoid that.

  5. There are some places where you accidentally write range1 instead of range_1.

Now to the problem with the ranged fit: I also can’t get it to work. Ranged fits in RooFit are quite messy because there are many extra knobs to tweak, like the SumCoefRange() command in RooAbsPdf::fitTo(). And even if the fit is successful, plotting is yet another story.

I would advise avoiding the ranged fit here, because it’s not strictly necessary. Why not define new RooRealVar objects for the subranges, build the models with these, and fit the datasets that you get with reduce()? Is that an option?

I have modified the script a bit to address these points. I hope this is a good starting point for your continued work:

macro_jonas.py (4.6 KB)

Cheers,
Jonas

Hi @jonas,

Thank you so much for your help! I realize I made some mistakes in generating the macro; I put it together somewhat hastily, sorry about that. As for Minuit, I use it because my real data involves some RooDecay Gaussians, which Minuit seems to handle more easily.

I agree that it would probably be easier to define new RooRealVars for energy when doing this fit, but I’m hoping to extend this to a simultaneous fit, where, for example, the yield of the third Gaussian depends on that of the first. For that to work, I believe their pdfs must rely on the same energy RooRealVar, but please correct me if I’m wrong. So I’m trying to get these fits to work on their own with this method, so that I have an easier time introducing the simultaneous fit later.

Do you know what I could do with including the SumCoefRange() in my fitTo call to get this fit to work properly?

Following up on this!

I guess my question doesn’t quite capture the problem, since I’m only talking about a standard fit. I have produced a new macro to explain the issue I am encountering with simultaneous fitting. Essentially, the pdf that describes the data in the first region cannot be extended beyond that region, as it would go to zero or negative values outside of it. In my simultaneous fit, I need to restrict the fitting region for pdf1 and pdf2 to their respective ranges. However, RooFit attempts to normalize each pdf outside of its range, and therefore I fail to get an appropriate fit. I understand that SplitRange() could be helpful here, but wouldn’t this require two categories rather than one? I’m a bit at a loss here, so sorry about that. I would genuinely appreciate any and all guidance!

Thanks!

(Edit: uploaded incorrect file originally. The correct one should now be attached!)

macro.py (4.2 KB)
