RooFit: time-consuming fits after a Range change

Nicola_Rubini · September 11, 2020, 7:45am

Hi everyone!

I am currently performing multiple fits over the same dataset, each time asking for a different range using the option Range("customrange").
Ex.
1st fit: [0,1]
2nd fit: [0,0.9]
3rd fit: [0.1,1], etc.
Is it normal to experience much longer computation times than with the usual full range?

couet · September 11, 2020, 8:14am

I guess @moneta can help.

StephanH · September 11, 2020, 8:29am

I have not seen something like this before. Could it be that your model needs very expensive numeric integrals?
What kind of model are you using?

moneta · September 11, 2020, 8:29am

Hi,
I would need to see your code to understand the issue precisely. However, it is possible that setting a fit range makes the fit take longer: you then need integrals over the full range for the normalisation and, in some cases, integrals over the restricted range, for example to define the coefficients (e.g. number of signal events, number of background events) in sums of PDFs.
If computing the integral takes much longer than evaluating the PDF, that could explain what you observe.
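
To make the cost argument concrete, here is a minimal stand-alone C++ sketch. It is not RooFit code: it assumes a toy Gaussian-shaped function and a plain trapezoid rule, and the names `toyShape`, `integrate`, `fractionInRange` and `gEvalCount` are made up for the illustration. It only shows that getting the fraction of a PDF inside a sub-range takes two integrals, each of which costs many function evaluations when done numerically.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Counts how often the toy shape is evaluated (illustration only).
static long gEvalCount = 0;

// Unnormalised Gaussian shape standing in for a PDF that is integrated numerically.
double toyShape(double x) {
    ++gEvalCount;
    const double mean = 0.5, sigma = 0.2;
    const double z = (x - mean) / sigma;
    return std::exp(-0.5 * z * z);
}

// Trapezoid rule with nSteps intervals; evaluates f at nSteps + 1 points.
double integrate(const std::function<double(double)>& f,
                 double lo, double hi, int nSteps) {
    assert(nSteps > 0 && hi > lo);
    const double h = (hi - lo) / nSteps;
    double sum = 0.5 * (f(lo) + f(hi));
    for (int i = 1; i < nSteps; ++i) sum += f(lo + i * h);
    return sum * h;
}

// Fraction of the shape inside [subLo, subHi] relative to [fullLo, fullHi].
double fractionInRange(const std::function<double(double)>& f,
                       double fullLo, double fullHi,
                       double subLo, double subHi, int nSteps) {
    const double full = integrate(f, fullLo, fullHi, nSteps);  // normalisation
    const double sub  = integrate(f, subLo, subHi, nSteps);    // restricted range
    return sub / full;
}
```

With 1000 steps, one call such as `fractionInRange(toyShape, 0.0, 1.0, 0.0, 0.9, 1000)` evaluates the shape 2002 times, whereas evaluating the PDF for a single event is one call.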

Best,

Lorenzo

Nicola_Rubini · September 11, 2020, 8:50am

This is the part where I change the Range:

    varx.setRange("Full",fMinIM1D,fMaxIM1D);
    varx.setRange("1",   fMinIM1D,  1.045);
[...]
    vary.setRange("8",   1.000,     1.040);
    RooFitResult* FitResults;
    if ( fRangb )       FitResults = fMod.fitTo(*data,Extended(kTRUE),SumW2Error(kTRUE),Save(),Range("1,2,3,4,5,etc."));
    else                FitResults = fMod.fitTo(*data,Extended(kTRUE),SumW2Error(kTRUE),Save(),Range("Full"));

What I see is a slowdown of the order of 100-fold. It is puzzling to me that such a huge increase could be due to computing integrals.

StephanH · September 11, 2020, 9:25am

I added ``` around the code block, so it’s easier to read.

So indeed, you are fitting in multiple ranges. You didn’t say which kind of model you use, but this really looks like expensive numeric integrals to me. For a setup like this:

    fMod.fitTo(data, Range("1"))

You need at least two integrals (if something like a sum PDF is used):

One integral over the full range, to get the total integral of the PDF
One integral over the range 1, to get the fraction of the total integral that the fit has to look at.

If you do Range("1,2"), we already need three integrals: one for the full range and one for each of the two sub-ranges. The count keeps growing with every range you add.
Things get worse if you have things like N-D distributions or convolutions in the model, because you have to run projections (= numeric integrals) over those distributions.
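
As a rough illustration of that counting, here is a small stand-alone C++ sketch. It is plain standard-library code, not part of RooFit, and the helper names `splitRangeNames` and `integralsForRanges` are hypothetical. It assumes the simple rule stated above: one integral over the full range plus one per named sub-range.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a comma-separated range list such as "1,2,3" into its names.
std::vector<std::string> splitRangeNames(const std::string& ranges) {
    std::vector<std::string> names;
    std::stringstream stream(ranges);
    std::string name;
    while (std::getline(stream, name, ',')) {
        if (!name.empty()) names.push_back(name);
    }
    return names;
}

// One integral over the full range plus one per named sub-range,
// following the counting argument in this post (an assumption, not measured).
std::size_t integralsForRanges(const std::string& ranges) {
    const std::size_t nRanges = splitRangeNames(ranges).size();
    assert(nRanges > 0);
    return nRanges + 1;
}
```

For example, `integralsForRanges("1")` gives 2 and `integralsForRanges("1,2")` gives 3, matching the numbers above.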

To really say what consumes so much time, you have to add some details about the model.

Nicola_Rubini · September 11, 2020, 10:27am

    RooChebychev        fBkgx ("fBkgx","fBkgx"          ,varx,RooArgSet(ch0x,ch1x,ch2x,ch3x,ch4x,ch5x));
    RooVoigtian         fSigx ("fSigx","fSigx"          ,varx,pMassx,pWidthx,pSlopex);
    RooChebychev        fBkgy ("fBkgy","fBkgy"          ,vary,RooArgSet(ch0y,ch1y,ch2y,ch3y,ch4y,ch5y));
    RooVoigtian         fSigy ("fSigy","fSigy"          ,vary,pMassy,pWidthy,pSlopey);
    RooProdPdf          fBB   ("fBkg","fBkg"            ,fBkgx,fBkgy);
    RooProdPdf          fSB   ("fSigBkg","fSBWBkg"      ,fSigx,fBkgy);
    RooProdPdf          fBS   ("fBkgSig","fBkgSig"      ,fBkgx,fSigy);
    RooProdPdf          fSS   ("fSigSig","fSigSig"      ,fSigx,fSigy);
    RooAddPdf           fMod  ("fMod2D","fMod2D"        ,RooArgList(fBB,fSS,fSB,fBS),RooArgList(n1,n0,n3,n2));

It’s a two-dimensional fit to an invariant mass distribution. The exact expression for the fitTo is:

    if ( fRangb )       FitResults = fMod.fitTo(*data,Extended(kTRUE),SumW2Error(kTRUE),Save(),Range(fRangs.c_str()));
    else                FitResults = fMod.fitTo(*data,Extended(kTRUE),SumW2Error(kTRUE),Save(),Range("Full"));

I do this in a for loop that calls a function performing one fit per iteration: if fRangb is true, the fit uses the ranges named in the string fRangs; otherwise it uses the full range.

Edit: Thanks for the formatting!

StephanH · September 11, 2020, 11:55am

OK, so I believe this is what happens:

Due to the N ranges, we need N+1 integrals to sort out the coefficients of the AddPdf.
The four productPdfs, in turn, need to integrate their factors. That’s two integrals per product PDF.
The Chebychev can be integrated analytically, but it’s a loop that runs longer if you have more coefficients. It should still be fast enough, though.
The Voigtian needs to be integrated numerically.

So we should be running about
(N+1) * 4 * 2 integrals for one evaluation.
Usually, those integrals are cached, so they run only when a parameter changes. Now, however, when the range also changes, they might rerun for every event in the dataset.
I would have to check in detail if you could post runnable code, but that’s my best guess.
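
To illustrate the caching guess, here is a minimal stand-alone C++ sketch. It does not reproduce RooFit's actual cache: it assumes a hypothetical single-slot cache (`CachedIntegral`) that only remembers the last (parameter, range) pair, and a dummy `expensiveIntegral` in place of a real numeric integral. Under that assumption, asking for a different range on every call defeats the cache even though no parameter changed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Single-slot cache: remembers the integral for the last (parameter, range) only.
class CachedIntegral {
public:
    // Returns the (dummy) integral and recomputes it only if the key changed.
    double get(double parameter, const std::string& range) {
        if (!valid_ || parameter != lastParameter_ || range != lastRange_) {
            ++recomputations_;
            lastValue_ = expensiveIntegral(parameter, range);
            lastParameter_ = parameter;
            lastRange_ = range;
            valid_ = true;
        }
        return lastValue_;
    }
    long recomputations() const { return recomputations_; }

private:
    // Placeholder for a costly numeric integral; the value is irrelevant here.
    static double expensiveIntegral(double parameter, const std::string& range) {
        return parameter * static_cast<double>(range.size());
    }
    bool valid_ = false;
    double lastParameter_ = 0.0;
    std::string lastRange_;
    double lastValue_ = 0.0;
    long recomputations_ = 0;
};

// Number of recomputations when every event asks for every range in turn.
long recomputationsFor(const std::vector<std::string>& ranges, int nEvents) {
    assert(nEvents >= 0);
    CachedIntegral cache;
    const double parameter = 1.0;  // parameters stay fixed during one evaluation
    for (int event = 0; event < nEvents; ++event) {
        for (const std::string& range : ranges) cache.get(parameter, range);
    }
    return cache.recomputations();
}
```

In this toy setup, 1000 events with the single range "Full" trigger one computation, while alternating between "Full" and "1" triggers 2000. For comparison, the estimate above with N = 2 ranges gives (2+1) * 4 * 2 = 24 integrals per evaluation.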

One thing that might save you is the new RooFit::BatchMode() in fitTo(). Instead of evaluating each event separately, it computes likelihoods for all events in one go. That will hopefully dramatically reduce the number of integrals needed. I hope your ROOT version is recent enough, so that it’s available. Do you want to give it a try?

Nicola_Rubini · September 11, 2020, 5:54pm

After trying it, I get a wall of identical messages:

Retrieving weights in batches not yet implemented for RooDataHist.

Edit: I am using

    gROOT->SetBatch();

at the start of the process, and it has already sped up the fit quite a lot.

Edit 2:
I can see a significant improvement in speed, but can I get rid of the wall of text? Should I worry about it?

Nicola_Rubini · September 12, 2020, 9:16am

I ran a couple of tries and it came to my attention that the RooFit Batch mode indeed sped up the process, but completely messed up the fit itself. Shapes are, dare I say, randomly generated rather than fitted to data.
Any idea as to why this happens?

StephanH · September 12, 2020, 9:48am

Yes, the BatchMode is indeed not yet implemented for RooDataHist. There is a fall-back function that tries to get around this issue by running RooFit’s old-style single-value computations, and then it passes the data on in batches (that’s why you get a speed up). However, it had a bug that is fixed only in ROOT 6.22.02 and 6.24. It might be worth checking your ROOT version.
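
If you want to automate that check, here is a minimal stand-alone C++ sketch. The helper `hasBatchFallbackFix` is hypothetical and simply assumes that every version from 6.22.02 onwards contains the fix mentioned above; it takes the version numbers as plain integers, which you can read off from `root-config --version` or `gROOT->GetVersion()`.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical helper: true if version (major, minor, patch) is at least 6.22.02.
// Assumes every release from 6.22.02 onwards carries the fix.
bool hasBatchFallbackFix(int major, int minor, int patch) {
    assert(major >= 0 && minor >= 0 && patch >= 0);
    // Tuples compare lexicographically: major first, then minor, then patch.
    return std::make_tuple(major, minor, patch) >= std::make_tuple(6, 22, 2);
}
```

For example, `hasBatchFallbackFix(6, 22, 0)` is false, while `hasBatchFallbackFix(6, 22, 2)` and `hasBatchFallbackFix(6, 24, 0)` are true.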

I am not sure, however, that even this fall-back function will work in conjunction with RooDataHist. This is a work in progress and hasn’t been tested yet, as the message indicates.
If you want to post your model (or send it by email) as an example of a “typical” workflow, you might actually help develop this feature for the data hist. (I mean a runnable example. I saw part of the model, but there are no dataHists in it, for example.)
That would be useful even if it is only to check whether my assumption is true that the integrals are discarded all the time.
