Slow performance in limited range fit with RooFit

elusian · December 20, 2022, 12:44pm

Hello
I’m trying to run a complex fit using RooFit, but I’m running into severe performance problems.

I have two samples using the same fit model structure, with the only difference being that in one of the samples the range of a variable (x in the example below) is reduced (due to acceptance there are no events there).
The core of the model is essentially (in pseudocode):

pdf_sig = RooProdPdf([pdf_sx(sx), pdf_m(m), pdf_physics(x, a | sx, m), <other>])
pdf_bkg = RooProdPdf([<various bkg pdf>])
fit_model = RooAddPdf([pdf_sig, pdf_bkg], [N_sig, N_bkg])

The fit completes fine on the full range sample, but when run on the limited range sample it seems to be extremely slow.
In the logs I see mentions of numerical integrals:

[#1] INFO:NumericIntegration -- RooRealIntegral::init(SUBPROD_pdf_sx_NORM[sx]_X_pdf_physics_Int[a,x|NormalizationRangeFor<subrange>]_Norm[a,x]_X_pdf_m_NORM[m]_Int[sx,m|NormalizationRangeFor<subrange>]) using numeric integrator RooAdaptiveIntegratorND to calculate Int(sx,m)

which may be the problem, since they probably depend on x and a and thus change at each event (and pdf_physics is very much not meant to be integrated in sx and m, hence the numerical integral).
What are those exactly? I think they come from RooAbsOptTestStatistic and are related to the RooAddPdf coefficients, but I’m not sure about the details.

Currently I’m fitting each sample on its own, but eventually they are supposed to go into a RooSimultaneous, using SplitRange.

Can anything be done to make the fit faster, possibly avoiding those integrals?

jonas · December 20, 2022, 1:25pm

Hi @elusian,

interesting problem as always!

When you do a sub-range fit in RooFit, the coefficients of the RooAddPdf are still defined as the coefficients that one would need if the component PDFs are normalized over the full range. But in subrange fits, all PDFs are normalized to the subrange. Hence, the RooAddPdf creates some internal scale factors of integrals to correct the coefficients.
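To make the correction concrete, here is a small toy calculation in plain Python (not RooFit's internal code). It assumes a Gaussian signal and an exponential background on a hypothetical observable in [0, 10], with yields defined over the full range, and rescales them to the fractions seen inside the subrange [4, 6]. All shapes, ranges and numbers are made up for illustration.

```python
import math

def gauss_integral(lo, hi, mean, sigma):
    """Integral of exp(-(x - mean)^2 / (2 sigma^2)) over [lo, hi]."""
    scale = sigma * math.sqrt(math.pi / 2.0)
    z = lambda v: (v - mean) / (sigma * math.sqrt(2.0))
    return scale * (math.erf(z(hi)) - math.erf(z(lo)))

def expo_integral(lo, hi, slope):
    """Integral of exp(slope * x) over [lo, hi]."""
    return (math.exp(slope * hi) - math.exp(slope * lo)) / slope

def subrange_fractions(yields, full_integrals, sub_integrals):
    """Turn full-range yields into component fractions inside the subrange."""
    # Share of each component that lies inside the subrange
    acceptance = [s / f for s, f in zip(sub_integrals, full_integrals)]
    in_range = [n * a for n, a in zip(yields, acceptance)]
    total = sum(in_range)
    return [n / total for n in in_range]

full, sub = (0.0, 10.0), (4.0, 6.0)      # hypothetical ranges
yields = [100.0, 900.0]                  # hypothetical N_sig, N_bkg (full range)

full_int = [gauss_integral(*full, 5.0, 1.0), expo_integral(*full, -0.2)]
sub_int = [gauss_integral(*sub, 5.0, 1.0), expo_integral(*sub, -0.2)]

fractions_full = [n / sum(yields) for n in yields]
fractions_sub = subrange_fractions(yields, full_int, sub_int)
print("full range:", fractions_full)
print("subrange:  ", fractions_sub)
```

The ratios of subrange to full-range integrals play the role of the scale factors. When a component has no analytic integral, each of these ratios has to be evaluated numerically.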

What you can do is use the SumCoefRange() command argument to RooAbsPdf::fitTo(). Just set it to the same subrange that you use for the fit. The coefficients will then be interpreted as the ones that multiply the PDFs normalized in the subrange, so no correction integrals are needed. RooFit will notice this and not create the integrals (I hope).
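In PyROOT, such a call could look roughly like the sketch below. The helper name fit_in_subrange is hypothetical, and it assumes that the PDF, the dataset and a named range on the observable already exist.

```python
def fit_in_subrange(pdf, data, range_name):
    """Fit `pdf` to `data` in a named range, with the RooAddPdf coefficients
    interpreted in that same range."""
    import ROOT  # imported lazily so the helper can be defined without ROOT

    return pdf.fitTo(
        data,
        ROOT.RooFit.Range(range_name),         # range used for the fit
        ROOT.RooFit.SumCoefRange(range_name),  # range the coefficients refer to
        ROOT.RooFit.Save(),                    # return the RooFitResult
    )
```

With a range defined beforehand, for example via x.setRange("fitRange", lo, hi) on the observable, the call would be fit_in_subrange(fit_model, data, "fitRange").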

Note that this would mean that your N_sig and N_bkg parameters now correspond to the number of events in the subrange, and not to the total number of events. But you can get back the total number of events by multiplying with the right integrals. I can tell you how this can be done, in case you need to do it! But maybe that’s not necessary, depending on how you use the yield parameters.
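As a rough illustration of that conversion (plain Python, with a made-up exponential background and made-up numbers), the full-range yield is the subrange yield multiplied by the ratio of the full-range integral to the subrange integral of the component shape:

```python
import math

def expo_integral(lo, hi, slope):
    """Integral of exp(slope * x) over [lo, hi]."""
    return (math.exp(slope * hi) - math.exp(slope * lo)) / slope

def extrapolate_yield(n_sub, integral_sub, integral_full):
    """Scale a yield fitted in the subrange up to the full range."""
    return n_sub * integral_full / integral_sub

slope = -0.2                         # hypothetical fitted slope
n_bkg_sub = 150.0                    # hypothetical fitted yield in the subrange
integral_sub = expo_integral(4.0, 6.0, slope)
integral_full = expo_integral(0.0, 10.0, slope)

n_bkg_total = extrapolate_yield(n_bkg_sub, integral_sub, integral_full)
print("yield in subrange: ", n_bkg_sub)
print("yield in full range:", n_bkg_total)
```

In RooFit itself, the integral ratio could presumably be obtained from RooAbsReal::createIntegral() with a Range() argument, instead of the analytic formula used here.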

Cheers,
Jonas

PS: For the next ROOT release, 6.28, I’m trying to make RooAddPdf a bit smarter so that it avoids unnecessary integrals. So maybe the problem will be gone by then.

elusian · December 20, 2022, 1:39pm

Having N_bkg and N_sig being the number of events in the subrange is fine, because, at least in the reduced range sample, there is literally nothing outside (do I actually need a subrange if there are no events outside the range?).
However, I do not think that works with RooSimultaneous and SplitRange, since RooFit would try to compute the coefficients in the fit of the full range sample, which uses the full range for the fit.

jonas · December 20, 2022, 2:07pm

Hmm right, good point. Do you fit in one subrange only, or do you have a comma-separated list of subranges?

If it’s only a single subrange, you could manually set the right coefficient normalization range with RooAbsReal::fixAddCoefRange(). If you have multiple comma-separated ranges, you can do this only in ROOT master because multi-range normalization didn’t work before. Would that work for you?

And what do you think, maybe the SplitRange() argument should also apply to the range passed to the SumCoefRange() argument? Then you could set SumCoefRange() to the same range as the fit range, also in simultaneous fits with SplitRange(). Right now, RooFit doesn’t consider SplitRange for SumCoefRange, which should be fixed I think.

elusian · December 20, 2022, 3:38pm

It’s a single range, so RooAbsReal::fixAddCoefRange could probably work. Should I call it on the top level fit_model relative to the limited range sample?

I think there should be a way to have a split SumCoefRange, but since the current behaviour is valid too, maybe it could be a bool parameter in SplitRange, which defaults to false to keep the current behaviour, but if set to true also splits the SumCoefRange.

jonas · December 20, 2022, 4:22pm

I think you need to do it for each channel PDF, so not on the top-level RooSimultaneous but on each PDF in it, because you have a different range name for each of them, right? Basically you need to do the SplitRange() manually. Sorry if this is what you meant already.
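Doing the split by hand could look like the sketch below. The helper name fix_coef_ranges is hypothetical. It relies on the SplitRange() convention that the range used for each channel is named "<range name>_<category label>", and it only needs objects that have a fixAddCoefRange method, such as RooAddPdf.

```python
def fix_coef_ranges(channel_pdfs, base_range, limited_channels):
    """Call fixAddCoefRange on each limited-range channel PDF, using the
    '<base_range>_<channel>' naming that SplitRange looks up."""
    range_names = {}
    for channel in limited_channels:
        range_name = f"{base_range}_{channel}"
        # Define the coefficients of this channel's PDF in its own fit range
        channel_pdfs[channel].fixAddCoefRange(range_name)
        range_names[channel] = range_name
    return range_names
```

For example, fix_coef_ranges({"chan1": pdf_chan1, "chan2": pdf_chan2}, "fitRange", ["chan2"]) would touch only the limited-range channel and leave the full-range one with its default coefficient range.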

elusian · December 20, 2022, 4:30pm

Ah, yes, that is what I meant, applying it only on the channels that have a limited range, apologies for not being clear.

By "doing the SplitRange manually" you mean only the new hypothetical SplitRange + SumCoefRange, right? I also need the normal SplitRange for setting the actual fit range. So in the end I think I will have

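# Per-channel models (constructor arguments elided)
pdf_chan1 = RooAddPdf(...)
pdf_chan2 = RooAddPdf(...)
# chan2 is the limited range sample: define its coefficients in its own fit range
pdf_chan2.fixAddCoefRange('fitRange_chan2')

sim = RooSimultaneous('sim', 'sim', {'chan1': pdf_chan1, 'chan2': pdf_chan2}, channel_cat)
# With SplitRange, each channel is fitted in the range named 'fitRange_<channel>'
sim.fitTo(data, SplitRange=True, Range='fitRange', ...)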

correct?

jonas · December 20, 2022, 4:36pm

Yes correct, that looks good!

elusian · December 20, 2022, 4:38pm

Thank you! I’ll try and I’ll let you know.
Cheers,
Enrico

elusian · January 2, 2023, 3:37pm

Hello, sorry for the late reply but the fit takes a few days to complete (with the fix, otherwise it would still be running).
The fix seems to be working, I did not get any weird integral during minimization.
I did get some unexpected integrals while computing asymptotic errors ( pdf_sx_X_ratio(pdf_physics_Int[a,x],pdf_physics_Int[a,x|<subrange>])]_Norm[sx]_denominator_Int[sx] ), but they do not seem to slow the process down (which is good, because asymptotic errors computation takes > 2x the time of the fit…).
Thank you for the help!
Cheers,
Enrico
