Behavior of binned likelihood fit with custom bin errors

AlkaidCheng · January 9, 2025, 3:05am

Dear RooFit experts,

Regarding this post, what would the behavior be if I used ‘AsymptoticError’ instead of ‘SumW2Error’?

The specific scenario I am currently facing is background estimation from events after a neural network selection. To account for uncertainties in the neural network, I use deep ensembling, i.e. I train an ensemble of N neural networks. The resulting histogram is then the per-bin mean of the N ensemble histograms. The error on each bin is probably not Poisson (or is it?), so we have to set custom errors there.

Another question is, is there a difference between fitting on a RooDataSet with one event for each bin with weight equal to the bin count, and fitting on RooDataHist?

Thanks a lot for your time!

Regards,
Alkaid

Danilo · January 9, 2025, 7:31am

Dear Alkaid,

Let me loop in @jonas, who can help with this question.

Best,
D

jonas · January 9, 2025, 11:16am

Hi @AlkaidCheng,

I don’t see the immediate connection between your scenario and the corrections for weighted data (“AsymptoticError” and “SumW2Error”). Your error estimation from the ensembling does not introduce weights to the data, but non-Poisson uncertainties, as you say. However, the likelihood method assumes precise observed data values, and I would advise against trying to make it work for “fuzzy” data. We generally avoid this in HEP because it’s not clear what to do, as you have realized.

If you do a neural-network based analysis, I guess you are doing a template fit where you figure out the expected shape with MC? The usual way is to encode systematic uncertainties as variations of the template histograms. Can you find a way to represent the uncertainty from the neural network with a nuisance parameter in the model? Would the usual way of interpolating up- and down-variations of the templates work, where you get the up and down variations from the ±1 sigma quantiles of the post-NN value distribution in each bin?
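As a rough illustration of that last suggestion, here is a minimal sketch in plain Python of how per-bin up and down templates could be derived from an ensemble. The function name `ensemble_templates` and the input layout (one list of bin contents per network) are assumptions made for this example, and using the 16th and 84th percentiles as the ±1 sigma band is only an approximation. The RooFit interpolation step itself is not shown.

```python
import statistics

def ensemble_templates(histograms):
    """Return (nominal, down, up) bin contents from an ensemble.

    histograms: list of N lists, each holding the bin contents
    obtained with one network of the ensemble (hypothetical layout).
    """
    nominal, down, up = [], [], []
    # zip(*histograms) iterates bin by bin over all ensemble members
    for bin_values in zip(*histograms):
        # n=100 yields 99 cut points; cuts[k] is the (k+1)-th percentile
        cuts = statistics.quantiles(bin_values, n=100, method="inclusive")
        nominal.append(statistics.mean(bin_values))
        down.append(cuts[15])  # 16th percentile, roughly -1 sigma
        up.append(cuts[83])    # 84th percentile, roughly +1 sigma
    return nominal, down, up

# Toy ensemble: three networks, two bins
nominal, down, up = ensemble_templates([[10, 20], [12, 18], [14, 22]])
```

The three lists can then serve as the nominal, down and up templates of a single shape nuisance parameter. With a small ensemble the quantiles are coarse, so the band should be read as indicative only.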

I think I see where you’re coming from with the weights. You are maybe trying to encode the additional uncertainty in the data by introducing custom weights, so that the reduced effective sample size lowers the statistical power? I would really not go that route. Did you see this being done somewhere? The “AsymptoticError” and “SumW2Error” corrections for weighted data are both approximations, although the asymptotic one is better. But they can still be bad approximations, especially when you have very different relative errors in each bin. Avoid weighted data if you can. These corrections are usually used only when fitting to MC, where weights are unavoidable but the exact result also doesn’t matter as much.
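For reference, the effective sample size of a weighted sample is commonly defined as (Σw)² / Σw² (Kish's formula). The sketch below, with a hypothetical helper name, only illustrates that definition. It is not how RooFit computes its corrections.

```python
def effective_sample_size(weights):
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    return sum_w ** 2 / sum_w2

# Equal weights leave the effective size at the number of events,
# whatever their common value is
n_equal = effective_sample_size([2.0, 2.0, 2.0, 2.0])

# Very unequal weights reduce it well below the number of events
n_unequal = effective_sample_size([1.0, 0.1, 0.1, 0.1])
```

Note that a uniform rescaling of all weights leaves this quantity unchanged, so only the spread of the weights matters.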

“Another question is, is there a difference between fitting on a RooDataSet with one event for each bin with weight equal to the bin count, and fitting on RooDataHist?”

Yes, there is a difference if you are using the “SumW2Error” correction, which requires the sum of squared weights from the samples in each bin. If you fill the RooDataSet with weights corresponding to the sum of weights, this information is lost. You can add this information by adding the correct $\sqrt{\sum w^2}$ errors as weight errors to the RooDataHist. But I don’t know if I should go into more detail there, since I advised against weighted fits before.
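To make the lost information concrete, here is a small sketch in plain Python (not RooFit) that accumulates the sum of weights and the $\sqrt{\sum w^2}$ error per bin. The function name `bin_weighted_events` and the toy numbers are assumptions for illustration. How the errors are then attached to a RooDataHist is deliberately left out.

```python
import math

def bin_weighted_events(values, weights, edges):
    """Return per-bin sum of weights and sqrt(sum of squared weights)."""
    n_bins = len(edges) - 1
    sum_w = [0.0] * n_bins
    sum_w2 = [0.0] * n_bins
    for x, w in zip(values, weights):
        for i in range(n_bins):
            # Half-open bins [low, high); out-of-range events are dropped
            if edges[i] <= x < edges[i + 1]:
                sum_w[i] += w
                sum_w2[i] += w * w
                break
    errors = [math.sqrt(s) for s in sum_w2]
    return sum_w, errors

# Three events with different weights land in the first bin
contents, errors = bin_weighted_events(
    values=[0.2, 0.4, 0.6, 1.5],
    weights=[0.5, 0.5, 2.0, 1.0],
    edges=[0.0, 1.0, 2.0],
)

# Collapsing the first bin into one event of weight contents[0]
# would instead give sqrt(contents[0]**2) == contents[0] as its error
collapsed_error = math.sqrt(contents[0] ** 2)
```

In this toy case the first bin has content 3.0 with an error of about 2.12 from the individual weights, while the single collapsed event would report 3.0. That discrepancy is exactly what the per-bin sum of squared weights preserves.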

I hope these ideas help you with your analysis! Let me know if you have more questions.
