sPlot on 2 discriminating distributions or on a simultaneous fit

vberta · December 20, 2022, 1:01pm

I have a tree containing three components: 2 backgrounds (bkg1, bkg2) and a signal (sig). The distribution of variable A can discriminate bkg1 from the sum bkg2+sig. The distribution of B can discriminate bkg2 from sig. I would like to obtain the distribution of a third variable, C, for sig only.
I would like to use sPlot to obtain the proper sig weights for that. I have some ideas, but I cannot find a way to implement them in RooFit. Is it possible to do one of the following?

1. Give sPlot 2 discriminating distributions as input (A and B in this case) to obtain the sWeights for a third one (C)? I found this 'legacy code' [1] where this is described, but in the new constructor of sPlot I can provide only one pdf as input.
2. Perform a simultaneous fit of A and B using the RooSimultaneous class (with the yield parameters of sig, bkg1 and bkg2 shared between the two) and use it as input for sPlot to obtain the weights for C.
3. Run sPlot twice (first using A as the discriminating variable and B as the control variable, then using B as the discriminating variable and C as the control variable) and apply the two resulting sets of weights in chain. I am worried that in this case the product of the weights is not consistent.
I already found a question about that, without a final answer: [2]

Thank you in advance for any suggestions!

[1] ROOT: TSPlot Class Reference
[2] Apply sPlot weights twice - #3 by mwilkins

jonas · December 21, 2022, 4:22pm

Hi @vberta!

In the new constructor of sPlot I can provide only one pdf as input

That is true, but you can have one single PDF for multiple dimensions! If you have your PDFs for A and B, you can multiply them together with a RooProdPdf like in this tutorial to get a 2D PDF for the distribution of each background or signal sample.

Then, use a final RooAddPdf to sum the 2D PDFs for each sample, and pass this to SPlot.

Accordingly, you also need to pass to SPlot a RooDataSet that stores both “A” and “B”.
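
A minimal PyROOT sketch of the recipe above, under stated assumptions: all shapes, ranges, parameter values and names (`peakA`, `fallB`, `run_splot_2d`, ...) are hypothetical, the toy data is generated with C independent of A and B within each sample, and it needs a ROOT installation with RooFit and RooStats. It is an illustration, not necessarily how your real model should look.

```python
def run_splot_2d(n_sig=400, n_bkg1=300, n_bkg2=300, seed=1):
    """Toy sPlot with discriminating variables A, B and control variable C."""
    import ROOT  # requires ROOT built with RooFit and RooStats

    ROOT.RooRandom.randomGenerator().SetSeed(seed)
    A = ROOT.RooRealVar("A", "A", 0.0, 10.0)
    B = ROOT.RooRealVar("B", "B", 0.0, 10.0)
    C = ROOT.RooRealVar("C", "C", 0.0, 10.0)

    # Hypothetical shape parameters, floated in the fit.
    mA = ROOT.RooRealVar("mA", "mA", 5.0, 3.0, 7.0)
    sA = ROOT.RooRealVar("sA", "sA", 1.0, 0.3, 3.0)
    kA = ROOT.RooRealVar("kA", "kA", -0.5, -3.0, 0.0)
    mB = ROOT.RooRealVar("mB", "mB", 4.0, 2.0, 6.0)
    sB = ROOT.RooRealVar("sB", "sB", 0.8, 0.3, 3.0)
    kB = ROOT.RooRealVar("kB", "kB", -0.5, -3.0, 0.0)

    # 1D shapes: A separates bkg1 from (sig + bkg2), B separates sig from bkg2.
    peakA = ROOT.RooGaussian("peakA", "A shape of sig and bkg2", A, mA, sA)
    fallA = ROOT.RooExponential("fallA", "A shape of bkg1", A, kA)
    peakB = ROOT.RooGaussian("peakB", "B shape of sig", B, mB, sB)
    fallB = ROOT.RooExponential("fallB", "B shape of bkg2", B, kB)
    flatB = ROOT.RooPolynomial("flatB", "B shape of bkg1", B)

    # One factorizing 2D PDF per sample, then an extended sum of the three.
    sig2d = ROOT.RooProdPdf("sig2d", "sig", ROOT.RooArgList(peakA, peakB))
    bkg1_2d = ROOT.RooProdPdf("bkg1_2d", "bkg1", ROOT.RooArgList(fallA, flatB))
    bkg2_2d = ROOT.RooProdPdf("bkg2_2d", "bkg2", ROOT.RooArgList(peakA, fallB))
    nsig = ROOT.RooRealVar("nsig", "nsig", n_sig, 0.0, 2000.0)
    nbkg1 = ROOT.RooRealVar("nbkg1", "nbkg1", n_bkg1, 0.0, 2000.0)
    nbkg2 = ROOT.RooRealVar("nbkg2", "nbkg2", n_bkg2, 0.0, 2000.0)
    yields = ROOT.RooArgList(nsig, nbkg1, nbkg2)
    model = ROOT.RooAddPdf("model", "model",
                           ROOT.RooArgList(sig2d, bkg1_2d, bkg2_2d), yields)

    # Toy data only: C shapes are used for generation, never in the fit.
    mC = ROOT.RooRealVar("mC", "mC", 7.0)
    sC = ROOT.RooRealVar("sC", "sC", 1.0)
    cSig = ROOT.RooGaussian("cSig", "C shape of sig", C, mC, sC)
    cBkg = ROOT.RooPolynomial("cBkg", "flat C shape of backgrounds", C)
    obs = ROOT.RooArgSet(A, B, C)
    gen_sig = ROOT.RooProdPdf("gen_sig", "", ROOT.RooArgList(sig2d, cSig))
    gen_bkg1 = ROOT.RooProdPdf("gen_bkg1", "", ROOT.RooArgList(bkg1_2d, cBkg))
    gen_bkg2 = ROOT.RooProdPdf("gen_bkg2", "", ROOT.RooArgList(bkg2_2d, cBkg))
    data = gen_sig.generate(obs, n_sig)
    toy_bkg1 = gen_bkg1.generate(obs, n_bkg1)
    toy_bkg2 = gen_bkg2.generate(obs, n_bkg2)
    data.append(toy_bkg1)
    data.append(toy_bkg2)

    # Extended 2D fit in (A, B); then fix the shapes so only yields float.
    model.fitTo(data, ROOT.RooFit.Extended(True), ROOT.RooFit.PrintLevel(-1))
    for par in (mA, sA, kA, mB, sB, kB):
        par.setConstant(True)

    # SPlot adds the columns nsig_sw, nbkg1_sw, nbkg2_sw to the dataset.
    splot = ROOT.RooStats.SPlot("splot", "sPlot in (A, B)", data, model, yields)

    # sWeighted sum and mean of C for the signal, read back event by event.
    sum_w, sum_wc = 0.0, 0.0
    for i in range(data.numEntries()):
        row = data.get(i)
        w = row.getRealValue("nsig_sw")
        sum_w += w
        sum_wc += w * row.getRealValue("C")
    return {"nsig_fit": nsig.getVal(),
            "sum_sweights_sig": sum_w,
            "sweighted_mean_C": sum_wc / sum_w}


if __name__ == "__main__":
    print(run_splot_2d())
```

Filling a histogram of C with the `nsig_sw` weights in the same loop would give the signal-only C distribution. With real data, the generation part is replaced by a RooDataSet imported from the tree that contains A, B and C.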

I hope this works, let me know if you have further trouble!

Cheers,
Jonas

PS: I also have this example notebook in Python that explains step by step how to do a multivariate fit with a 2D PDF.

vberta · December 21, 2022, 10:59pm

Hi @jonas , thank you for the reply!
I was thinking about a 2D fit, but my sample has quite low statistics (10^3 events) and I have to describe each distribution with at least 50 bins. So I am worried that a 2D fit will spoil all my statistical power. This is why I was thinking of two simultaneous 1D fits, related only by the common yields (the distributions of A and B are almost uncorrelated, but both have a non-trivial correlation with C).

jonas · December 21, 2022, 11:51pm

Hi! I don’t see the difference actually. If your PDFs for two variables x and y assume no correlation, the log-likelihood of the 2D PDF is mathematically exactly the same as that of a simultaneous fit, which is the sum of the two 1D log-likelihoods:

\begin{aligned}
l_\text{2D} &= \sum_i \log p(x_i, y_i) = \sum_i \log\left(p(x_i)\,p(y_i)\right) \\
&= \sum_i \left(\log p(x_i) + \log p(y_i)\right) = \sum_i \log p(x_i) + \sum_i \log p(y_i) = l_x + l_y = l_\text{sim}
\end{aligned}
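
The identity above can be checked numerically. This is a small sketch in plain Python for a single factorizing PDF, with a Gaussian in x and an exponential in y chosen purely as hypothetical examples:

```python
import math
import random


def gauss_pdf(x, mu, sigma):
    """Normalized Gaussian density."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def expo_pdf(y, rate):
    """Normalized exponential density on y >= 0."""
    return rate * math.exp(-rate * y)


random.seed(0)
# Toy events with independent x and y, so that p(x, y) = p(x) * p(y).
events = [(random.gauss(0.0, 1.0), random.expovariate(2.0)) for _ in range(1000)]

# Log-likelihood of the factorizing 2D PDF.
ll_2d = sum(math.log(gauss_pdf(x, 0.0, 1.0) * expo_pdf(y, 2.0)) for x, y in events)

# Sum of the two 1D log-likelihoods, as in a simultaneous fit.
ll_x = sum(math.log(gauss_pdf(x, 0.0, 1.0)) for x, _ in events)
ll_y = sum(math.log(expo_pdf(y, 2.0)) for _, y in events)

print(ll_2d, ll_x + ll_y)  # the two numbers agree up to floating-point rounding
```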

I mean you can just use the PDFs you intended to use for the simultaneous fit in a 2D fit instead, which is easier to implement. A simultaneous fit would be equivalent, but trickier to implement in your case, because simultaneous fits are meant for a different use case: the same observable, or different observables, measured in uncorrelated datasets with generally different numbers of events. You would have to jump through a few hoops, transforming your dataset a bit to have a RooCategory, only to get an NLL that is mathematically equivalent.

So I am worried that a 2D fit will spoil all my statistical power.

I don’t think you have to worry. In fact, I don’t see how you’d lose statistical power in principle by having a 2D PDF instead of a 1D PDF. How did you get this intuition? Maybe indirectly from the case where your PDFs are based on templates, because you then have fewer events per bin to fill the templates, making them less suitable for the fit. But even then, my intuition says that the bias might average out mathematically. Could be interesting to think a bit more about this, in case you actually have template PDFs! But even in that case, if you want to approximate your 2D PDF with factorizing 1D PDFs anyway, like you intend to do with the RooSimultaneous or I with the RooProdPdf, you don’t need to fill 2D templates.

Could be that I’m having a blind spot here, but so far I don’t see the problem. Maybe you need to explain more about what you want to do, so I understand better?

Cheers,
Jonas

vberta · December 22, 2022, 1:08am

I see your point, my bad! The PDFs are not template-based, so I can give your suggested approach a try. I will try to provide quick feedback tomorrow. Thank you very much!
