Plot discriminating variable in sPlot

I am using sPlot to separate signal and background in my analysis. I currently use an invariant mass distribution as my discriminating variable. However, I would like to plot the signal-only portion of the invariant mass. When I ask sPlot for this, I get strange plots (e.g. negative values in the tails). I am under the impression that I am seeing the sWeights. If this is the case, is there a way to visualize the discriminating variable after sPlot has been run?

Hi @byrates,

What exactly are you doing after running the sPlot? I guess you are right that you see the effect of the weights, but maybe what you really see is the original event data multiplied by the weights.
I guess we can say more once you give a few more details.

I grab the frame, then plot the dataset on it:

RooPlot* frame = d0_mass.frame() ;
sigData.plotOn(frame);

Here sigData is weighted by nsig_sw, the sWeight that sPlot computes for nsig from the invariant mass fit.

And the sigData are the data coming from GetSDataSet()?

Did you filter the data somehow?

I also need to understand a bit more what you need to have in the plot. If you only want to see the signal distribution, you have to plot the whole (signal + background) dataset because the weights are adjusted such that the signal distribution remains. The weights kind of do the background subtraction.
If you only want to see the signal, but with weights applied (this distribution will be distorted because you miss the background events), you have to filter the dataset based on some kind of category tag or similar.
If you want the signal distribution without weights, you probably need a dataset that ignores weights, but in this case you wouldn’t need the sPlot at all.

Did you find this tutorial? Maybe that’s also helpful.
https://root.cern.ch/doc/master/rs301__splot_8C.html

I have not filtered the data. I was asked to fit the signal only and make sure the parameters have not changed. This is probably a redundant step, since by definition the signal remains.
I did read the posted tutorial, and my code is based on it.

Yes, I guess it’s redundant because you can only fit the signal in the sense that the generated distribution in the unfolding variable has to resemble the signal shape.
Does it? If it does not, the unfolding failed.

Currently, plotting the sPlot signal for the mass distribution does not even have the correct number of events. It has significantly more events than in the original fit.

Did you create a weighted dataset as in the tutorial? Could it be that the weights of the signal distribution are higher (as they probably need to be to make the s plot work), but the number of events is actually correct?

Here is a breakdown. I have a dataset called ds, and a fit model model for the invariant mass d0_mass with nsig and nbkg being the event yields.
I then run:

SPlot sData("sData","An SPlot from mass", ds, &model, RooArgList(nsig,nbkg));
RooDataSet sigData = RooDataSet(ds.GetName(), ds.GetTitle(), &ds, *ds.get(), "", "nsig_sw");
RooPlot* frame = d0_mass.frame() ;
sigData.plotOn(frame);

You are probably right that I am seeing the large weights. I was asked to do this as a closure test though, so I’m hoping I can find a way to plot the background subtracted mass.

Ok, this looks correct. You are adding the s weights to the dataset, and then you are creating the dataset that uses the s weights to project out the signal distribution. Does it look wrong?

Here is an example of what I am fitting [image 1], and what I see when I ask for the sWeighted data [image 2].
The mass fit has 3558.59 signal events, but the sWeighted plot has 16185.6 events.

Could you try to get the yields using GetYieldFromSWeight()

GetYieldFromSWeight("nsig") does give the correct 3559.58 events for the sum of weights.

If that’s the case, I don’t see why the plot should have the wrong number of events. How did you count these?
Could it be that the bin contents in the plot are divided by the bin width, i.e. it’s plotting an event density? In that case, you will think you see many more events than are actually plotted, since the bin width is much smaller than 1.
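
To illustrate the bin-width hypothesis, here is a minimal standalone C++ sketch (no ROOT involved). The function names and the numbers in the usage note are made up for illustration, and nothing in this thread confirms that the plot in question is actually drawn as a density. It only shows that adding up the bin heights of a density histogram overstates the event count, while multiplying each height by the bin width recovers it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convert per-bin (weighted) event counts into a density by dividing by the bin width.
std::vector<double> toDensity(const std::vector<double>& counts, double binWidth) {
    assert(binWidth > 0.0);
    std::vector<double> density;
    density.reserve(counts.size());
    for (double c : counts) density.push_back(c / binWidth);
    return density;
}

// Naive "event count": simply add up the plotted bin heights.
double sumOfHeights(const std::vector<double>& heights) {
    double total = 0.0;
    for (double h : heights) total += h;
    return total;
}

// Integral of a density histogram: height times width, summed over bins.
double integral(const std::vector<double>& density, double binWidth) {
    double total = 0.0;
    for (double d : density) total += d * binWidth;
    return total;
}
```

For example, with hypothetical counts {10, 20, 30} and a bin width of 0.005, sumOfHeights(toDensity(counts, 0.005)) is 200 times larger than the true total, whereas integral(toDensity(counts, 0.005), 0.005) gives back the original sum of the counts.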

The peculiar thing is if I use the entire mass fit range, showing where the background was, I get negative values, but the integral of the plot does come out correct. Could it be that sPlot is depending on the negative side-bands (basically zero signal minus large background) to normalize the plot?

Is this a plot of the complete dataset, i.e., signal+background?

That last plot with the negative side-bands is plotting the sWeighted dataset (ds from earlier) on the frame from the invariant mass. This is normally how I retrieve signal only plots for everything besides the discriminating variable.

Is this really the ds from your earlier snippet?

Shouldn’t it be sigData? If it’s just ds, you are not using the correct weights. I actually don’t know which weights are being applied. It’s the ones that ds had when it was constructed.
Instead, to see the signal shape, you should be applying the weights "nsig_sw", which you only get when plotting sigData.

To be safe, could you check that the number of entries in sigData and sData are equal? I’m wondering if an empty selection cut "" could do strange things. To not apply a cut, pass a nullptr.

My mistake, I should have typed sigData in my last reply instead of ds. I have checked the events in the past, and here is a printout:

RooDataSet::ds[d0_mass,ptfrac,d0_l_mass,weight,d0_pt,epoch,j_pt_ch,j_pt,d0_mass2,nsig_sw,L_nsig,nbkg_sw,L_nbkg,weight:tuneW] = 93836 entries
RooDataSet::sigData[d0_mass,ptfrac,d0_l_mass,weight,d0_pt,epoch,j_pt_ch,j_pt,d0_mass2,L_nsig,nbkg_sw,L_nbkg,weight:nsig_sw] = 93836 entries (3594.32 weighted)

The numbers of entries are identical, and the sigData version has the weighted events.

Ok, that looks correct.

So in the end the question is why the negative tails. Well, these are needed:

  • The s plot is meant to “subtract” the background from signal by assigning negative weights to the background.
  • This can be used on an independent distribution (i.e. not the mass) to subtract the background.
  • This, however, means that in the mass distributions, the background regions need to be negative. Otherwise, the subtraction doesn’t work.
  • In the signal region of the mass distribution, you should therefore see more than the bare number of signal events, because only the sum of the signal and the negative background should yield the initial number of signal events.
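
The points above can be reproduced with a small toy, sketched here in standalone C++ rather than with RooStats. Everything in it is an assumption made for illustration: a Gaussian signal on a flat background over the mass range [0, 1], fixed shapes, yields fitted with a simple EM iteration, and the standard sWeight formula built from the inverse of the yield covariance matrix. It is not the code RooStats::SPlot runs internally.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Toy model on the mass range [0, 1] (illustrative numbers, not from the thread).
const double kPi = 3.14159265358979323846;
const double kMean = 0.5, kSigma = 0.05;

double signalPdf(double m) {
    double z = (m - kMean) / kSigma;
    return std::exp(-0.5 * z * z) / (kSigma * std::sqrt(2.0 * kPi));
}
double backgroundPdf(double) { return 1.0; }  // flat on [0, 1]

struct SWeightResult {
    double nSig, nBkg;         // fitted yields
    std::vector<double> wSig;  // per-event signal sWeights
};

// Reproducible toy sample: Gaussian signal (Box-Muller) plus flat background.
std::vector<double> makeToySample(std::size_t nSigTrue, std::size_t nBkgTrue) {
    std::mt19937 gen(12345);
    auto uniform = [&gen]() { return (gen() + 0.5) / 4294967296.0; };
    std::vector<double> mass;
    while (mass.size() < nSigTrue) {
        double u1 = uniform();
        double u2 = uniform();
        double m = kMean + kSigma * std::sqrt(-2.0 * std::log(u1)) * std::cos(2.0 * kPi * u2);
        if (m > 0.0 && m < 1.0) mass.push_back(m);
    }
    for (std::size_t i = 0; i < nBkgTrue; ++i) mass.push_back(uniform());
    return mass;
}

SWeightResult computeSWeights(const std::vector<double>& mass) {
    assert(!mass.empty());
    const double n = static_cast<double>(mass.size());
    double nSig = 0.5 * n, nBkg = 0.5 * n;
    // EM iterations with fixed shapes; the fixed point is the extended ML yield fit.
    for (int it = 0; it < 1000; ++it) {
        double newSig = 0.0;
        for (double m : mass) {
            double s = nSig * signalPdf(m), b = nBkg * backgroundPdf(m);
            newSig += s / (s + b);
        }
        nSig = newSig;
        nBkg = n - newSig;
    }
    // Inverse covariance matrix of the yields, summed over events.
    double iSS = 0.0, iSB = 0.0, iBB = 0.0;
    for (double m : mass) {
        double fs = signalPdf(m), fb = backgroundPdf(m);
        double d = nSig * fs + nBkg * fb;
        iSS += fs * fs / (d * d);
        iSB += fs * fb / (d * d);
        iBB += fb * fb / (d * d);
    }
    // Invert the 2x2 matrix; only the signal row is needed for the signal sWeights.
    double det = iSS * iBB - iSB * iSB;
    double vSS = iBB / det, vSB = -iSB / det;
    SWeightResult r{nSig, nBkg, {}};
    for (double m : mass) {
        double fs = signalPdf(m), fb = backgroundPdf(m);
        r.wSig.push_back((vSS * fs + vSB * fb) / (nSig * fs + nBkg * fb));
    }
    return r;
}

// Sum of weights for events with lo <= mass < hi.
double sumWeights(const std::vector<double>& mass, const std::vector<double>& w,
                  double lo, double hi) {
    double total = 0.0;
    for (std::size_t i = 0; i < mass.size(); ++i)
        if (mass[i] >= lo && mass[i] < hi) total += w[i];
    return total;
}
```

Calling computeSWeights(makeToySample(500, 2500)) and summing the signal weights with sumWeights shows the pattern described above: the sum over the full range matches the fitted signal yield, the sum over the side-bands (here below 0.25 and above 0.75) is negative, and the sum over the peak region is therefore larger than the fitted yield.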

From all I can see, the method works correctly.