I am using sPlot to separate signal and background in my analysis, with an invariant mass distribution as the discriminating variable. I would like to plot the signal-only portion of the invariant mass, but when I ask sPlot for this, I get strange plots (e.g. negative values in the tails). I am under the impression that I am seeing the sWeights. If so, is there a way to visualize the discriminating variable after sPlot has been run?

# Plot discriminating variable in sPlot

Hi @byrates,

What exactly are you doing after running the sPlot? I guess you are right that you see the effect of the weights, but maybe what you really see is the original event data multiplied by the weights.

I guess we can say more when you give a bit more detail.

I grab the frame, then plot the dataset on it

```
RooPlot* frame = d0_mass.frame() ;
sigData.plotOn(frame);
```

Where `sigData` is weighted by the sPlot version of `nsig`, which is `nsig_sw`, from the invariant mass fit.

And the `sigData` are the data coming from `GetSDataSet()`?

Did you filter the data somehow?

I also need to understand a bit more what you need to have in the plot. If you only want to see the signal distribution, you have to plot the whole (signal + background) dataset because the weights are adjusted such that the signal distribution remains. The weights kind of do the background subtraction.
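For reference, this is the standard sPlot signal weight for a fit with two species (signal and background). The symbols are introduced here for illustration and are not names from the code in this thread: $f_s$ and $f_b$ are the normalised mass PDFs, $N_s$ and $N_b$ the fitted yields, and $V$ the covariance matrix of the yields.

$$
w_s(m_e) = \frac{V_{ss}\, f_s(m_e) + V_{sb}\, f_b(m_e)}{N_s\, f_s(m_e) + N_b\, f_b(m_e)},
\qquad
\left(V^{-1}\right)_{ij} = \sum_e \frac{f_i(m_e)\, f_j(m_e)}{\left(N_s\, f_s(m_e) + N_b\, f_b(m_e)\right)^2}
$$

Summed over all events, the weights give back the signal yield, $\sum_e w_s(m_e) = N_s$. Since $V_{sb}$ is typically negative, events far from the peak, where $f_s \approx 0$, get $w_s \approx V_{sb}/N_b < 0$.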

If you only want to see the signal, but with weights applied (this distribution will be distorted because you miss the background events), you have to filter the dataset based on some kind of category tag or similar.

If you want the signal distribution without weights, you probably need a dataset that ignores weights, but in this case you wouldn’t need the sPlot at all.

Did you find this tutorial? Maybe that’s also helpful.

https://root.cern.ch/doc/master/rs301__splot_8C.html

I have not filtered the data. I was asked to fit the signal only and make sure the parameters have not changed. This is probably a redundant step since by definition the signal remains.

I did read the posted tutorial, and my code is based on it.

Yes, I guess it’s redundant because you can only fit the signal in the sense that the generated distribution in the unfolding variable has to resemble the signal shape.

Does it? If it does not, the unfolding failed.

Currently, plotting the sPlot signal for the mass distribution does not even have the correct number of events. It has significantly more events than in the original fit.

Did you create a weighted dataset as in the tutorial? Could it be that the weights of the signal distribution are higher (as they probably need to be to make the s plot work), but the number of events is actually correct?

Here is a breakdown. I have a dataset called `ds`, and a fit model `model` for the invariant mass `d0_mass`, with `nsig` and `nbkg` being the event yields.

I then run:

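```
// Compute the sWeights from the mass fit; this adds nsig_sw and nbkg_sw columns to ds.
SPlot sData("sData","An SPlot from mass", ds, &model, RooArgList(nsig,nbkg));
// Build a copy of ds that uses nsig_sw as the event weight.
RooDataSet sigData = RooDataSet(ds.GetName(), ds.GetTitle(), &ds, *ds.get(), "", "nsig_sw");
RooPlot* frame = d0_mass.frame() ;
sigData.plotOn(frame);
```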

You are probably right that I am seeing the large weights. I was asked to do this as a closure test though, so I’m hoping I can find a way to plot the background subtracted mass.

Ok, this looks correct. You are adding the s weights to the dataset, and then you are creating the dataset that uses the s weights to project out the signal distribution. Does it look wrong?

Here is an example of what I am fitting, and what I see when I ask for the sWeighted data.

The mass fit has 3558.59 signal events, but the sWeighted plot has 16185.6 events.

`GetYieldFromSWeight("nsig")` does give the correct 3559.58 events for the sum of weights.

If that’s the case, I don’t see why the plot should have the wrong number of events. How did you count these?

Could it be that the bin contents in the plot are divided by the bin width, *i.e.* it’s plotting an event *density*? In that case, you will think you see many more events than are actually plotted, since the bin width is much smaller than 1.
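To illustrate the bin-width effect, here is a minimal sketch in plain C++ (not RooFit). The function names and the numbers in the usage note are made up for illustration, and whether RooFit actually plots a density in this case is something to check, not something the sketch shows.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Add up the plotted bin heights, as one would when counting "events" by eye.
double sumOfHeights(const std::vector<double> &heights) {
    return std::accumulate(heights.begin(), heights.end(), 0.0);
}

// Convert per-bin event counts into a density (events per unit of mass).
std::vector<double> toDensity(const std::vector<double> &counts, double binWidth) {
    assert(binWidth > 0.0);
    std::vector<double> density;
    density.reserve(counts.size());
    for (double c : counts) {
        density.push_back(c / binWidth);
    }
    return density;
}
```

With hypothetical counts `{10, 20, 30, 20, 10}` and a bin width of 0.5, the counts add up to 90 while the density heights add up to 180. Only the density sum multiplied by the bin width gives back the 90 events, and the mismatch grows as the bins get narrower.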

The peculiar thing is if I use the entire mass fit range, showing where the background was, I get negative values, but the integral of the plot *does* come out correct. Could it be that sPlot is depending on the negative side-bands (basically zero signal minus large background) to normalize the plot?

Is this a plot of the complete dataset, *i.e.*, signal+background?

That last plot with the negative side-bands is plotting the sWeighted dataset (`ds` from earlier) on the frame from the invariant mass. This is normally how I retrieve signal only plots for everything besides the discriminating variable.

Is this really `ds` from here?

Shouldn’t it be `sigData`? If it’s just `ds`, you are not using the correct weights. I actually don’t know which weights are being applied. It’s the ones that `ds` had when it was constructed.

Instead, to see the signal shape, you should be applying the weights `"nsig_sw"`, which you only get when plotting `sigData`.

To be safe, could you check that the number of entries in `sigData` and `sData` are equal? I’m wondering if an empty selection cut `""` could do strange things. To not apply a cut, pass a `nullptr`.

My mistake, I should have typed `sigData` in my last reply instead of `ds`. I have checked the events in the past, and here is a printout:

```
RooDataSet::ds[d0_mass,ptfrac,d0_l_mass,weight,d0_pt,epoch,j_pt_ch,j_pt,d0_mass2,nsig_sw,L_nsig,nbkg_sw,L_nbkg,weight:tuneW] = 93836 entries
RooDataSet::sigData[d0_mass,ptfrac,d0_l_mass,weight,d0_pt,epoch,j_pt_ch,j_pt,d0_mass2,L_nsig,nbkg_sw,L_nbkg,weight:nsig_sw] = 93836 entries (3594.32 weighted)
```

The numbers of entries are identical, and the `sigData` version has the weighted events.

Ok, that looks correct.

So in the end the question is why the negative tails. Well, these are needed:

- The s plot is meant to “subtract” the background from signal by assigning negative weights to the background.
- This can be used on an *independent* distribution (*i.e.* not the mass) to subtract the background.
- This, however, means that in the mass distributions, the background regions *need* to be negative. Otherwise, the subtraction doesn’t work.
- In the signal region of the mass distribution, you should therefore see *more* than the bare number of signal events, because only the sum of the signal and the negative background should yield the initial number of signal events.

From all I can see, the method works correctly.
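The sign pattern described above can be checked on a toy. The sketch below is plain C++ rather than RooFit, and everything in it is an assumption for illustration: a Gaussian signal on a flat background, a hypothetical mass window of 1.7 to 2.0 with the peak at 1.865 and a width of 0.01, and the expected number of events per bin used in place of real data. It computes the signal sWeight from the standard two-species sPlot formula, not from `RooStats::SPlot`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical toy, not the analysis from this thread.
struct ToySPlot {
    std::vector<double> mass;     // bin centres
    std::vector<double> expected; // expected events per bin (signal + background)
    std::vector<double> sWeight;  // signal sWeight evaluated at each bin centre
};

ToySPlot makeToySPlot(double nSig, double nBkg, std::size_t nBins = 200) {
    assert(nBins > 0 && nSig > 0.0 && nBkg > 0.0);
    const double lo = 1.7, hi = 2.0, mean = 1.865, sigma = 0.01; // assumed values
    const double width = (hi - lo) / static_cast<double>(nBins);

    ToySPlot toy;
    toy.mass.resize(nBins);
    toy.expected.resize(nBins);
    toy.sWeight.resize(nBins);

    // Evaluate both shapes at the bin centres and normalise them on the grid.
    std::vector<double> fSig(nBins), fBkg(nBins, 1.0 / (hi - lo));
    double norm = 0.0;
    for (std::size_t i = 0; i < nBins; ++i) {
        toy.mass[i] = lo + (static_cast<double>(i) + 0.5) * width;
        const double z = (toy.mass[i] - mean) / sigma;
        fSig[i] = std::exp(-0.5 * z * z);
        norm += fSig[i] * width;
    }
    for (double &f : fSig) f /= norm;

    // Inverse covariance matrix of the yields: sum of f_i * f_j / total^2 over events.
    double invSS = 0.0, invSB = 0.0, invBB = 0.0;
    for (std::size_t i = 0; i < nBins; ++i) {
        const double total = nSig * fSig[i] + nBkg * fBkg[i];
        toy.expected[i] = total * width;
        invSS += toy.expected[i] * fSig[i] * fSig[i] / (total * total);
        invSB += toy.expected[i] * fSig[i] * fBkg[i] / (total * total);
        invBB += toy.expected[i] * fBkg[i] * fBkg[i] / (total * total);
    }

    // Invert the symmetric 2x2 matrix; only the signal row is needed.
    const double det = invSS * invBB - invSB * invSB;
    const double vSS = invBB / det;
    const double vSB = -invSB / det;

    for (std::size_t i = 0; i < nBins; ++i) {
        const double total = nSig * fSig[i] + nBkg * fBkg[i];
        toy.sWeight[i] = (vSS * fSig[i] + vSB * fBkg[i]) / total;
    }
    return toy;
}

// Sum of expected events times sWeight for bins whose centre lies in [from, to).
double weightedYield(const ToySPlot &toy, double from, double to) {
    double sum = 0.0;
    for (std::size_t i = 0; i < toy.mass.size(); ++i) {
        if (toy.mass[i] >= from && toy.mass[i] < to) {
            sum += toy.expected[i] * toy.sWeight[i];
        }
    }
    return sum;
}
```

For example, with `makeToySPlot(3559.0, 90277.0)` the weighted sum over the full range reproduces the input signal yield up to rounding, the weights at both edges of the range are negative, and the weighted sum restricted to a window around the peak (say 1.835 to 1.895) is larger than the input signal yield. The negative side-bands are what bring the total back down.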