Subtracting a function from a dataset


I have a RooFit dataset, built from a data histogram, which consists of a combinatorial background plus a signal peak. I want to do the following:

  1. fit this dataset outside the peak with any reasonable function that reproduces the background well enough (e.g. a polynomial);
  2. subtract the background estimated this way from the dataset (or histogram), correctly propagating the errors if possible;
  3. compute the integral of the resulting difference in order to estimate the number of counts under the peak.

I need to do it this way because the statistics are low, so estimating this number from a fit is not very safe. I have looked for operations on RooDataHist objects similar to those implemented for TH1D, but I didn’t find any examples of this. Is this feasible?

Thanks, cheers


You can do what you want (steps 1 through 3) by fitting a model
to the data in your sidebands only.

Given an example dataset with x in [-10,10] and a signal
area in the center [-4,4], you do the following:

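// Define sideband ranges (and the full range the yield refers to)
x.setRange("A",-10,-4) ;
x.setRange("B",4,10) ;
x.setRange("FULL",-10,10) ;

// Assume a pdf 'bkg' is defined that describes your background
// in the sideband regions

// Now define an extended pdf from the background pdf
RooRealVar nbkg("nbkg","nbkg",0,10000) ;
RooExtendPdf ebkg("ebkg","ebkg",bkg,nbkg,"FULL") ;

// Now perform an extended ML fit in ranges A,B
// ('data' is assumed to be the name of your RooDataHist)
ebkg.fitTo(data,RooFit::Range("A,B"),RooFit::Extended(kTRUE)) ;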

The fitted parameters of ebkg are the parameters of your pdf ‘bkg’ plus the yield parameter nbkg, which represents the background yield in the entire domain of x, including the signal region. (This happens because the definition of nbkg is hard-wired to the “FULL” range in the RooExtendPdf constructor.) Since the fit directly returns the number you are interested in, no further error propagation is needed: you can just use the fit error.

Then you subtract this number from the event count in the data to obtain an estimate of the signal event count.



This question comes up only because I am still not very experienced with RooFit :)
I have done what you suggest, and I now have a fitted ‘ebkg’ function.

Now, I have two points:

  1. in order to have a stable fit, I usually rebin the histogram (TH1) I want to fit before passing it to a RooDataHist, but once the fit is done, I would prefer to do the count on the un-rebinned histogram.

  2. if I have a RooDataHist dataset built from a histogram and an ‘ebkg’ function with parameters adjusted by a fit, how can I technically subtract the function from the points in the dataset and then sum them up (correctly propagating the errors) in the sub-range where I expect to have the peak?



Re 1), I’m not sure what the question is: the total event count in a histogram
is the same before and after rebinning, so rebinning should be inconsequential for the interpretation of the numbers.
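
The point about rebinning can be checked with a few lines of plain C++. This is only a sketch: `rebin` and `total` are hypothetical helpers written for illustration (they are not ROOT or RooFit functions), and the bin contents are made-up numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: merge every `group` adjacent bins into one bin.
// Assumes the number of bins is a multiple of `group`.
std::vector<double> rebin(const std::vector<double>& bins, std::size_t group) {
    assert(group > 0 && bins.size() % group == 0);
    std::vector<double> merged(bins.size() / group, 0.0);
    for (std::size_t i = 0; i < bins.size(); ++i) {
        merged[i / group] += bins[i];
    }
    return merged;
}

// Hypothetical helper: total event count of a histogram.
double total(const std::vector<double>& bins) {
    return std::accumulate(bins.begin(), bins.end(), 0.0);
}
```

For example, the made-up contents {3, 5, 2, 7, 1, 4} merged in pairs give {8, 9, 5}; both add up to 22 events.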

Re 2), you don’t have to do a bin-by-bin summation to arrive at the correct answer
(in fact, it is possible to do this entire exercise unbinned as well). The number
you get from your fit, say Ntot +/- Etot, represents the estimated total number of background events (with error) in your dataset. If your total event count is Ndata,
then your signal count is simply (Ndata - Ntot) +/- Etot (since the observed number of events has no error).
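
As a minimal sketch of that arithmetic in plain C++ (the `Yield` struct and the `signalYield` function are hypothetical names introduced here, not part of RooFit):

```cpp
#include <cassert>

// Hypothetical helper type: a yield together with its uncertainty.
struct Yield {
    double value;
    double error;
};

// Signal estimate = observed count minus fitted background yield.
// As stated in the reply above, the observed count is treated as exact,
// so the fit error on the background carries over unchanged.
Yield signalYield(double nData, const Yield& fittedBkg) {
    assert(nData >= 0.0 && fittedBkg.error >= 0.0);
    return {nData - fittedBkg.value, fittedBkg.error};
}
```

With made-up numbers, Ndata = 1200 and a fitted background of 1000 +/- 35 would give a signal estimate of 200 +/- 35.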


I understand this, but if I want to sum the counts only in the subset inside the ‘signal’ range, excluding whatever is outside, I should probably do a bin-by-bin count.
In fact, it can happen that at some points outside the peak the function by chance gives a number smaller than the bin content, but I should not take such a bin into account if it is outside the peak.


In that case it is easiest to modify your RooExtendPdf to express the yield
not in the “FULL” range but in your signal range, e.g.

x.setRange("SIG",-4,4) ;

RooExtendPdf ebkg("ebkg","ebkg",bkg,nbkg,"SIG") ;

In that way the fit returns the expected number of background events in
the signal range and propagates the errors accordingly.
Then you can subtract this number from the number of data events you count
in the signal range.
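
The last step, counting the data events in the signal range and subtracting the fitted background yield, could look roughly like this in plain C++. It is a sketch under stated assumptions: the histogram is represented by a vector of bin edges and a vector of bin contents, the signal range boundaries coincide with bin edges, and `countInRange` and `signalInRange` are hypothetical helpers, not RooFit calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sum the contents of all bins lying fully inside [lo, hi].
// `edges` holds the bin boundaries, so it has one more entry than `contents`.
// Assumes lo and hi coincide with bin edges.
double countInRange(const std::vector<double>& edges,
                    const std::vector<double>& contents,
                    double lo, double hi) {
    assert(edges.size() == contents.size() + 1);
    double sum = 0.0;
    for (std::size_t i = 0; i < contents.size(); ++i) {
        if (edges[i] >= lo && edges[i + 1] <= hi) {
            sum += contents[i];
        }
    }
    return sum;
}

// Signal estimate in the range: data count in the range minus the fitted
// background yield for that same range (nbkg when ebkg is defined on "SIG").
double signalInRange(const std::vector<double>& edges,
                     const std::vector<double>& contents,
                     double lo, double hi, double nbkgInRange) {
    return countInRange(edges, contents, lo, hi) - nbkgInRange;
}
```

Bins outside [lo, hi] never enter the sum, so fluctuations in the sidebands do not affect the result; the uncertainty on the estimate is the fit error on nbkg, as described above.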