Setting parameters and limits for p5+gaus function

Edward_Finkelstein · September 28, 2020, 4:57pm

I do

print fdata.GetName()

and I get

../analysis/out/sys0/data/data_2015_2016_2017_2018.root

So I upload it to Cernbox as the file:
fdata.root

Here is the modified script

from ROOT import *
import sys
sys.path.append("modules/")
import math


print 'Number of arguments:', len(sys.argv), 'arguments.'
print 'Argument List:', str(sys.argv)
print 'Use as: script.py -b 0 (or 1,2)'
myinput="interactive"
xdir=""
cross=0
expected_pb=1000.
if (len(sys.argv) ==2):
   myinput = sys.argv[1]
   



print 'Mode=',myinput
gROOT.Reset()
figdir="figs/"
nameX="m_{jjl} [TeV]"
nameY="Events / TeV"
Ymin=0.9
Ymax=1000000000-100000
YRMIN=-4.999
YRMAX=4.999
Xmin=400
Xmax=10000
Xmin=Xmin*0.001
Xmax=Xmax*0.001
NN=0
gROOT.SetStyle("Plain");
c1=TCanvas("c","BPRE",10,10,600,600);
c1.Divide(1,1,0.005,0.005);
c1.SetTickx()
c1.SetTicky()
c1.SetTitle("")
c1.SetLineWidth(3)
c1.SetBottomMargin(0.1)
c1.SetTopMargin(0.05)
c1.SetRightMargin(0.01)
c1.SetFillColor(0)
pad1 = TPad("pad1","pad1",0,0.3,1,0.97)
pad1.SetBottomMargin(0)
pad1.SetLeftMargin(0.13)
pad1.SetRightMargin(0.04)
pad1.SetTopMargin(0.02)
pad1.Draw()
pad1.cd()
pad1.SetLogy(1)
pad1.SetLogx(1)


h=pad1.DrawFrame(Xmin,Ymin,Xmax,Ymax)
ay=h.GetYaxis();
ay.SetLabelFont(42)
ay.SetLabelSize(0.05)
ay.SetTitleSize(0.06)
ay.SetNdivisions(505);
ay.SetTitle(nameY)
ax=h.GetXaxis();
ax.SetTitle(nameX);
ax.SetTitleOffset(1.18)
ay.SetTitleOffset(0.8)
ax.SetLabelFont(42)
ay.SetLabelFont(42)
ax.SetLabelSize(0.12)
ax.SetTitleSize(0.14)

ax.Draw("same");
ay.Draw("same");
dir=xdir
ff=TFile("../analysis/out/sys42/data/data_2015_2016_2017_2018.root")
ff.ls()
name="jjlmass_tev"
hh=ff.Get(name)
hhorg=hh.Clone()
TFdata=[]
TFdata2015=[]
TFdata2016=[]
TFdata2017=[]
TFdata2018=[]
TFttbar=[]
TFwjet=[]
TFstop=[]
QCDsamples=[1,2,3,4,5,6,7]

MCqcd_list=[]
maindir='../analysis/out/'
for i in range (0,1):
	sdir=maindir+'sys'+str(i)+"/data/"
	TFdata.append( TFile(sdir+"data_2015_2016_2017_2018.root") )
    
    







systematics=0
fdata=TFdata[systematics]


print fdata.GetName()

bins=[]

bins.append(TFile("../analysis/fdata.root"))

fdata=bins[systematics]

bids=fdata.Get("bins_m_tev")

m1=bids.GetBinContent(1)

nCPU=int(m1/(13.0*0.001))

print "Nr of cores=",nCPU

bids.Scale(1.0/nCPU)

hh.Divide(bids)

Events=hh.GetBinContent(1)
hh.SetTitle("")
hh.SetStats(0)
hh.SetLineWidth(2)
hh.Print("All")
hh.SetAxisRange(Ymin,Ymax,"y");
hh.SetAxisRange(Xmin,Xmax,"x");
hh.SetMarkerColor( 3 )
hh.SetMarkerStyle( 20 )
hh.SetMarkerSize( 0.8 )


#print hh.Integral(hh.FindFixBin(1.0),hh.FindFixBin(2.0))
#print 5*math.sqrt(hh.Integral(hh.FindFixBin(1.0),hh.FindFixBin(2.0)))

#print hh.Integral(hh.FindFixBin(5.0),hh.FindFixBin(6.0))
#print 5*math.sqrt(hh.Integral(hh.FindFixBin(5.0),hh.FindFixBin(6.0)))

#Signal Injection
class Gauss:
    def __call__( self, x, par ):
        out=par[0] * TMath.Gaus(x[0],par[1],par[2])
        return out;


a=0.1
b=4.5
c=0.2


pycaller = Gauss()
bla=TF1("bla",pycaller,4.0,5.0,3);
bla.SetParameters(a,b,c)
bla.SetParNames("a","b","c")

for i in range(0,5000):
    TFdata.append(hh.Fill(bla.GetRandom(4.0,5.0)))

#End of Signal Injection





hh.Draw("pe")
fithr=hh.Fit("bla","SLR","")

bla.Draw("same")

bla.Integral(4.0,5.0)

ans = ((hh.GetXaxis().FindFixBin(5.0))-(hh.GetXaxis().FindFixBin(4.0)))

newans=1.0/ans

print "Number of Extracted Events = ", bla.Integral(4.0,5.0)/newans

Just use

and

and it will work. Is it ok?

moneta · September 29, 2020, 7:58am

HI,
Your cernbox links are just to index pages not to the files. Cernbox offers you a way to copy the file index.
If the files ar enot big, it is probably easier for you to attach directly in the post

Lorenzo

Edward_Finkelstein · September 29, 2020, 5:31pm

Are you sure? when I click on it, there is a Download button on the upper right-hand corner. When I click on that, it downloads the file. This is not the case for you?
I send link to other person, and other person says download button works. Files are too big to attach directly in the post.

moneta · September 30, 2020, 3:02pm

Hi,

Sorry for this, I could find now the Download button and I could get the files.

Now after running the script I understand your problem.
First of all you should divide the integral by the bin width. .The bin width (assuming all histograms bins are the same), you get as hh.GetBinWidth(1)

Then you are fitting a Gaussian peak on an histogram with some background below. It is then normal that by fitting a single Gaussian it has a larger offset due to the average background component. Like this you get an integral that is bigger.
The solution is two fit two components a signal plus a background and like this you get the correct number of signal and background events.
We have a class for doing this in ROOT, TF1NormSum and I suggest you to look at this tutorial

Best

Lorenzo

Edward_Finkelstein · October 1, 2020, 1:23am

I am pretty sure the bin widths are not all the same. Because I do something like the following (at the end of the script) and I get different values.

print hh.GetXaxis().GetBinWidth(hh.FindFixBin(4.0))
print hh.GetXaxis().GetBinWidth(hh.FindFixBin(4.1))
print hh.GetXaxis().GetBinWidth(hh.FindFixBin(4.2))
print hh.GetXaxis().GetBinWidth(hh.FindFixBin(4.3))
print hh.GetXaxis().GetBinWidth(hh.FindFixBin(4.4))
print hh.GetXaxis().GetBinWidth(hh.FindFixBin(4.5))
print hh.GetXaxis().GetBinWidth(hh.FindFixBin(4.6))
print hh.GetXaxis().GetBinWidth(hh.FindFixBin(4.7))
print hh.GetXaxis().GetBinWidth(hh.FindFixBin(4.8))
print hh.GetXaxis().GetBinWidth(hh.FindFixBin(4.9))
print hh.GetXaxis().GetBinWidth(hh.FindFixBin(5.0))

What you say about using two fits makes sense, but when I do a toy mc, I use the same procedure as I did for this signal injection, but for toy mc, I get extracted events < 5000. Maybe you know why? I attached toy mc, it can run on its own.
mc_gaus.cxx (5.1 KB)

moneta · October 1, 2020, 2:51pm

HI,
Then if you have variable bin width is more complicated. To compute the number of events you need to integrate then the function in a single bin and add all these values together at the end.
I don’t understand what you are doing with the integral value by dividing by the number of bins.

Your toy example is different. There you fit only the histogram with the Gaussian points, not the Background + signal histogram.

Lorenzo

Edward_Finkelstein · October 1, 2020, 9:57pm

If I comment out the following line from mc_gaus.cxx (5.1 KB)

pfivehisto->Fit(&parabola_5);

I still get extracted number of events < 5000. What then is the difference?

I don’t understand why integrating over each individual bin and adding all the values together would give me the number of extracted events. I am only concerned with the range 4.0-5.0, yet the bins are probably not distributed so that one starts right at 4.0 and another ends right at 5.0, so I think it would not be accurate?

I first find number of bins, then I divide the range by it to get the average bin width. I then divide the integral by the average bin width to get number of extracted events, this is wrong?

moneta · October 2, 2020, 8:27am

In mc_gaus.cxx you fit an histogram filled with only 5000 entries. In the previous code you had an histogram read from a file and then you add the 5000 gaussian entries. This is different !

I understand now you are dividing the function integral by the average bin width. This is an approximation, but the correct thing is integrating the function in each single bin between 4 and 5. If the bins are not very different probably the error introduced by using this approximation is not very large. But the correct number of events is computed as :

int firstBin = histogram->GetXaxis()->FindFixBin(4.0);
int lastBin = histogram->GetXaxis()->FindFixBin(5.0);
double nEvents = 0; 
for (int i = firstBin; i < lastBin; i++) {
    double x1 =  histogram->GetXaxis()->GetBinLowEdge(i);
    double x2 =  histogram->GetXaxis()->GetBinUpEdge(i);
    nEvents += g->Integral( x1,x2)/(x2-x1); 
}


Lorenzo

Edward_Finkelstein · October 2, 2020, 8:10pm

Ok, so I add the python2 equivalent of your code at the end of previous code that reads histogram from file.

firstBin = hh.GetXaxis().FindFixBin(4.0);
lastBin = hh.GetXaxis().FindFixBin(5.0);
nEvents = 0;
for i in range(firstBin,lastBin,1):
    x1 =  hh.GetXaxis().GetBinLowEdge(i);
    x2 =  hh.GetXaxis().GetBinUpEdge(i);
    nEvents += bla.Integral(x1,x2)/(x2-x1);


print "Number of Extracted Events = ", nEvents

And output is

Number of Extracted Events =  7511.77815009

I look at TF1NormSum example you sent me, and I have trouble figuring out how to apply it to the previous code. I thought maybe it was equivalent to just fitting the whole spectra to a single signal+background function, but that also gives a number like 7630. What would I add at the end of the code to make it work?

I also tried subtacting the background integral from the signal integral using your method and then I get
5009.000686925869 events.

moneta · October 7, 2020, 6:46am

Hi,
Attached you find code where you can use the TF1NormSum class in Python and perform a signal plus background fit to your histogram .
If you have any questions, please let me know

Lorenzo
test_fitNorm.py (4.8 KB)

Edward_Finkelstein · October 7, 2020, 2:15pm

Thank you very much Moneta. This problem is solved.