Chi2 in the case of weighted data non integers values sum2w

abaoud · February 19, 2025, 4:25pm

Hello experts, I’m fitting a weighted MC dataset with non-integer values using the chi2FitTo() method and a minimization algorithm to optimize chi2/ndof . In my case, Fval is equivalent to chi2/ndof.
Since I’m working with weighted data, I’m using SumW2 to correctly compute the uncertainty (sigma) for chi2. Unlike standard bins where sigma= root square of y data, in this case, sigma is computed as the sum of the square root of weights in each bin: I obtain correct results using:

chi2_var = ROOT.RooChi2Var("chi2", "chi2", model,data,False,ROOT.RooDataHist.SumW2)
chi2_value = chi2_var.getVal()

#chi2 = frame.chiSquare("model", "data", npar)
ndof = dataHist.numEntries() - npar
chi2= chi2_value/ndof

However, this does not work when using: frame.chiSquare(npar)
Does frame.chiSquare(npar) handle weighted data differently? Any suggestions on how to correctly use it in this case?

mdessole · February 19, 2025, 4:33pm

Hi @abaoud,
welcome to the ROOT forum!

Could you please add more context to your code snippet by explaining 1. if frame.chiSquare(npar) raises an error (and which one if so) and 2. how did you define the frame object?

Cheers,
Monica

abaoud · February 19, 2025, 4:54pm

Hi @mdessole, frame.chiSquare did not raise an error, but it gave me an unexpectedly high value of chi²/ndof. I suspect that it does not compute the uncertainty as sqrt(sum of squares of weights), which may explain the odd result. About the frame object aims to creat a rooplot object, where i can plot my data as well as my model.

    frame = var.frame()
    ROOT.SetOwnership(frame, True)
    data.plotOn(frame, ROOT.RooFit.Name("data"))
    model.plotOn(frame, ROOT.RooFit.Name("model"), ROOT.RooFit.LineColor(ROOT.kRed))
    chi2_var = ROOT.RooChi2Var("chi2", "chi2", model, data, False, ROOT.RooDataHist.SumW2)
    chi2_value = chi2_var.getVal()
    #chi2 = frame.chiSquare("model", "data", npar)
    ndof = dataHist.numEntries() - npar
    chi2= chi2_value/ndof
    col = 0
    cols = [ROOT.kGreen+2, ROOT.kBlue, ROOT.kOrange+1, ROOT.kCyan]
    frame.Draw()
    frame.GetYaxis().SetRangeUser(0, frame.GetMaximum()*1.5)
    leg = ROOT.TLegend(0.7, 0.63, 0.91, 0.85)
    ROOT.SetOwnership(leg, True)
    leg.SetBorderSize(0)
    leg.SetFillStyle(0)
    leg.AddEntry("data", "Data", "LP")
    leg.AddEntry("model", "model", "L")
    for name, comp in components.items():
        leg.AddEntry(name, name, "L")
    leg.Draw()
    ltx = ROOT.TLatex()
    ROOT.SetOwnership(ltx, True)
    ltx.SetNDC()
    ltx.SetTextColor(ROOT.kBlack)
    pos_y = 0.85
    ltx.DrawLatex(0.2, pos_y, "#chi^{{2}}/dof={chi2:.3f}".format(chi2=chi2))

best, Abdessamad

mdessole · February 20, 2025, 9:25am

Let me add in the loop our RooFit experts @jonas @StephanH

jonas · February 24, 2025, 2:21pm

Hi @abaoud, thanks for the question!

Which ROOT version are you using? With ROOT 6.34.04, the chi2 test statistic and the chi2 value from the plot both give the right value with the sum of weight squares for the variance.

See this little example:

RooRealVar x{"x", "x", 0., 10.};
x.setBins(2);

RooDataHist dataHist{"data", "data", x};
x.setBin(0);
dataHist.add(x, 2.0);
x.setBin(1);
dataHist.add(x, 4.0);

RooRealVar c{"c", "c", 2 * 3./10, 0., 10.};
RooGenericPdf func{"func", "c + x - x", {c, x}};

std::unique_ptr<RooAbsReal> chi2{func.createChi2(dataHist, RooFit::DataError(RooAbsData::SumW2))};

auto frame = x.frame();
dataHist.plotOn(frame, RooFit::DataError(RooAbsData::SumW2));
func.plotOn(frame);
double chiSquare = frame->chiSquare(1);

// analytical value is 0.5**2 + 0.25**2 = 0.3125
std::cout << chi2->getVal() << std::endl;
std::cout << chiSquare << std::endl;

The output is:

0.3125
0.3125

Maybe you there are some inconsistencies with normalizations or bin widths in your case? What is your model, is it a RooAbsPdf (extended or not?), or just a regular RooAbsReal function? What does your function represent? A probability density, or yields directly? What is your bin width?

It would be good to have more info on the ROOT version and details of the model.

Cheers,
Jonas

abaoud · February 24, 2025, 3:14pm

Hello @jonas and @mdessole, Thanks for the answering.

Root Version:

The ROOT version used is 6.32.02.

Normalization and Bin Width:
I’ve checked both the normalization and bin width, and there don’t appear to be any issues with them. You can review the relevant code snippet below for more details.

Model Details:
The model is a non-extended RooAbsPdf aimed at fitting a histogram with weighted data. Currently, I’m focusing on the shape of the distribution rather than yields, which is why I’m using a probability density function.

The bin width is set to 0.1 ps, but I adjust it iteratively to achieve better fitting results.

import ROOT
from ROOT import RooRealVar, RooArgList, RooFormulaVar, RooAddPdf, RooFit
fileP = ROOT.TFile.Open("tbar/histograms.root")
folderP = fileP.Get("/NOSYS/reco") 
histograms = folderP.Get("lifetime")
 #
var = ROOT.RooRealVar("tau", "Lifetime ", -0.9, 0.9, "ps")
def pdf_to_th(name, model, variable, binning):
    """Converts RooFit pdf into a histogram
    Main purpose of this function is to correctly normalize
    by bin width, RooFit divides by bin width.
    """
    root_hist = model.createHistogram(name, variable, Extended=False, Scaling=False, Binning=binning)
    root_hist.SetDirectory(0)
    ROOT.SetOwnership(root_hist, True)
    # For some reason the histogram is off by bin width.
    # In theory that is what `Scaling` option was for,
    # but it is disable when calling `Extended` (which
    # is responsible for norm to expected events)
    for i in range(root_hist.GetNbinsX()):
        val = root_hist.GetBinContent(i) * total_events * bin_width 
        root_hist.SetBinContent(i, val)
    return root_hist
def roofit_draw(plotName, var, data, model, npar, extra_labels, components = {}):
    """Plots result of RooFit fit with possibility to plot specific model components.

    Arguments:
        var (``RooRealVar``): Variable used in the fit.
        data (``RooDataHist``): Hist to which model wasfitted to.
        model (``RooAddPdf``): Model which was fitted.
        result (``RooFitResult``): Result of the fit.
        extra_labels (``list`` of `` strings``): Extra labels to be
            added to the plot
        components (``string``): Components of the model to be plotted
            in addition to the complete model
    """

    ROOT.gROOT.SetStyle('ATLAS')
    ROOT.gROOT.SetBatch(True)
    c = ROOT.TCanvas(plotName)
    ROOT.SetOwnership(c, True)
    c.SetCanvasSize(1000, 1000)
    c.cd()
    main_pad = ROOT.TPad('mainpad', 'mainpad', 0, 0.3, 1, 1 )
    ROOT.SetOwnership(main_pad, True)
    main_pad.SetBottomMargin(0)
    main_pad.Draw()
    main_pad.cd()
    frame = var.frame()
    ROOT.SetOwnership(frame, True)
    data.plotOn(frame, ROOT.RooFit.Name("data"))
    model.plotOn(frame, ROOT.RooFit.Name("model"), ROOT.RooFit.LineColor(ROOT.kRed))
    chi2_var = ROOT.RooChi2Var("chi2", "chi2", model, data, False, ROOT.RooDataHist.SumW2)
    chi2_value = chi2_var.getVal()
    ndof = dataHist.numEntries() - npar
    chi2= chi2_value/ndof
    col = 0
    cols = [ROOT.kGreen+2, ROOT.kBlue, ROOT.kOrange+1, ROOT.kCyan]
   # for name, comp in components.items():
       # model.plotOn(frame, ROOT.RooFit.Name(name), ROOT.RooFit.Components(comp),
                    # ROOT.RooFit.LineColor(cols[col]),
                    # ROOT.RooFit.LineStyle(ROOT.kDotted))
        #col+=1
    frame.Draw()

    frame.GetYaxis().SetRangeUser(0, frame.GetMaximum()*1.5)

    leg = ROOT.TLegend(0.7, 0.63, 0.91, 0.85)
    ROOT.SetOwnership(leg, True)
    leg.SetBorderSize(0)
    leg.SetFillStyle(0)
    leg.AddEntry("data", "Data", "LP")
    leg.AddEntry("model", "model", "L")
    for name, comp in components.items():
        leg.AddEntry(name, name, "L")
    leg.Draw()

    ltx = ROOT.TLatex()
    ROOT.SetOwnership(ltx, True)
    ltx.SetNDC()
    ltx.SetTextColor(ROOT.kBlack)
    pos_y = 0.85
    ltx.DrawLatex(0.2, pos_y, "#chi^{{2}}/dof={chi2:.3f}".format(chi2=chi2))
    for lbl in extra_labels:
        pos_y-=0.05
        ltx.DrawLatex(0.2, pos_y, lbl)

    c.cd()
    ratio_pad = ROOT.TPad('ratiopad', 'ratiopad', 0, 0, 1, 0.3)
    ROOT.SetOwnership(ratio_pad, True)
    ratio_pad.SetBottomMargin(0.35)
    ratio_pad.SetTopMargin(0)
    ratio_pad.Draw()
    ratio_pad.cd()
    ratio = data.createHistogram('data_hist', var)
    ROOT.SetOwnership(ratio, True)
    ratio.SetDirectory(0)
    tmp = pdf_to_th('mc_hist', model, var, data.getBinnings()[0])


    for i in range(ratio.GetNbinsX()):
        val = ratio.GetBinContent(i)
        err = ratio.GetBinError(i)
        valmc = tmp.GetBinContent(i)
        if valmc != 0:
            ratio.SetBinContent(i, val/valmc)
            ratio.SetBinError(i, err/valmc)
        else:
            ratio.SetBinContent(i, 0)
            ratio.SetBinError(i, 0)
    tmp.GetYaxis().SetRangeUser(-1, 2)
    tmp.GetYaxis().SetTitle("data/model")
    tmp.GetYaxis().SetTitleOffset(0.56)
    tmp.GetXaxis().SetTitleSize(0.12)
    tmp.GetXaxis().SetLabelSize(0.12)
    tmp.GetYaxis().SetTitleSize(0.12)
    tmp.GetYaxis().SetLabelSize(0.12)
    tmp.SetNdivisions(505, 'Y')
    tmp.Divide(tmp)
    tmp.SetLineColor(ROOT.kRed)
    tmp.Draw("hist")
    ratio.Draw("same")
    if "/" in plotName:
        dirName = plotName[:plotName.rindex("/")]
        if not os.path.exists(dirName):
            os.makedirs(dirName)

    # Supress info messages about creating plots
    ignoreLevel = ROOT.gErrorIgnoreLevel
    ROOT.gErrorIgnoreLevel = ROOT.kWarning
    c.SaveAs(plotName+".png")
    ROOT.gErrorIgnoreLevel = ignoreLevel
histograms.Rebin(2)
dataHist = ROOT.RooDataHist("dataHist", "Data Histogram", ROOT.RooArgList(var), histograms)
frac1   = ROOT.RooRealVar("frac1", "fraction", 0.675, 0.55, 0.72)  
frac2   = ROOT.RooRealVar("frac2", "fraction", 0.27, 0.22, 0.35)  
mean1   = ROOT.RooRealVar("mean1", "mean1", -0.0019, -0.01, 0.01)  
mean2   = ROOT.RooRealVar("mean2", "mean2", 0.0155, 0.008, 0.027)  
mean3   = ROOT.RooRealVar("mean3", "mean3", -0.0101, -0.02, -0.0045)  
sigma1  = ROOT.RooRealVar("sigma1", "sigma1", 0.07, 0.048, 0.088)  
sigma2  = ROOT.RooRealVar("sigma2", "sigma2", 0.69, 0.5, 0.88)  
sigma3  = ROOT.RooRealVar("sigma3", "sigma3", 0.205, 0.14, 0.29) 
gauss1 = ROOT.RooGaussian("gauss1", "Gaussian 1", var, mean1, sigma1)
gauss2 = ROOT.RooGaussian("gauss2", "Gaussian 2", var, mean2, sigma2)
gauss3 = ROOT.RooGaussian("gauss3", "Gaussian 3", var, mean3, sigma3)
model = ROOT.RooAddPdf("model", "Triple Gaussian Model", ROOT.RooArgList(gauss1, gauss2, gauss3), ROOT.RooArgList(frac1, frac2), True)
total_events = dataHist.sumEntries()
bin_width = 0.1
result = model.chi2FitTo(dataHist, ROOT.RooFit.Save(), ROOT.RooFit.SumW2Error(True))
ratio=roofit_draw( "tripleGaussian" ,var, dataHist, model, 8, ["MC prompt", "Triple Gaussian"], { "Gaussian1" : "gauss1", "Gaussian2": "gauss2", "Gaussian3":"gauss3" } )
fileP.Close()

best regards,
Abdessamad baoud

jonas · February 24, 2025, 3:35pm

Thanks!

What does “unexpectedly high value” mean for you? How much higher than expected?

When I take your script and fit a toy dataset generated from the model with ROOT 6.34.04, the values are almost the same:

    # Using RooChi2Var directly is not possible in ROOT 6.34 anymore
    chi2_var = model.createChi2(data, DataError="SumW2")
    chi2_value = chi2_var.getVal()
    ndof = dataHist.numEntries() - npar
    chi2= chi2_value/ndof
    print("chi2_value/ndof ", chi2)
    print("chiSquare ", frame.chiSquare(npar))

Output:

chi2_value/ndof  0.908039099858847
chiSquare  0.9125583952688261

The difference is a binning effect. You are fitting a continuous pdf to binned data, and the chi-square test statistic compares the data with the pdf value at the bin center. The plot chiSquare() compares the data with with the average pdf value in a bin, which is not the same in general. Therefore, for a finite number of bins, it is expected that the RooPlot::chiSquare() is slightly higher, because the test statistic that was optimized is defined slightly differently.

I will also test with ROOT 6.32.02 like you did, but I’m already writing this insight here in case that already explains your difference!

PS: I generated the dataset with:

dataHist = model.generateBinned(var, 10000)

jonas · February 24, 2025, 3:38pm

Same results with ROOT 6.32.02 for the toy study.

abaoud · February 27, 2025, 10:19am

Hello @jonas again,
Sorry for the late reply, when i tried both methods the differences between them were very noticeable, like in my last test, where I got 6.231 and 26.787.
So, I repeated the same process using a dataset with the same characteristics as the first one (weighted and binned data) and applied the same method, "chi2fitTo.
This time, it worked for me, and the differences between them were very small, almost unnoticeable.
Therefore, this result has left me very confused.
best regards,
Abdessamad baoud

system · March 13, 2025, 10:19am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.