RooFit - simultaneous fit of different datasets using different PDFs with some shared parameters

andrea.celentano · September 24, 2025, 12:32pm

Dear colleagues,

I want to solve the following problem using RooFit. I have N datasets for different observables, each in the form of a RooDataHist, and for each I construct a PDF from a template; the PDF is a RooHistPdf. Each PDF is convolved with a Gaussian PDF of unknown width \sigma, and the result is then rescaled along the observable axis through the linear transformation x^\prime = \alpha x. The parameter \alpha accounts for a possible mismatch in the absolute scale between the measured dataset and the template.

In this configuration, I thus have 2N free parameters, with each pair (\alpha_i,\sigma_i) independent from the other pairs. Clearly, I could run N independent fits, but my goal, later, is to model \sigma with its own PDF and extract the parameters of the latter.
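In formulas, the model described above can be sketched as follows. The symbols T_i (template shape), G (Gaussian) and f_i (smeared template) are notation introduced here for illustration only, and the normalization over the fit range [x_{\min,i}, x_{\max,i}] is an assumption about how the PDF is normalized in the fit:

```latex
f_i(E;\sigma_i) = \int T_i(E')\, G(E - E';\, 0,\, \sigma_i)\, dE' ,
\qquad
p_i(x;\alpha_i,\sigma_i) =
  \frac{f_i(\alpha_i x;\sigma_i)}
       {\int_{x_{\min,i}}^{x_{\max,i}} f_i(\alpha_i x';\sigma_i)\, dx'} ,
\qquad i = 1,\dots,N .
```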

I report below a snippet of my code.

#histoData is a list of histograms for DATA
#histoMC is a list of histograms for MC (create the PDF from these)
#xmin is a list of minimum x axis
#xmax is a list of maximum x axis

def doFit(histoData,histoMC,xmin,xmax,name,sigma,sigmaMin,sigmaMax,alphaIN=1):

    N=len(histoData)
    
    #create all RooRealVar
    x=[]
    for ii in range (0,N):
        x.append(ROOT.RooRealVar(name+"_x_%i"%ii,name+"_E [GeV]",xmin[ii],xmax[ii]))

    #create a global rooRealVar
    xALL=ROOT.RooArgSet(name+"_x")
    for ii in range(0,N):
        xALL.add(x[ii])
    
    # Define category to distinguish physics and control samples events
    sample = ROOT.RooCategory("sample", "sample")
    for ii in range(0,N):
        sample.defineType("sample_%i"%ii)

    #create all root data hist for DATA, prepare dictionary
    data=[]
    dictData={}
    for ii in range(0,N):
        data.append(ROOT.RooDataHist(name+("_data_%i"%ii),name+("_data_%i"%ii),x[ii],histoData[ii]))
        dictData["range_%i"%ii]=data[ii]
        
    #create the GLOBAL dataset
    dataALL=ROOT.RooDataHist(name+"_data",name+"_data",xALL,Index=sample,Import=dictData)
    
    #create the RooRealVar for MC
    E=[]
    for ii in range(0,N):
        E.append(ROOT.RooRealVar(name+"_E_%i"%ii,name+"_E_%i"%ii, 0, 500));
        E[ii].setBins(10000,"cache")
    #create all RootDataHist for MC
    dh=[]
    dh_draw=[]
    for ii in range(0,N):
        dh.append(ROOT.RooDataHist(name+("_dh_%i"%ii),name+("_dh_%i"%ii),E[ii],histoMC[ii]));
        dh_draw.append(ROOT.RooDataHist(name+("_dh_%i"%ii),name+("_dh_%i"%ii),x[ii],histoMC[ii]));

        
    #create the scale variable for MC - assume a single scale is ok for all energy ranges.
    scale=[]
    for ii in range(0,N):
        scale.append(ROOT.RooRealVar(name+"_scale_%i"%ii,name+"_scale_%i"%ii,0.95,0.5,1.5))
    p0=ROOT.RooRealVar(name+"_p0",name+"_p0",0.)

    Qf=[]
    for ii in range(0,N):
        Qf.append(ROOT.RooPolyVar(name+"_HCAL_F_%i"%ii,name+"_HCAL_F_%i"%ii,x[ii],ROOT.RooArgSet(p0,scale[ii])))

    #create all histo PDFs for MC.
    histpdf=[]
   
    for ii in range(0,N):
        histpdf.append(ROOT.RooHistPdf(name+("_histpdf_%i"%ii),name+("_histpdf_%i"%ii),E[ii],dh[ii]))
    
    
    #TODO: gaussian model
    gauss=[]
    mg=ROOT.RooRealVar(name+"_mg",name+"_mg", 0);
    #sg=ROOT.RooRealVar(name+"_sg_%i"%ii,name+"_sg_%i"%ii,sigma,sigmaMin,sigmaMax)
    #prepare gaus PDFs
    sg=[]
    for ii in range(0,N):
        sg.append(ROOT.RooRealVar(name+"_sg_%i"%ii,name+"_sg_%i"%ii,sigma,sigmaMin,sigmaMax));
        gauss.append(ROOT.RooGaussian(name+"_gauss_%i"%ii,name+"_gauss_%i"%ii,E[ii],mg,sg[ii]))
    

        
    #create all histo PDFs for MC after smearing.
    lxg=[]
    for ii in range(0,N):
        lxg.append(ROOT.RooFFTConvPdf(name+"_lxg_%i"%ii,"histpdf (X) gauss", Qf[ii], E[ii], histpdf[ii], gauss[ii]));

  
      
    #prepare the simultaneous PDF
    simPDF = ROOT.RooSimultaneous(name+"_simPDF",name+"_simPDF",sample)
    for ii in range(0,N):
        simPDF.addPdf(lxg[ii],"sample_%i"%ii)
 
    #do the fit
    simPDF.fitTo(dataALL)
    #prepare ret val
    #retVal=retV(scale.getValV(),scale.getError(),sg.getValV(),sg.getError())
    retVal=None
    #histpdf for drawing
    histpdf_MC=[]
    Qf2=[]
    histpdf_MC_SCALED=[]
    for ii in range(0,N):
        histpdf_MC.append(ROOT.RooHistPdf(name+"_histpdf_MC_%i"%ii,name+"_histpdf_MC_%i"%ii, x[ii], dh_draw[ii], 2));  
        Qf2.append(ROOT.RooPolyVar(name+"_HCAL_f2_%i"%ii, name+"_HCAL_f2_%i"%ii, x[ii], ROOT.RooArgSet(p0, scale[ii])));
        histpdf_MC_SCALED.append(ROOT.RooHistPdf(name+"_histpdf_MC_SCALED_%i"%ii,name+"_histpdf_MC_SCALED_%i"%ii, Qf2[ii], E[ii], dh[ii], 2));

    frames=[]
    for ii in range(0,N):
        frames.append(x[ii].frame())
        data[ii].plotOn(frames[ii])
        lxg[ii].plotOn(frames[ii],LineColor=ROOT.kRed)
        histpdf_MC[ii].plotOn(frames[ii],LineColor=ROOT.kGreen,LineWidth=1);
        histpdf_MC_SCALED[ii].plotOn(frames[ii],LineColor=ROOT.kBlue,LineWidth=1);
    return retVal,frames

It seems to me that something is wrong in this implementation. The result I get from the fit for the 4 parameters (N=2 here) is shown below; note that 0.95 (for scale) and 0.5 (for sigma) are the initial values. It looks to me as if RooFit is not able to fit the 4 parameters to the combined dataset at all.


   1  hcal1_scale_0   7.17610e-01   4.18627e-01   0.00000e+00  -1.00167e-01
   2  hcal1_scale_1   9.50000e-01   4.18627e-01   0.00000e+00  -1.00167e-01
   3  hcal1_sg_0      5.00000e-01   2.09162e+00   0.00000e+00  -1.16602e+00
   4  hcal1_sg_1      5.00000e-01   2.09162e+00   0.00000e+00  -1.16602e+00

May I ask you for guidance on this?

Thanks,

Andrea Celentano

siliataider · September 24, 2025, 1:22pm

Hi Andrea,
Thank you for your question.
@jonas, could you please have a look?

andrea.celentano · September 30, 2025, 7:29am

Hi @siliataider, thanks for pinging @jonas - let’s see the reply!

jonas · October 10, 2025, 9:34am

Dear @andrea.celentano,

sorry for the late reply!

What you are doing should work. I think the reason why it doesn’t is that the way the model is formulated is not really consistent.

Your pdf depends on the E variables, while the data contain the x variables. Then you have Q_f depending on x, but there is no relation between x and E other than the indirect coupling via this RooFFTConvPdf constructor, and I’m not sure it’s the correct one.

Can you try to implement it in a different way that I think is more correct/robust?

I’d use one single variable for the observable, unifying x and E. Then, to do the scaling, you put your RooFFTConvPdf into a RooGenericPdf, where you do the linear transformation. Transforming pdfs is better supported than transforming variables.

If you have only one variable for the observable, you can also use this more standard RooFFTConvPdf constructor.

Let me know here if that works or you have any further questions!

Cheers,
Jonas

andrea.celentano · October 10, 2025, 12:15pm

Dear @jonas,

thanks for your guidance. Indeed, I confirm what you say: my pdf depends on E, and the data on x. The connection between the two happens in RooFFTConvPdf, using the constructor you pointed out. My idea was to follow the example in the rf210_angularconv tutorial. I have a RooHistPdf and a RooGaussian, both defined with respect to the observable E, and I create their convolution. The convolution is done with respect to E, but the result is expressed as a function of Q_f=\alpha \cdot x, so that the result, in the end, can be seen as a PDF for x.

I can also try the approach you mention. May I ask how I can use a RooGenericPdf to linearly scale one of the two PDFs? I mean, if I write @0 * @1, I get the product of two PDFs, while what I’d like to accomplish here is something like f(\alpha x), with \alpha being a parameter.
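To make the distinction between a product of PDFs and the composition f(\alpha x) concrete, here is a minimal RooFit-independent sketch in plain Python. It is only an illustration under stated assumptions: `template_pdf` is a hypothetical stand-in (a unit-area Gaussian) for the smeared template, and `scaled_pdf` and `integrate` are helper names invented here. It shows that evaluating f at \alpha x moves the peak to x = E_peak / \alpha, and that multiplying by \alpha (the Jacobian) keeps the area equal to one over the full axis.

```python
import math

def template_pdf(e, mean=5.0, sigma=1.0):
    """Hypothetical stand-in for the smeared template f(E): a unit-area Gaussian."""
    z = (e - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def scaled_pdf(x, alpha):
    """Composition f(alpha * x); the factor alpha is the Jacobian that keeps unit area."""
    return alpha * template_pdf(alpha * x)

def integrate(func, lo, hi, steps=20000):
    """Plain trapezoidal rule, sufficient for this smooth toy function."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * h)
    return total * h

alpha = 0.8
area = integrate(lambda x: scaled_pdf(x, alpha), -20.0, 40.0)

# Locate the maximum on a fine grid: it moves from E = 5.0 to x = 5.0 / alpha.
grid = [k * 0.001 for k in range(0, 20001)]
peak = max(grid, key=lambda x: scaled_pdf(x, alpha))

print(f"area = {area:.4f}, peak at x = {peak:.3f}")
```

In an actual fit over a finite range the normalization would be recomputed numerically over that range, so the explicit Jacobian factor matters mainly for understanding the shape, not for the fitted values.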

In general, I have already used this approach in the past, for one PDF fitted to one dataset. Maybe the issue here is with the simultaneous fit?

Thanks
Andrea

andrea.celentano · October 10, 2025, 2:27pm

Dear @jonas, to possibly facilitate the analysis of this problem I attach to this message the full code I am using, as well as the two ROOT files containing the data to be fit and the MC-derived histogram from which I extract the PDF.

To run it, python3 ana_globalFit.py 1 (1 is for “period_1.root”).

As you will see, if you comment out line 161 and uncomment lines 164 and 165, you switch from a simultaneous fit to a one-by-one fit. The latter works, the former does not.
Thanks again
Andrea
code.tar (33.3 KB)

jonas · October 10, 2025, 3:12pm

Thank you very much! If it worked in the past on a single dataset, it should work here too. I’ll look at your reproducer and try to understand what’s happening.

jonas · October 12, 2025, 10:40pm

Ok, indeed I can reproduce the problem and this should not happen!

I have opened a GitHub issue about it to remind myself to fix it for the next release:

github.com/root-project/root

[RF] RooFFTConvPdf inside RooSimultaneous doesn't work in some cases

opened 10:39PM - 12 Oct 25 UTC

guitargeek

bug in:RooFit/RooStats

### Description

This was reported on the forum: https://root-forum.cern.ch/t/r…oofit-multiple-fit-different-datasets-using-different-pdfs-with-some-shared-parameters/64208/7

The reproducer below is a simplified version of the reproducer on the forum, translated to C++ so it can later be used as a unit test.

### Reproducer

```c++
#include "RooRealVar.h"
#include "RooCategory.h"
#include "RooDataHist.h"
#include "RooArgSet.h"
#include "RooPolyVar.h"
#include "RooHistPdf.h"
#include "RooGaussian.h"
#include "RooFFTConvPdf.h"
#include "RooSimultaneous.h"
#include "TString.h"

#include <iostream>
#include <vector>

void ana_globalFit()
{
   using namespace RooFit;

   // create all RooRealVar
   RooRealVar xvar("x", "", 0.0, 10.0);

   // Define category to distinguish physics and control samples events
   RooCategory sample("sample", "sample");
   sample.defineType("sample_0");

   // create all RooDataHist for DATA, prepare dictionary
   RooDataHist data("data_0", "", RooArgList(xvar));

   std::vector<int> content = {0, 0, 0, 0, 1, 0, 2, 0, 0, 3, 5, 8, 6, 15, 17, 18, 23, 14, 24, 25, 33, 33, 43, 39, 41, 48, 51, 39, 43, 39, 49, 27, 31, 29, 27, 29, 38, 34, 22, 23, 21, 12, 19, 12, 10, 10, 11, 9, 10, 9, 8, 6, 4, 7, 4, 6, 6, 3, 2, 2, 4, 4, 4, 5, 2, 2, 1, 1, 1, 1, 1, 0, 3, 1, 0, 2, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1};

   for (size_t i = 0; i < content.size(); ++i) {
      data.set(i, content[i], -1);
   }
   xvar.setBins(data.numEntries());

   // create the GLOBAL dataset
   std::map<std::string, std::unique_ptr<RooDataHist>> importMap;
   importMap["range_0"] = std::make_unique<RooDataHist>(data);
   RooDataHist dataALL("data", "", RooArgList(xvar), Index(sample), Import(importMap));

   // create the RooRealVar for MC
   RooRealVar E("E_0", "", 0, 500);
   E.setBins(10000, "cache");

   std::vector<int> content2 = {
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 2, 5, 13, 19, 55, 114, 145, 203, 295, 361, 497, 607, 683, 885, 964, 1167, 1240, 1328, 1379, 1479, 1583, 1638, 1667, 1667, 1705, 1694, 1677, 1757, 1723, 1556, 1606, 1563, 1513, 1516, 1346, 1329, 1336, 1322, 1252, 1170, 1124, 1092, 1060, 1017, 999, 959, 881, 836, 853, 772, 784, 747, 716, 654, 602, 602, 588, 535, 549, 517, 479, 495, 425, 402, 449, 403, 355, 348, 393, 361, 332, 320, 325, 277, 312, 277, 282, 230, 256, 253, 234, 234, 204, 205, 204, 195, 215, 194, 208, 183, 167, 144, 164, 159, 168, 153, 148, 161, 140, 135, 126, 149, 137, 115, 121, 110, 117, 104, 108, 121, 122, 83, 97, 91, 85, 81, 66, 66, 71, 80, 76, 75, 74, 73, 76, 53, 69, 64, 67, 70, 77, 76, 48, 55, 45, 56, 62, 50, 51, 49, 44, 53, 47, 48, 55, 49, 56, 41, 43, 33, 57, 53, 40, 44, 28, 42, 34, 35, 44, 45, 32, 34, 35, 28, 39, 27, 34, 21, 31, 26, 24, 28, 41, 31, 28, 30, 29, 30, 15, 23, 25, 27, 27, 21, 32, 20, 22, 32, 23, 23, 17, 20, 29, 23, 21, 21, 22, 18, 20, 27, 17, 20, 16, 18, 31, 20, 11, 15, 17, 20, 16, 11, 15, 14, 12, 17, 18, 13, 14, 13, 14, 14, 12, 13, 13, 9, 13, 12, 9, 7, 12, 17, 14, 6, 7, 17, 9, 10, 18, 11, 21, 10, 15, 8, 16, 11, 19, 16, 10, 13, 13, 11, 7, 14, 12, 12, 18, 11, 8, 9, 4, 9, 9, 4, 12, 8, 11, 7, 5, 7, 7, 7, 9, 7, 8, 6, 5, 5, 6, 11, 11, 11, 9, 10, 7, 6, 7, 8, 3, 8, 2, 7, 7, 10, 6, 8, 10, 9, 10, 11, 9, 7, 6, 15, 6, 7, 6, 5, 4, 8, 6, 7, 3, 2, 6, 4, 3, 10, 7, 4, 5, 6, 6, 5, 5, 2, 5, 6, 10, 1};

   // create all RootDataHist for MC
   RooRealVar Ehist("E_0", "", 0, 10);
   Ehist.setBins(content2.size());
   RooDataHist dh("dh_0", "", RooArgList(Ehist));
   for (size_t i = 0; i < content2.size(); ++i) {
      dh.set(i, content2[i], -1);
   }

   // create the scale variable for MC - assume a single scale is ok for all energy ranges.
   RooRealVar scale("scale_0", "", 0.95, 0.5, 1.5);
   RooRealVar p0("p0", "", 0.0);

   RooPolyVar Qf("HCAL_F_0", "", xvar, RooArgSet(p0, scale));

   // create all histo PDFs for MC.
   RooHistPdf histpdf("histpdf_0", "", E, dh);

   // gaussian model
   RooRealVar mg("mg", "", 0);
   RooRealVar sg("sg_0", "", 0.7, 0.1, 10.);
   RooGaussian gauss("gauss_0", "", E, mg, sg);

   // create all histo PDFs for MC after smearing.
   RooFFTConvPdf lxg("lxg_0", "", Qf, E, histpdf, gauss);

   // prepare the simultaneous PDF
   RooSimultaneous simPDF("simPDF", "", sample);
   simPDF.addPdf(lxg, "sample_0");

   // compute NLLs
   std::unique_ptr<RooAbsReal> nll_single{lxg.createNLL(data, EvalBackend("legacy"))};
   std::unique_ptr<RooAbsReal> nll{simPDF.createNLL(dataALL, EvalBackend("legacy"))};

   std::vector<double> test_vals = {0.2, 0.3};

   for (auto val : test_vals) {
      sg.setVal(val);
      double val_single = nll_single->getVal();
      double val_sim = nll->getVal();
      std::cout << val_single << std::endl;
      std::cout << val_sim << std::endl;
   }
}
```

The output shows that the simultaneous NLL is always zero, while it's expected to be the same:

```txt
   ------------------------------------------------------------------
  | Welcome to ROOT 6.37.01                        https://root.cern |
  | (c) 1995-2025, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for linuxx8664gcc on Jan 01 1980, 00:00:00                 |
  | From heads/master@v6-37-01-8453-g301e0782d63                     |
  | With clang version 21.1.1                                        |
  | Try '.help'/'.?', '.demo', '.license', '.credits', '.quit'/'.q'  |
   ------------------------------------------------------------------

root [0]
Processing ana_globalFit.C...
[#1] INFO:InputArguments -- RooDataHist::importDHistSet(data) defining state "range_0" in index category sample
[#1] INFO:Eval -- RooRealVar::setRange(E_0) new range named 'refrange_fft_lxg_0' created with bounds [0,500]
[#1] INFO:Caching -- RooAbsCachedPdf::getCache(lxg_0) creating new cache 0x5595b0492960 with pdf histpdf_0_CONV_gauss_0_CACHE_Obs[p0,scale_0,x]_NORM_x for nset (x) with code 0
[#1] INFO:NumericIntegration -- RooRealIntegral::init(histpdf_0_CONV_gauss_0_CACHE_Obs[p0,scale_0,x]_NORM_x_Int[x]) using numeric integrator RooIntegrator1D to calculate Int(x)
[#1] INFO:Fitting -- Creation of NLL object took 39.9617 ms
[#1] INFO:Fitting -- RooAbsTestStatistic::initSimMode: created 0 slave calculators.
[#1] INFO:Fitting -- Creation of NLL object took 105.859 μs
2236.42
0
1927.05
0
root [1]
```

The new default `"cpu"` backend has the same problem.

### ROOT version

Probably any.
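The expectation stated in the issue (a one-category simultaneous NLL should equal the single-pdf NLL) can be illustrated with a toy in plain Python. This is a sketch under stated assumptions, not RooFit's implementation: `channel_nll` and `simultaneous_nll` are hypothetical helpers, and the toy simply sums one term per category label that is present both in the data and in the model. It also shows that, in such a toy, a label present only on one side contributes no term at all. The log above reports state "range_0" being defined in the index category, the pdf being attached to "sample_0", and 0 slave calculators being created; whether that plays a role in the reported behaviour is not established in this thread.

```python
import math

def channel_nll(counts, probs):
    """Binned negative log-likelihood of one channel: -sum_b n_b * log(p_b)."""
    return -sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)

def simultaneous_nll(data_by_label, probs_by_label):
    """Toy simultaneous NLL: one term per category label present in BOTH maps."""
    shared = sorted(set(data_by_label) & set(probs_by_label))
    total = sum(channel_nll(data_by_label[k], probs_by_label[k]) for k in shared)
    return total, len(shared)

# Hypothetical toy inputs: bin counts and normalized bin probabilities.
counts = [3, 10, 5, 2]
probs = [0.15, 0.50, 0.25, 0.10]

single = channel_nll(counts, probs)

# Same label on both sides: the simultaneous NLL reduces to the single NLL.
matched, n_matched = simultaneous_nll({"sample_0": counts}, {"sample_0": probs})

# Different labels: no channel is evaluated and the sum is empty.
mismatched, n_mismatched = simultaneous_nll({"range_0": counts}, {"sample_0": probs})

print(single, matched, n_matched)
print(mismatched, n_mismatched)
```

A quick check along these lines (comparing the labels used for the category states with the keys of the import dictionary) is cheap to add before building the combined dataset.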

andrea.celentano · October 13, 2025, 7:08am

Hi @jonas, thanks for confirming this. I’ll wait until the next release to check it.
