Roofit - multiple fit different datasets using different PDFs with some shared parameters

Dear colleagues,

I want to solve the following problem using Roofit. I have N datasets for different observables, each in forms of a RooDataHist, and for each I construct a PDF from a template; the PDF is in form of a RooHistPDF. Each PDF is convoluted with a Gaussian PDF of unknown sigma, and then linearly scaled by a linear transformation x^\\prime = \\alpha x. The parameter \\alpha accounts for a possible mismatch in the absolute scale between the measured dataset and the template.

In this configuration, I thus have 2N free parameters, with each pair (\\alpha_i,\\sigma_i) independent form the other pairs. Clearly, I could execute N independent fits, but my goal, later, will be to model \\sigma with its own PDF, and extract the parameters for the latter.

I report below a snippet of my code.

#histoData is a list of histograms for DATA
#histoMC is a list of histograms for MC (create the PDF from these)
#xmin is a list of minimum x axis
#xmax is a list of maximum x axis

def doFit(histoData,histoMC,xmin,xmax,name,sigma,sigmaMin,sigmaMax,alphaIN=1):

    N=len(histoData)
    
    #create all RooRealVar
    x=[]
    for ii in range (0,N):
        x.append(ROOT.RooRealVar(name+"_x_%i"%ii,name+"_E [GeV]",xmin[ii],xmax[ii]))

    #create a global rooRealVar
    xALL=ROOT.RooArgSet(name+"_x")
    for ii in range(0,N):
        xALL.add(x[ii])
    
    # Define category to distinguish physics and control samples events
    sample = ROOT.RooCategory("sample", "sample")
    for ii in range(0,N):
        sample.defineType("sample_%i"%ii)

    #create all root data hist for DATA, prepare dictionary
    data=[]
    dictData={}
    for ii in range(0,N):
        data.append(ROOT.RooDataHist(name+("_data_%i"%ii),name+("_data_%i"%ii),x[ii],histoData[ii]))
        dictData["range_%i"%ii]=data[ii]
        
    #create the GLOBAL dataset
    dataALL=ROOT.RooDataHist(name+"_data",name+"_data",xALL,Index=sample,Import=dictData)
    
    #create the RooRealVar for MC
    E=[]
    for ii in range(0,N):
        E.append(ROOT.RooRealVar(name+"_E_%i"%ii,name+"_E_%i"%ii, 0, 500));
        E[ii].setBins(10000,"cache")
    #create all RootDataHist for MC
    dh=[]
    dh_draw=[]
    for ii in range(0,N):
        dh.append(ROOT.RooDataHist(name+("_dh_%i"%ii),name+("_dh_%i"%ii),E[ii],histoMC[ii]));
        dh_draw.append(ROOT.RooDataHist(name+("_dh_%i"%ii),name+("_dh_%i"%ii),x[ii],histoMC[ii]));

        
    #create the scale variable for MC - assume a single scale is ok for all energy ranges.
    scale=[]
    for ii in range(0,N):
        scale.append(ROOT.RooRealVar(name+"_scale_%i"%ii,name+"_scale_%i"%ii,0.95,0.5,1.5))
    p0=ROOT.RooRealVar(name+"_p0",name+"_p0",0.)

    Qf=[]
    for ii in range(0,N):
        Qf.append(ROOT.RooPolyVar(name+"_HCAL_F_%i"%ii,name+"_HCAL_F_%i"%ii,x[ii],ROOT.RooArgSet(p0,scale[ii])))

    #create all histo PDFs for MC.
    histpdf=[]
   
    for ii in range(0,N):
        histpdf.append(ROOT.RooHistPdf(name+("_histpdf_%i"%ii),name+("_histpdf_%i"%ii),E[ii],dh[ii]))
    
    
    #TODO: gaussian model
    gauss=[]
    mg=ROOT.RooRealVar(name+"_mg",name+"_mg", 0);
    #sg=ROOT.RooRealVar(name+"_sg_%i"%ii,name+"_sg_%i"%ii,sigma,sigmaMin,sigmaMax)
    #prepare gaus PDFs
    sg=[]
    for ii in range(0,N):
        sg.append(ROOT.RooRealVar(name+"_sg_%i"%ii,name+"_sg_%i"%ii,sigma,sigmaMin,sigmaMax));
        gauss.append(ROOT.RooGaussian(name+"_gauss_%i"%ii,name+"_gauss_%i"%ii,E[ii],mg,sg[ii]))
    

        
    #create all histo PDFs for MC after smearing.
    lxg=[]
    for ii in range(0,N):
        lxg.append(ROOT.RooFFTConvPdf(name+"_lxg_%i"%ii,"histpdf (X) gauss", Qf[ii], E[ii], histpdf[ii], gauss[ii]));

  
      
    #prepare the simultaneous PDF
    simPDF = ROOT.RooSimultaneous(name+"_simPDF",name+"_simPDF",sample)
    for ii in range(0,N):
        simPDF.addPdf(lxg[ii],"sample_%i"%ii)
 
    #do the fit
    simPDF.fitTo(dataALL)
    #prepare ret val
    #retVal=retV(scale.getValV(),scale.getError(),sg.getValV(),sg.getError())
    retVal=None
    #histpdf for drawing
    histpdf_MC=[]
    Qf2=[]
    histpdf_MC_SCALED=[]
    for ii in range(0,N):
        histpdf_MC.append(ROOT.RooHistPdf(name+"_histpdf_MC_%i"%ii,name+"_histpdf_MC_%i"%ii, x[ii], dh_draw[ii], 2));  
        Qf2.append(ROOT.RooPolyVar(name+"_HCAL_f2_%i"%ii, name+"_HCAL_f2_%i"%ii, x[ii], ROOT.RooArgSet(p0, scale[ii])));
        histpdf_MC_SCALED.append(ROOT.RooHistPdf(name+"_histpdf_MC_SCALED_%i"%ii,name+"_histpdf_MC_SCALED_%i"%ii, Qf2[ii], E[ii], dh[ii], 2));

    frames=[]
    for ii in range(0,N):
        frames.append(x[ii].frame())
        data[ii].plotOn(frames[ii])
        lxg[ii].plotOn(frames[ii],LineColor=ROOT.kRed)
        histpdf_MC[ii].plotOn(frames[ii],LineColor=ROOT.kGreen,LineWidth=1);
        histpdf_MC_SCALED[ii].plotOn(frames[ii],LineColor=ROOT.kBlue,LineWidth=1);
    return retVal,frames

It seems to me that something is wrong in this implementation. Indeed, the result I get from the fit for the 4 parameters (N=2 here) is below, note that 0.95 (for scale) and 0.5 (for sigma) are the initial values. It looks to me that roofit is not capable at all to fit the 4 parameters to the dual dataset.


 1  hcal1_scale_0   7.17610e-01   4.18627e-01   0.00000e+00  -1.00167e-01
   2  hcal1_scale_1   9.50000e-01   4.18627e-01   0.00000e+00  -1.00167e-01
   3  hcal1_sg_0   5.00000e-01   2.09162e+00   0.00000e+00  -1.16602e+00
   4  hcal1_sg_1   5.00000e-01   2.09162e+00   0.00000e+00  -1.16602e+00

May I ask you for guidance on this?

Thanks,

Andrea Celentano

Hi Andrea,
Thank you for your question.
@jonas, could you please have a look?

Hi @siliataider, thanks for pinging @jonas - let’s see the reply!

Dear @andrea.celentano,

sorry for the late reply!

What you are doing should work. I think the reason why it doesn’t is that the way the model is formulated it not really consistent.

Your pdf depends on the E variables, and the data contains x variables. Then you have Q_f depending on x, but there is no relation between x and E, other than indirectly coupling it via this RooFFTConvPdf constructor where I’m not sure if it’s the correct one.

Can you try to implement it in a different way that I think is more correct/robust?

I’d use the one single variable for the observable, unifying x and E. Then, to do the scaling, you put your RooFFTConvPdf into a RooGenericPdf, where you do the linear transformation. Transforming pdfs is supported better compared to transforming variables.

If you have only one variable for the observable, you can also use this more standard RooFFTConvPdf constructor.

Let me know here if that works or you have any further questions!

Cheers,
Jonas

Dear @jonas,

thanks for your guidance. Indeed, I confirm what you say: my pdf depends on E, and the data on x. The connection between these two happens in RooFFTConvPDF, using the constructor you pointed out. My idea was to follow the example in rf210_angularconv tutorial. I have a RooDataHist and a RooGaussian, both defined with respect of observable E, and I create their convolution. The convolution is done with respect to E, but the result is expressed as a function of Q_f=\alpha \cdot x, so that the result, at the end, can be seen as a PDF for x.

I can also try the approach you mention, may I ask you how I can use a RooGenericPDF to linearly scale one of the two PDFs? I mean, if I do @0 * @1, I realize the product of two PDFs, while what I’d like to accomplish here is something as f(\alpha x), with \alpha being a parameter.

In general, I used already my approach in the past, in case of one PDF fitted to one dataset. Maybe here the issue is with the simultaneous fit?

Thanks
Andrea

Dear @jonas, to possibly facilitate the analysis of this problem I attach to this message the full code I am using, as well as the two ROOT files containing the data to be fit and the MC-derived histogram from which I extract the PDF.

To run it, python3 ana_globalFit.py 1 (1 is for “period_1.root”).

As you see, if you comment line 161 and uncomment lines 164 and 165 you will move from a simultaneous fit to a one-by-one fit. The latter works, the first does not.
Thanks again
Andrea
code.tar (33.3 KB)

Thank you very much! If it worked in the past on a single dataset, it should work. I’ll look at your reproducer and try to understand what’s happening

Ok, indeed I can reproduce the problem and this should not happen!

I have opened a GitHub issue about it to remind myself to fix it for the next release:

Hi @jonas, thanks for confirming this. I’ll then wait till next release to check this :slight_smile: