Simultaneous fit with seperate ranges for each category

Hi all,

I am trying to implement a simultaneous fit in 4 categories, each fitting over the same variable (mass). I want to extend the mass fitting ranges for some of the categories, as they can be modelled at lower masses and give a better handle on the backgrounds.

I have seen the other posts and documentation about this, but cannot get it to work as the fit does not seem to converge, or is taking too long (> few hours). So want to check first that my approach is right before looking into other solutions.

The simultaneous fit is set up like this

w = ROOT.RooWorkspace("w")
m = ROOT.RooRealVar('{}'.format(B_mass), '{}'.format(B_mass), 3700., 6000.)

sample = ROOT.RooCategory("sample", "sample")
sample.defineType("LowQ2")
sample.defineType("HighQ2")
sample.defineType("LowQ2_SSB")    
sample.defineType("HighQ2_SSB")

m.setRange("range_LowQ2", Lowq2LC, Lowq2HC)
m.setRange("range_HighQ2", Highq2LC, Highq2HC)
m.setRange("range_LowQ2_SSB", Lowq2_SSB_LC, Lowq2_SSB_HC)
m.setRange("range_HighQ2_SSB", Highq2_SSB_LC, Highq2_SSB_HC)

Where the cuts for each category are within the range of allowed values for m

w.Import(m)

lowq2Data = ROOT.RooDataSet("lowq2data", "lowq2data", lowq2TTree, ROOT.RooArgSet(m))
highq2Data = ROOT.RooDataSet("highq2data", "highq2data", highq2TTree, ROOT.RooArgSet(m))
lowq2_SSB_Data = ROOT.RooDataSet("lowq2_SSB_Data", "lowq2_SSB_Data", lowq2_SSBTTree, ROOT.RooArgSet(m))
highq2_SSB_Data = ROOT.RooDataSet("highq2_SSB_Data", "highq2_SSB_Data", highq2_SSBTTree, ROOT.RooArgSet(m))

*Fitting model*

combData = ROOT.RooDataSet("combData", "combined data",
            ROOT.RooArgSet(m),
            ROOT.RooFit.Index(sample), 
            ROOT.RooFit.Import("LowQ2", lowq2Data),
            ROOT.RooFit.Import("HighQ2", highq2Data),
            ROOT.RooFit.Import("LowQ2_SSB", lowq2_SSB_Data),
            ROOT.RooFit.Import("HighQ2_SSB", highq2_SSB_Data),
            )
  
simPdf = ROOT.RooSimultaneous("simPdf", "simultaneous pdf", sample)
simPdf.addPdf(LowQ2Model, "LowQ2")
simPdf.addPdf(HighQ2Model, "HighQ2")
simPdf.addPdf(model_HQ2_SSB, "HighQ2_SSB")
simPdf.addPdf(model_LQ2_SSB, "LowQ2_SSB")

fit_results = simPdf.fitTo(combData, 
                            ROOT.RooFit.Save(),
                            ROOT.RooFit.Extended(ROOT.kTRUE),
                            ROOT.RooFit.Range("range"),
                            ROOT.RooFit.SplitRange(ROOT.kTRUE))

*Plotting* 

The code works for when the ranges are the same, and also if I just have SplitRange(kTRUE), but it just ignores the ranges and will fit over [3700,6000] for all categories (obvious from yields and results).

I think @jonas has been active on similar threads and helped me before with a similar issue, so I assume you are the right person to ask.

Thanks in advance,
Nathan

Dear Nathan,

You are right: @jonas is the right person to involve here.

Cheers,
Danilo

Hello @nheatley!

the problem seems to be that this SplitRange() feature doesn’t work if you use the same RooRealVar as the observable in each channel.

If you define a different mass variable for each channel (maybe suffixed with the channel name too), and then define the channel datasets and pdfs for that observable, then it should work.

Let me know if this is clear, and in any case if you want to get this fixed also for the case of using the same variable, please report this on GitHub:

Maybe I’ll solve this problem by explicitly forbidding using the same observable in different channels in RooFit. After all, it makes sense that the observables would have different names, because you are making different observations in each channel. That’s maybe also a philosophical question about what a “RooSimultaneous” should mean.

Maybe it would be good to keep it clean like this:

  • RooAddPdf: one observable but with different model components (p(x) = p_1(x) + p_2(x))
  • RooProdPdf: multiple observables measured at the same time (like p(x, y))
  • RooSimultaneous: a “template” for building the sum of NLLs of different observables (nll(X) + nll(Y), where capital X and Y are vectors of observable values corresponding to datasets.

Do you think enforcing this constraint on non-overlapping observables in RooSimultaneous would break too much existing code? I don’t know what people actually do, but for example when building models with HistFactory, the observable in each channel has a different name.

Hi @jonas,

Thanks for the help, this seems to have worked. I see your point about simultaneous fits technically being different observables, despite corresponding to the same feature. My only points here would be that:

a) When making the data sets for each category, the different RooRealVars need different names. When creating this dataset from a root file, then the ntuple maker needs to produce a different branch name for each of the datasets, which is a bit annoying to do, as the typically are produced by the same ntuple maker script.

b) If I do this then I can just define the range as unique for each of the different variables in the ranges I want and so I no longer need to use SplitRange(). I assume then the use of this would be in more complex cases, where I wanted a blinded fit region on one of the variables, for example?

But in general I think making this fix would not break too much existing code and would be relatively easily to work around.

Now that the fit result is working, I have been trying to run toyMC studies on the fits, but I am having some issues and was wondering if you could provide any insight. I can get the generation to successfully start, but it never seems to generate any toy data.

Relevant code is here:

Bck_YieldLQ2, Bck_YieldHQ2, LQ2_SSB_Yield, HQ2_SSB_Yield  = 6432, 38833, 671, 14286
Sig_yieldLQ2, Sig_yieldHQ2 = 34, 18502
KpmJPsi_Yield = 143
 
n_events_LQ2 = Bck_YieldLQ2 + Sig_yieldLQ2
n_events_HQ2 =  Bck_YieldHQ2 + Sig_yieldHQ2 
n_events_tot = n_events_LQ2 + n_events_HQ2 + LQ2_SSB_Yield + HQ2_SSB_Yield

B_mass = 'B_mass'

nToys = 10
w = ROOT.RooWorkspace("w")
# ------------------ Model -----------------------
m_LQ2 = ROOT.RooRealVar('{}_LQ2'.format(B_mass), '{}_LQ2'.format(B_mass), 3700.,6000.)
m_HQ2 = ROOT.RooRealVar('{}'.format(B_mass), '{}'.format(B_mass), 4200.,5700. )
m_LQ2_SSB = ROOT.RooRealVar('{}_LQ2_SSB'.format(B_mass), '{}_LQ2_SSB'.format(B_mass),4000.,5700. )
m_HQ2_SSB = ROOT.RooRealVar('{}_HQ2_SSB'.format(B_mass), '{}_HQ2_SSB'.format(B_mass), 4200.,5700.)

sample = ROOT.RooCategory("sample", "sample")
sample.defineType("LowQ2")
sample.defineType("HighQ2")
sample.defineType("LowQ2_SSB")    
sample.defineType("HighQ2_SSB")

w.Import(m_LQ2)
w.Import(m_HQ2)
w.Import(m_LQ2_SSB)
w.Import(m_HQ2_SSB)


*fitting model *


# -------- Simultaneous fitting ------
LowQ2Model = w.pdf('LowQ2Model')
HighQ2Model = w.pdf('HighQ2Model')
model_LQ2_SSB = w.pdf('model_LQ2_SSB')
model_HQ2_SSB = w.pdf('model_HQ2_SSB')
   
    
simPdf = ROOT.RooSimultaneous("simPdf", "simultaneous pdf", sample)
simPdf.addPdf(LowQ2Model, "LowQ2")
simPdf.addPdf(HighQ2Model, "HighQ2")
simPdf.addPdf(model_HQ2_SSB, "HighQ2_SSB")
simPdf.addPdf(model_LQ2_SSB, "LowQ2_SSB")
     
# ---------- Fit Params----------------
#------ Signal params ---------
mu = w.var('mu')
sigL = w.var('sigL')
  
#------ LowQ2 ---------
ErFMean_LQ2 = w.var('ErF_PDF_LQ2')
k_LQ2 = w.var('k_LQ2') 
c_LQ2 = w.var('c_LQ2')
f_LQ2 = w.var('f_LQ2')
Sig_yieldLQ2 = w.var('Sig_yieldLQ2')
Bck_YieldLQ2 = w.var('Bck_YieldLQ2')
  
#------ HighQ2 ---------
ErFMean_HQ2 = w.var('ErFMean_HQ2')
k_HQ2 = w.var('k_HQ2')
c_HQ2 = w.var('c_HQ2')
f_HQ2 = w.var('f_HQ2')
Sig_yieldHQ2 = w.var('Sig_yieldHQ2')
Bck_YieldHQ2 = w.var('Bck_YieldHQ2')
  
#------ LowQ2 SSB ---------
ErFMean_LQ2_SSB = w.var('ErFMean_LQ2_SSB')
k_LQ2_SSB = w.var('k_LQ2_SSB')
f_LQ2_SSB = w.var('f_LQ2_SSB')
LQ2_SSB_Yield = w.var('LQ2_SSB_Yield')
  
#------ HighQ2 SSB ---------
ErFMean_HQ2_SSB = w.var('ErFMean_HQ2_SSB')
k_HQ2_SSB = w.var('k_HQ2_SSB')
f_HQ2_SSB = w.var('f_HQ2_SSB')
HQ2_SSB_Yield = w.var('HQ2_SSB_Yield')
KpmJPsi_Yield = w.var('KpmJPsi_Yield')

mcstudy_SimHighLowQ2Simfit = ROOT.RooMCStudy(simPdf, ROOT.RooArgSet(m_LQ2, m_HQ2, m_LQ2_SSB, m_HQ2_SSB, sample), ROOT.RooFit.Silence(), ROOT.RooFit.FitOptions(ROOT.RooFit.PrintEvalErrors(0),  ROOT.RooFit.Save(ROOT.kTRUE) ,ROOT.RooFit.Extended(ROOT.kTRUE)))

mcstudy_SimHighLowQ2Simfit.generateAndFit(nToys, 100, ROOT.kTRUE)