RooFit in Python: segmentation faults all over the place

I’ve been running into a lot of segmentation faults while building RooFit workspaces in Python. I’ve hacked my script several times to bypass some of them, but this really seems like a larger issue, and now I’m stuck on one. I’m using CMSSW_7_4_7_patch2 with the Higgs-Combine tool.

I’ll first show one of the seg faults that I solved and how I fixed it and then show the one that’s currently plaguing me.

The first seg fault I faced was from this bit of code

from ROOT import *

import math
from math import sqrt

testTH1 = TH1F('testTH1','testTH1',10,0,10)

for ix in range(1,testTH1.GetNbinsX()+1):
    valx = 2*ix

    testTH1.SetBinContent(ix,valx)
    testTH1.SetBinError(ix,testTH1.GetBinContent(ix)/4)

xVar = RooRealVar('xVar','xtitle',0,10)

binList = RooArgList()
for xbin in range(1,testTH1.GetXaxis().GetNbins()+1):
    name = 'binVar_'+str(xbin)
    title = 'title_binVar_'+str(xbin)
    binContent = testTH1.GetBinContent(xbin)
    binErrUp = binContent + testTH1.GetBinErrorUp(xbin)
    binErrDown = binContent - testTH1.GetBinErrorLow(xbin)

    binRRV = RooRealVar(name, title, binContent, max(binErrDown,0), max(binErrUp,0))

    binList.add(binRRV)


final = RooParametricHist('test_RPH','test_RPH',xVar, binList, testTH1)
		
myWorkspace = RooWorkspace('myW')

getattr(myWorkspace,'import')(final)
myWorkspace.writeToFile('basetest.root',True) 

I can post the full crash log if anyone is interested, but it comes down to frame #6 0x00007ff3a4c53459 in __dynamic_cast. The seg fault occurs when final is imported into myWorkspace and the RooParametricHist is cloned. During the cloning, it tries to dynamic_cast the items in binList, but they are no longer in memory (or at least not where they are supposed to be).

I fixed it by moving the call to RooArgList.add() to a second for loop and storing the RooRealVars in a Python list in the meantime, like so:

import ROOT
from ROOT import *

import math
from math import sqrt

# ROOT.gROOT.SetBatch(True)
# ROOT.PyConfig.IgnoreCommandLineOptions = True

testTH1 = TH1F('testTH1','testTH1',10,0,10)

for ix in range(1,testTH1.GetNbinsX()+1):
    valx = 2*ix

    testTH1.SetBinContent(ix,valx)
    testTH1.SetBinError(ix,testTH1.GetBinContent(ix)/4)

xVar = RooRealVar('xVar','xtitle',0,10)

testList = []
binList = RooArgList()
for xbin in range(1,testTH1.GetXaxis().GetNbins()+1):
    name = 'binVar_'+str(xbin)
    title = 'title_binVar_'+str(xbin)
    binContent = testTH1.GetBinContent(xbin)
    binErrUp = binContent + testTH1.GetBinErrorUp(xbin)
    binErrDown = binContent - testTH1.GetBinErrorLow(xbin)

    binRRV = RooRealVar(name, title, binContent, max(binErrDown,0), max(binErrUp,0))

    testList.append(binRRV)

for item in testList:
    binList.add(item)


final = RooParametricHist('test_RPH','test_RPH',xVar, binList, testTH1)
		
myWorkspace = RooWorkspace('myW')

getattr(myWorkspace,'import')(final)
myWorkspace.writeToFile('basetest.root',True) 

In my opinion, this is a very dumb fix. Whenever I have a similar loop, I can’t add my objects to a RooArgList until the full loop has finished (I keep them in a Python list in the meantime). I’ve now come across a situation where even that doesn’t work. I’ve posted the code below for completeness, but the gist is that the RooFormulaVars in binListPass cannot be imported into the workspace: the import seg faults at 0x00007ffff789c1a0 in typeinfo for TObject (). Does anyone have any suggestions or know why this is happening? Thanks for any and all help!

from ROOT import *

def makeDummyFromPDF(name,distEq,cat,xVar,yVar,printOpt=True):
    # Gaussian
    if distEq == 'gauss':
        meanx = RooConstVar('meanx','meanx',5)
        sigmax = RooConstVar('sigmax','sigmax',1)

        meany = RooConstVar('meany','meany',2.5)
        sigmay = RooConstVar('sigmay','sigmay',0.5)

        dummyPDFx = RooGaussian(name+'x',name+'x',xVar,meanx,sigmax)
        dummyPDFy = RooGaussian(name+'y',name+'y',yVar,meany,sigmay)

        dummyPDF = RooProdPdf(name,name,RooArgList(dummyPDFx,dummyPDFy))

        if cat == 'pass':
            dummyRDS = dummyPDF.generate(RooArgSet(xVar,yVar),100)
        elif cat == 'fail':
            dummyRDS = dummyPDF.generate(RooArgSet(xVar,yVar),10)

        dummyRDH = RooDataHist('signal_'+cat,'signal_'+cat,RooArgSet(xVar,yVar),dummyRDS)

        

    # Skewed gaussian up
    elif distEq == 'gaussUp':
        meanx = RooConstVar('meanx','meanx',6)
        sigmax = RooConstVar('sigmax','sigmax',1)
        tailx = RooConstVar('tailx','tailx',0.5)

        meany = RooConstVar('meany','meany',3)
        sigmay = RooConstVar('sigmay','sigmay',0.5)
        taily = RooConstVar('taily','taily',0.5)

        dummyPDFx = RooNovosibirsk(name+'x',name+'x',xVar,meanx,sigmax,tailx)
        dummyPDFy = RooNovosibirsk(name+'y',name+'y',yVar,meany,sigmay,taily)

        dummyPDF = RooProdPdf(name,name,RooArgList(dummyPDFx,dummyPDFy))
        if cat == 'pass':
            dummyRDS = dummyPDF.generate(RooArgSet(xVar,yVar),100)
        elif cat == 'fail':
            dummyRDS = dummyPDF.generate(RooArgSet(xVar,yVar),10)

        dummyRDH = RooDataHist('signal_'+cat+'_smearUp','signal_'+cat+'_smearUp',RooArgSet(xVar,yVar),dummyRDS)

    # Skewed gaussian down
    elif distEq == 'gaussDown':
        meanx = RooConstVar('meanx','meanx',4)
        sigmax = RooConstVar('sigmax','sigmax',1)
        tailx = RooConstVar('tailx','tailx',-0.5)

        meany = RooConstVar('meany','meany',2)
        sigmay = RooConstVar('sigmay','sigmay',0.5)
        taily = RooConstVar('taily','taily',-0.5)

        dummyPDFx = RooNovosibirsk(name+'x',name+'x',xVar,meanx,sigmax,tailx)
        dummyPDFy = RooNovosibirsk(name+'y',name+'y',yVar,meany,sigmay,taily)

        dummyPDF = RooProdPdf(name,name,RooArgList(dummyPDFx,dummyPDFy))
        if cat == 'pass':
            dummyRDS = dummyPDF.generate(RooArgSet(xVar,yVar),100)
        elif cat == 'fail':
            dummyRDS = dummyPDF.generate(RooArgSet(xVar,yVar),10)

        dummyRDH = RooDataHist('signal_'+cat+'_smearDown','signal_'+cat+'_smearDown',RooArgSet(xVar,yVar),dummyRDS)

    # Generic
    else:
        dummyPDF = RooGenericPdf(name,distEq,RooArgList(xVar,yVar))
        dummyRDS = dummyPDF.generate(RooArgSet(xVar,yVar),50000) # output is a RooDataSet
        dummyRDH = RooDataHist(name+'_'+cat,name+'_'+cat,RooArgSet(xVar,yVar),dummyRDS)


    dummyTH2 = dummyRDS.createHistogram(xVar,yVar,10,5,'',name)

    if printOpt == True:
        print 'Check this is the shape you want for: ' + distEq
        dummyTH2.Draw('lego')
        raw_input('Hit enter to confirm this shape')

    return dummyTH2, dummyRDH


if __name__ == '__main__':
    
    # Establish our axis variables
    xVar = RooRealVar('myx','myx',0,10)
    yVar = RooRealVar('myy','myy',0,5)


    # We need to make some dummy 2D histograms that we know the form of
    # pass = x^3*y^2, fail = x*y, Rp/f = (ax^2+bx+c)(dy+f) where a*d=1, b=c=f=0
    data_pass_TH2,data_pass_RDH = makeDummyFromPDF('data_obs','(myx**3)*(myy**2)','pass',xVar,yVar,False)
    # bkg_pass_TH2,bkg_pass_RDH = makeDummyFromPDF('mybkg','myx*myy','pass',xVar,yVar,False)
    signal_pass_TH2,signal_pass_RDH = makeDummyFromPDF('signal','gauss','pass',xVar,yVar,False)
    signalUp_pass_TH2,signalUp_pass_RDH = makeDummyFromPDF('signalUp','gaussUp','pass',xVar,yVar,False)
    signalDown_pass_TH2,signalDown_pass_RDH = makeDummyFromPDF('signalDown','gaussDown','pass',xVar,yVar,False)

    data_fail_TH2,data_fail_RDH = makeDummyFromPDF('data_obs','myx*myy','fail',xVar,yVar,False)
    # bkg_fail_TH2,bkg_fail_RDH = makeDummyFromPDF('mybkg','myx*myy','fail',xVar,yVar,False)
    signal_fail_TH2,signal_fail_RDH = makeDummyFromPDF('signal','gauss','fail',xVar,yVar,False)
    signalUp_fail_TH2,signalUp_fail_RDH = makeDummyFromPDF('signalUp','gaussUp','fail',xVar,yVar,False)
    signalDown_fail_TH2,signalDown_fail_RDH = makeDummyFromPDF('signalDown','gaussDown','fail',xVar,yVar,False)


    # Get some starting guess values for the coefficients of the Rp/f (pol1 vs pol2)
    myguesses = {
        "nom":[
            [0.,0.],     # nominal, x0 [y0,y1]
            [0.,0.],     # nominal, x1
            [0.,1.]      # nominal, x2
        ],
        "up":[
            [1.,1.],
            [1.,1.],
            [1.,3.]
        ],
        "down":[
            [-1.,-1.],
            [-1.,-1.],
            [-1.,-3.]
        ]
    }

    # Store the guesses as RRVs in an easy-to-access dictionary
    polYO = 1
    polXO = 2
    PolyCoeffs = {}
    for yi in range(polYO+1):
        thisXCoeffList = RooArgList()
        for xi in range(polXO+1):
            name = 'polyCoeff_'+'x'+str(xi)+'y'+str(yi)
            PolyCoeffs['x'+str(xi)+'y'+str(yi)] = RooRealVar(name,name,myguesses['nom'][xi][yi],myguesses['down'][xi][yi],myguesses['up'][xi][yi])


    # Now loop through all of our bins
    dumbListFail = []
    dumbListPass = []
    dumbPolyVars = []
    for ybin in range(1,data_fail_TH2.GetYaxis().GetNbins()+1):
        for xbin in range(1,data_fail_TH2.GetXaxis().GetNbins()+1):

            # First make our fail bins into RRVs
            name = 'Fail_bin_'+str(xbin)+'-'+str(ybin)
            binContent = data_fail_TH2.GetBinContent(xbin,ybin)
            binErrUp = binContent + data_fail_TH2.GetBinErrorUp(xbin,ybin)
            binErrDown = binContent - data_fail_TH2.GetBinErrorLow(xbin,ybin)
            binRRV = RooRealVar(name, name, binContent, max(binErrDown,0), max(binErrUp,0))
            # Store the bin
            dumbListFail.append(binRRV)

            # Then for this bin, make a RooPolyVar
            xCenter = data_fail_TH2.GetXaxis().GetBinCenter(xbin)
            yCenter = data_fail_TH2.GetYaxis().GetBinCenter(ybin)

            xConst = RooConstVar("ConstVar_x_"+str(xCenter)+'_'+str(yCenter),"ConstVar_x_"+str(xCenter)+'_'+str(yCenter),xCenter)
            yConst = RooConstVar("ConstVar_y_"+str(xCenter)+'_'+str(yCenter),"ConstVar_y_"+str(xCenter)+'_'+str(yCenter),yCenter)

            xPolyList = RooArgList()
            dumbXPolyList = []
            for yCoeff in range(polYO+1):
                xCoeffList = RooArgList()
                dumbXCoeffList = []

                for xCoeff in range(polXO+1):                    
                    dumbXCoeffList.append(PolyCoeffs['x'+str(xCoeff)+'y'+str(yCoeff)])
                
                for xco in dumbXCoeffList:
                    xCoeffList.add(xco)

                thisXPolyVarLabel = "xPol_Bin_"+str(int(xCenter))+"_"+str(int(yCenter))
                xPolyVar = RooPolyVar(thisXPolyVarLabel,thisXPolyVarLabel,xConst,xCoeffList)
                dumbXPolyList.append(xPolyVar)

            for xpol in dumbXPolyList:
                xPolyList.add(xpol)

            thisYPolyVarLabel = "FullPol_Bin_"+str(round(xCenter))+"_"+str(round(yCenter))
            thisFullPolyVar = RooPolyVar(thisYPolyVarLabel,thisYPolyVarLabel,yConst,xPolyList)

            dumbPolyVars.append(thisFullPolyVar)

            # # Finally make the pass distribution
            # formulaArgList = RooArgList(binRRV,thisFullPolyVar)
            # thisBinPass = RooFormulaVar('Pass_bin_'+str(xbin)+'-'+str(ybin),'Pass_bin_'+str(xbin)+'-'+str(ybin),"@0*@1",formulaArgList)
            # dumbListPass.append(thisBinPass)

    # Do the dumb bit to make the fail distribution
    binListFail = RooArgList()
    binListPass = RooArgList()
    if len(dumbListFail) != len(dumbPolyVars):
        print 'Lists not same length - breaking'
        quit()
    else:
        for binIndex, failbin in enumerate(dumbListFail):
            # Get the failing bins
            binListFail.add(failbin)

            # Make the passing bins
            name = failbin.GetName().replace('Fail','Pass')
            thisPoly = dumbPolyVars[binIndex]
            formulaArgList = RooArgList(failbin,thisPoly)
            thisBinPass = RooFormulaVar(name,name,"@0*@1",formulaArgList)
            dumbListPass.append(thisBinPass)



    for itemp in dumbListPass:
        binListPass.add(itemp)

    # binListPass.first().Print()
    # raw_input('waiting')

    print "Making RPH2Ds"
    # qcd_fail_RPH2D = RooParametricHist2D('qcd_fail','qcd_fail',xVar, yVar, binListFail, data_fail_TH2)
    # qcd_pass_RPH2D = RooParametricHist2D('qcd_pass','qcd_pass',xVar, yVar, binListPass, data_fail_TH2)
    print "Making norm"
    qcd_fail_RPH2D_norm = RooAddition('qcd_fail_norm','qcd_fail_norm',binListFail)
    qcd_pass_RPH2D_norm = RooAddition('qcd_pass_norm','qcd_pass_norm',binListPass)

    things2import = [
        data_pass_RDH,
        data_fail_RDH,
        # qcd_fail_RPH2D,
        # qcd_pass_RPH2D,
        qcd_fail_RPH2D_norm,
        qcd_pass_RPH2D_norm,
        signal_pass_RDH,
        signal_fail_RDH,
        signalUp_pass_RDH,
        signalUp_fail_RDH,
        signalDown_pass_RDH,
        signalDown_fail_RDH
    ]

    print "Making workspace..."
    # Make workspace to save in
    myWorkspace = RooWorkspace("w_test")
    for rdh in things2import:
        print "Importing " + rdh.GetName()
        getattr(myWorkspace,'import')(rdh,RooFit.RecycleConflictNodes())

    # Now save out the RooDataHists
    myWorkspace.writeToFile('base.root',True)  

RooFit likes to cast away the constness of const-ref arguments and retain pointers to them. Even in C++ this is an odd thing to do (temporaries can be passed through such arguments), but with Python ref-counting, temporaries can disappear before the end of a statement (in C++ they cannot). As a result, Python code transliterated from C++ examples can crash or, worse, silently behave differently: Python likes to reuse memory, so a fresh RooRealVar may already have taken the place of an old, deleted one by the time the latter is eventually used.

For reference, here’s the add() in question (RooArgList inherits from RooAbsCollection):

Bool_t RooAbsCollection::add(const RooAbsArg& var, Bool_t silent)
{
  // ...

  // add a pointer to this variable to our list (we don't own it!)
  _list.Add((RooAbsArg*)&var);

  // ...
}
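
The lifetime difference can be seen without ROOT at all. The following is a minimal sketch in plain Python, assuming CPython's reference counting. Var and BorrowingList are hypothetical stand-ins (not RooFit classes), and a weak reference plays the role of the raw pointer that add() stores. Where the weak reference reports a dead object, the real C++ list would be left holding a dangling pointer.

```python
import weakref


class Var(object):
    """Hypothetical stand-in for a RooRealVar (plain Python object)."""
    def __init__(self, name):
        self.name = name


class BorrowingList(object):
    """Hypothetical stand-in for a C++ list that stores non-owning pointers."""
    def __init__(self):
        self._items = []

    def add(self, var):
        # Remember the object without owning it, like a raw pointer would.
        self._items.append(weakref.ref(var))

    def alive(self):
        # True for every entry whose object still exists.
        return [ref() is not None for ref in self._items]


# Pattern from the first script: the loop variable is the only owner,
# so rebinding it on the next iteration frees the previous object.
in_loop = BorrowingList()
for i in range(3):
    v = Var('binVar_%d' % i)
    in_loop.add(v)
loop_alive = in_loop.alive()

# A temporary passed straight into add() is freed as soon as the call returns.
temp = BorrowingList()
temp.add(Var('temporary'))
temp_alive = temp.alive()

print(loop_alive, temp_alive)
```

Under CPython, loop_alive comes out as [False, False, True] and temp_alive as [False]: only the object still bound to v survives.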

It’s not my problem anymore, but for this and a few other spurious reasons (such as overloads on base and derived class pointers (!) with different semantics (!!)), I’ve always strongly recommended against using RooFit from Python. It awaits proper pythonizations by someone knowledgeable.

Thanks for the details! I suspect I’m going to move to C++ given this issue. However, I did find a temporary workaround (for any future readers): define a Python list at the beginning of the script (I called it allVars), and every time you make a RooFit object (RooRealVar, RooPolyVar, etc.), store it in that list. You don’t have to use allVars for anything. It just keeps a Python reference to each object, which stops Python from discarding the objects or overwriting their memory (or so it seems).
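
The allVars idea can be wrapped in a small helper so that every object is registered at the point where it is created. This is only a sketch: keep() is a hypothetical helper name (not part of ROOT or RooFit), and the ROOT part runs only if PyROOT can be imported. In the last script above, objects such as xConst, yConst and the per-bin xPolyVar are held only by names or lists that are recreated on every loop iteration, so they look like candidates for this treatment.

```python
allVars = []  # module-level list; its only job is to own the objects


def keep(obj):
    """Register obj in allVars so Python keeps it alive, then return it."""
    allVars.append(obj)
    return obj


try:
    import ROOT  # optional: the helper itself does not need ROOT
except ImportError:
    ROOT = None

if ROOT is not None:
    binList = ROOT.RooArgList()
    for xbin in range(1, 4):
        name = 'binVar_' + str(xbin)
        # allVars owns the RooRealVar before binList borrows a pointer to it.
        binList.add(keep(ROOT.RooRealVar(name, name, 1.0, 0.0, 2.0)))
```

Because keep() returns its argument, it can be dropped around any constructor call without changing the rest of the expression.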
