ROOT version: 6.32.02
Python version: 3.12.4
Dear experts,
I am writing a python code that loops over several ROOT files. In each loop, the root file is loaded into a RDataFrame, then the data is fitted according to some model, and the sPlot method is applied to retrieve the weights.
Now, the goal is to write out the sWeights in a new file (with all other branches from the initial ROOT file) using RDataFrame.Define()
, then Snapshot()
. The reason I’m using RDataFrame is to take advantage of the “easy” interface + speed that C++ offers, instead of converting to e.g. a pandas dataframe, adding the weights, and converting back to RDataFrame.
To write out the weights, I define a C++ lambda function using ROOT.gInterpreter.ProcessLine()
, as passing a python callable is not supported yet in RDataFrame.Define()
:
ROOT.gInterpreter.ProcessLine("""
auto getSignalSWeight = [](ULong64_t entry_number) -> Double_t {
return global_sData->GetSWeight(entry_number, "nsig");
};
""")
In order for the C++ code to work, I declare the pointer to the RooStats::SPlot
instance and give it the adress of the RooStats.SPlot
object I defined through python:
sData = RooStats.SPlot("sData", "sPlot", data, model, RooArgList(nsig, nbkg))
# Declare `sData` globally for the JIT compiler
ROOT.gInterpreter.ProcessLine(f"RooStats::SPlot* global_sData = nullptr;")
ROOT.gInterpreter.ProcessLine(f"global_sData = (RooStats::SPlot*){ROOT.addressof(sData)};")
Now, this works well when I run the code through one root file, but when I start looping on different files, the pipeline fails at the 2nd iteration. I understand it must have something to do with the RooStats::SPlot*
pointer, but even deleting the pointer manually or the sData instance causes a segfault.
What is happening there? How can I fix this issue without having to modify the workflow? (eg, not having to run the file X times instead of looping over X files). Also, is this horrible practice? I’m trying to make the most of the python interfacing and RDataFrame() but I don’t if this is good or not.
Here’s a minimal example to reproduce the issue:
import ROOT
from ROOT import RooFit, RooRealVar, RooGaussian, RooAddPdf, RooStats, RooDataSet, RooArgList, RooArgSet
def fit(index):
x = RooRealVar("x", "Observable", -10, 10)
# Signal
mean_sig = RooRealVar("mean_sig", "Mean of Gaussian", 1, -10, 10)
sigma_sig = RooRealVar("sigma_sig", "Width of Gaussian", 1, 0.1, 5)
gauss_sig = RooGaussian("gauss_sig", "Signal Gaussian", x, mean_sig, sigma_sig)
# Background
slope_bkg = RooRealVar("slope_bkg", "Slope of Exponential", -0.1, -5, 0)
expo_bkg = ROOT.RooExponential("expo_bkg", "Background Exponential", x, slope_bkg)
nsig = RooRealVar("nsig", "Number of Signal Events", 500, 0, 1000)
nbkg = RooRealVar("nbkg", "Number of Background Events", 500, 0, 1000)
model = RooAddPdf("model", "Signal + Background", RooArgList(gauss_sig, expo_bkg), RooArgList(nsig, nbkg))
# Generate a toy dataset
data = model.generate(RooArgSet(x), 1000)
# Fit the model to the data
model.fitTo(data, RooFit.PrintLevel(-1)) # Suppress fit output for cleaner display
# sPlot to get sWeights
sData = RooStats.SPlot("sData", "sPlot", data, model, RooArgList(nsig, nbkg))
for i in range(10): # Print first 10 weights
sweight_signal = sData.GetSWeight(i, "nsig")
sweight_background = sData.GetSWeight(i, "nbkg")
print(f"Entry {i}: Signal sWeight = {sweight_signal}, Background sWeight = {sweight_background}")
# Up to now this is pretty standard RooFit code
# Create a new ROOT file to save results
tree = data.GetClonedTree()
rdf = ROOT.RDataFrame(tree)
# Declare `sData` globally for the JIT compiler
ROOT.gInterpreter.ProcessLine(f"RooStats::SPlot* global_sData = nullptr;")
ROOT.gInterpreter.ProcessLine(f"global_sData = (RooStats::SPlot*){ROOT.addressof(sData)};")
# Define lambda functions to retrieve sWeights
ROOT.gInterpreter.ProcessLine("""
auto getSignalSWeight = [](ULong64_t entry_number) -> Double_t {
return global_sData->GetSWeight(entry_number, "nsig");
};
""")
# Apply the lambda functions in RDataFrame to add sWeights as columns
df_with_weights = rdf.Define("sWeight_signal", "getSignalSWeight(rdfentry_)", ["rdfentry_"])
df_with_weights.Snapshot("tree_with_sWeights", f"file_{index}.root")
def main():
for i in range (3):
print(i)
fit(i) # segfault at 2nd iteration after Define()
if __name__ == "__main__":
main()