I’m using RooFit to fit a PDF from HistFactory. This worked with ROOT 6.22/08, but with 6.24/00 the fit appears to hang and memory use climbs slowly but surely; I killed the process when it reached 7 GB.
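# Imports this snippet relies on (the rich imports are inferred from the markup used below).
# varToStr is a small formatting helper defined elsewhere and not shown here.
from time import perf_counter

import ROOT
import rich.prompt as rp
from rich import print

f = ROOT.TFile.Open("wkspace_scalar_1100.root")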
w = f.Get("combWS")
data = w.data("combData")
n_evts = int(data.sumEntries() + 1)
pdf = w.pdf("combPdf")
params = pdf.getParameters(data)
observables = data.get()
weight_var = ROOT.RooRealVar("_weight_", "_weight_", 1, 0., 1e7)
observables.add(weight_var)
print("\n[blue bold]Observables [/blue bold]")
print(
    "\n".join([varToStr(o) for o in observables]),
)
data.Print()
print(f"\n[blue bold]Events per dataset: {n_evts}[/]")
print(f"[blue bold]Num entries: {data.numEntries()}[/]")
print(f"[blue bold]Expected events: {pdf.expectedEvents(observables)}[/]")
rp.Prompt.ask("Press Enter to continue")
print("\n")
print("[green bold]Running fit...[/]")
fit_start = perf_counter()
poi = params.selectByName("xsec_br").first()
poi.setVal(0.0)
poi.setConstant(True)
res = pdf.fitTo(
    data,
    ROOT.RooFit.Offset(True),
    ROOT.RooFit.BatchMode(True),
    ROOT.RooFit.Save(True),
)
print("[blue bold]Parameters [/blue bold]")
print(
    "\n".join([varToStr(p) for p in params]),
)
print(f"Expected events: {pdf.expectedEvents(observables)}")
print(f"Took {perf_counter() - fit_start}s")
rp.Prompt.ask("Press Enter to continue")
print("[green bold]Generating toys...[/]")
w2 = ROOT.RooWorkspace("toys", "toys")
start = perf_counter()
spec = pdf.prepareMultiGen(
    observables,
    ROOT.RooFit.Name("toyData"),
    ROOT.RooFit.NumEvents(n_evts),
    ROOT.RooFit.Verbose(True),
    ROOT.RooFit.Extended(),
    ROOT.RooFit.AllBinned(),
)
If we are lucky, you might be able to pinpoint the exact line at which the leak happens by running the reproducer under
valgrind --suppressions=$ROOTSYS/etc/valgrind-root.supp --track-origins=yes <reproducer command>
and/or
valgrind --suppressions=$ROOTSYS/etc/valgrind-root.supp --tool=massif <reproducer command>
You will need a build of ROOT that contains debug symbols; I think LXPLUS has some.
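In case it helps to script the two valgrind runs above, here is a minimal Python sketch that only assembles the command lines. The function name `valgrind_commands` and the script name `fit_repro.py` are placeholders of mine, not anything from ROOT; the valgrind flags are exactly the ones quoted above, and the suppression file is assumed to live under `$ROOTSYS/etc` on a POSIX system.

```python
import os
import shlex


def valgrind_commands(reproducer, rootsys=None):
    """Return (memcheck, massif) argument lists for running `reproducer` under valgrind.

    `reproducer` is the command that triggers the problem, e.g. "python fit_repro.py"
    (a hypothetical script name). `rootsys` defaults to the ROOTSYS environment variable.
    """
    rootsys = rootsys or os.environ.get("ROOTSYS", "")
    supp = "--suppressions=" + os.path.join(rootsys, "etc", "valgrind-root.supp")
    target = shlex.split(reproducer)
    # Default tool (memcheck), with origin tracking as suggested above.
    memcheck = ["valgrind", supp, "--track-origins=yes", *target]
    # Heap profiler run.
    massif = ["valgrind", supp, "--tool=massif", *target]
    return memcheck, massif


if __name__ == "__main__":
    for cmd in valgrind_commands("python fit_repro.py"):
        # Print the commands; pass a list to subprocess.run(cmd) to execute one.
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check the suppression file path before starting a run, which can take a long time under valgrind.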
Otherwise, without a reproducer it’s hard to make progress.
Anyhow, I was able to run it under valgrind. It looks to me like it’s a std::map that’s causing the leak.
The Massif snapshots are attached. massif2.vgdb.tar.gz (72.8 KB)
Can you please try with a build of ROOT with debug symbols, so we get line numbers in the valgrind logs? Also valgrind memcheck (i.e. not massif) should point exactly to what leaks if it sees something, which might be a more direct clue than massif (which simply lists all allocations).
If you cannot upload the file, can you post the code reproducing the problem with the HistFactory tutorial generated model, e.g. one of the results/example_combined_XXX_model.root files?
Here’s the example file and code. This model is small enough that the fit finishes almost instantly, which makes it hard to verify if the growth in memory use occurs here.
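One way to make a slow leak visible on a model this small might be to repeat the operation many times and watch the process's peak resident memory. The sketch below is only an illustration under my own assumptions: `track_growth` and `peak_rss` are hypothetical helper names, the leaking function is a stand-in rather than the actual fit, and it uses the standard-library `resource` module, which is Unix-only (`ru_maxrss` is reported in kilobytes on Linux and in bytes on macOS).

```python
import resource


def peak_rss():
    """Peak resident set size of this process (kilobytes on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def track_growth(action, repeats=20):
    """Call `action` repeatedly and record the peak RSS after each call."""
    samples = []
    for _ in range(repeats):
        action()
        samples.append(peak_rss())
    return samples


if __name__ == "__main__":
    hoard = []  # stand-in for a leak: references that are never released

    def leaky():
        hoard.append(b"x" * 10_000_000)

    samples = track_growth(leaky, repeats=10)
    print("peak RSS growth over the loop:", samples[-1] - samples[0])
```

For the real case, `action` would be something like `lambda: pdf.fitTo(data, ROOT.RooFit.Offset(True), ROOT.RooFit.BatchMode(True), ROOT.RooFit.Save(True))`. Since `ru_maxrss` is a peak value it never decreases, so a flat curve after the first few calls would suggest no per-fit leak, while a steady climb would point to one.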
Ironically I’m doing this fit in order to generate a set of toys with the fit model. That said, the hang and increasing memory usage do occur in the fitting step, and I’m not using the management classes that FrequentistCalculator uses.
Thanks for posting the code. I think this is caused mainly by the GitHub issue "Memory leak when using MemPoolForRooSets" (root-project/root, issue #7933), which is being investigated.
However, that issue has been present since 6.14, and you say that you did not have problems in 6.22.
I cannot run your code in 6.22; I would probably need a ROOT file generated with 6.22. It would be good if you could share the code for 6.22.
thanks
Lorenzo