Error when using HistFactory

I am using HistFactory to run a fitter and I have changed my ROOT version from 6.24/08 (default on lxplus) to 6.26/08 (can be accessed with lb-conda Semilep/hammer/2022-10-28_13-29 bash).

I got the following error:
Error in TRint::HandleTermInput(): RooFit::BidirMMapPipe_impl::BidirMMapPipeException caught: xferraw: Broken pipe

I am running on lxplus and the error occurred in the middle of RooFit minimisation.

The zip file attached to this post contains the code for the fitter (Fitter.C) along with the files containing the histogram and data (respectively Hist_Signal_HistFactory.root and ToyData_Signal_MASTER.root), as well as the output with the error being at the very bottom (file called “Terminal Output”).
Test.zip (2.0 MB)

Note that the histograms are all normalised to 1 and have no empty bins (as a protection, I set empty bins to a value of 10^-10).

Hi @delick,

thank you for your question. Maybe @jonas could help here?

Cheers,
Marta

It seems to be a memory issue:

Using the “top” command, it can be seen that the memory usage continues to increase monotonically.

Reducing the available memory artificially, for example with ulimit -v $((2*1024*1024)) (a 2 GiB cap on virtual memory, since ulimit -v counts in KiB), makes the crash appear earlier.
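For anyone who wants to reproduce the capped run from a script rather than an interactive shell, here is a minimal sketch using only the Python standard library. It assumes Linux, where RLIMIT_AS is the limit that ulimit -v sets; the helper name run_with_memory_cap and the ROOT command line in the comment are illustrative and not taken from Fitter.C.

```python
import resource
import subprocess


def run_with_memory_cap(cmd, limit_kib):
    """Run cmd with its virtual address space capped at limit_kib KiB.

    Intended to mimic `ulimit -v limit_kib` for the child process only.
    """
    limit_bytes = limit_kib * 1024

    def apply_cap():
        # Executed in the child between fork and exec (Unix only).
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.run(cmd, preexec_fn=apply_cap).returncode


# Hypothetical usage, with 2 * 1024 * 1024 KiB = 2 GiB as in the ulimit call:
# run_with_memory_cap(["root", "-l", "-b", "-q", "Fitter.C"], 2 * 1024 * 1024)
```

A non-zero return code means the child failed or was killed; with a leak, lowering limit_kib should make that happen sooner.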

Hi @delick!

This crash happens in the code path of the NumCPU() option that you are using in createNLL().

Nothing changed there between 6.24/08 and 6.26/08, so I’m surprised about it and I have no idea how it happens. With the new ROOT version, the problem seems to be gone. Have you tried 6.28/04? You can easily activate it on lxplus8 as described here: Release 62804 - ROOT.

Can you see if that works for you? By the way, in 6.28.04 there is now also the new BatchMode(true) option that you can use instead of NumCPU(). It will make the likelihood evaluation in the minimization much faster without using multiprocessing.

Cheers,
Jonas

Hi @jonas !

I don’t think the problem is due to NumCPU(). I removed NumCPU() from the Fitter.C script and the problem still occurs (the error message no longer shows up, but the program simply gets killed).

It seems to be a memory problem, so the crash shows up only sometimes; at other times the fit survives, depending on whether the machine can handle the large memory usage. However, with the “top” command and with only one process (since we no longer use NumCPU()), you can see that %MEM increases monotonically (sometimes up to 50%), which clearly shouldn’t happen. This doesn’t seem to happen with 6.24/08, but it does happen from 6.26/08 onwards.
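To put numbers on the growth seen in “top”, the resident memory of the fit process can be sampled periodically. The following is only a sketch: it assumes Linux, where /proc/<pid>/status contains a VmRSS line in kB, and the function names are made up for illustration.

```python
import re
import time


def parse_vmrss_kib(status_text):
    """Return VmRSS in KiB from the text of /proc/<pid>/status, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def sample_rss_kib(pid, n_samples, interval_s):
    """Read the resident set size of process pid n_samples times, interval_s apart."""
    samples = []
    for _ in range(n_samples):
        with open(f"/proc/{pid}/status") as status_file:
            samples.append(parse_vmrss_kib(status_file.read()))
        time.sleep(interval_s)
    return samples


def grows_monotonically(samples):
    """True if no sample is below its predecessor and the last exceeds the first."""
    never_drops = all(b >= a for a, b in zip(samples, samples[1:]))
    return len(samples) >= 2 and never_drops and samples[-1] > samples[0]
```

For example, calling sample_rss_kib with the PID of the running root.exe, 60 samples and a 10 second interval, and then passing the result to grows_monotonically, gives a simple yes/no check that can be compared between 6.24/08 and 6.26/08.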

I see, yes that makes sense. I thought it was because of NumCPU because the crash happens in RooFit::BidirMMapPipe, which is used when you set NumCPU(). But you’re right, since the problem comes from a memory leak that eventually crashes the process, it can also happen without multiprocessing.

Have you tried ROOT 6.28.04? We fixed tons of memory leaks in this release, so it is quite likely that this problem has already been fixed!

Thanks for the speedy reply! I haven’t tried ROOT 6.28.04 yet, because I would like to use the environment lb-conda Semilep/hammer/2022-10-28_13-29 bash. May I ask how I can use 6.28.04 with that environment?

I don’t know, this is something to ask your LHCb colleagues, if I assume correctly that the lb in lb-conda stands for LHCb?

But on lxplus with CentOS8 (aka lxplus8), you can easily activate ROOT 6.28.04 with minimal dependencies using:

source /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.28.04/x86_64-centos8-gcc85-opt/bin/thisroot.sh

The path can be found in the link above. For this check with the new ROOT version, I would use the conda environment only if you need it for other software dependencies that you can’t get otherwise.
