I am using HistFactory to run a fitter and I have changed my ROOT version from 6.24/08 (default on lxplus) to 6.26/08 (can be accessed with lb-conda Semilep/hammer/2022-10-28_13-29 bash).
I got the following error:
Error in TRint::HandleTermInput(): RooFit::BidirMMapPipe_impl::BidirMMapPipeException caught: xferraw: Broken pipe
I am running on lxplus and the error occurred in the middle of RooFit minimisation.
The zip file attached to this post contains the code for the fitter (Fitter.C) along with the files containing the histogram and data (respectively Hist_Signal_HistFactory.root and ToyData_Signal_MASTER.root), as well as the output with the error being at the very bottom (file called “Terminal Output”). Test.zip (2.0 MB)
Note that the histograms are all normalised to 1 and contain no empty bins (as a protection, I set any empty bin to 10^-10).
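For readers who want to reproduce this kind of protection, here is a minimal sketch in plain Python. It is not the code from Fitter.C: the function name `protect_empty_bins` is hypothetical, and the renormalisation after filling the empty bins is an assumption, since the post does not say whether that step is done.

```python
def protect_empty_bins(contents, floor=1e-10):
    """Replace empty (or negative) bins with a small floor, then renormalise to 1.

    `contents` is a plain list of bin contents. Both the function name and the
    renormalisation step are assumptions made for illustration.
    """
    # Any bin that is not strictly positive gets the floor value.
    floored = [c if c > 0.0 else floor for c in contents]
    total = sum(floored)
    # Divide by the new total so the histogram integrates to 1 again.
    return [c / total for c in floored]


normalised = protect_empty_bins([2.0, 0.0, 6.0])
```

With real histograms, the same loop would run over the bin contents read from the ROOT file before they are handed to HistFactory.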
This crash happens in the code path of the NumCPU() option that you are using in createNLL().
Nothing changed there between 6.24/08 and 6.26/08, so I’m surprised by this and I have no idea how it happens. With the new ROOT version, the problem seems to be gone. Have you tried 6.28/04? You can easily activate it on lxplus8 as described here: Release 62804 - ROOT.
Can you see if that works for you? By the way, in 6.28.04 there is now also the new BatchMode(true) option that you can use instead of NumCPU(). It will make the likelihood evaluation in the minimization much faster without using multiprocessing.
The problem doesn’t seem to be due to NumCPU(). I’ve removed NumCPU() from the Fitter.C script and the problem still occurs (the error message no longer shows up, but the program simply gets killed).
It seems to be a memory problem: sometimes the crash shows up and at other times the fit survives, depending on whether the machine can handle the large memory usage. With the “top” command, and with only one process running (since NumCPU() is no longer set), you can see that %MEM increases monotonically (sometimes up to 50%), which clearly shouldn’t happen. This doesn’t seem to happen with 6.24/08, but it does happen from 6.26/08 onwards.
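In case it helps anyone comparing ROOT versions, the “top” observation can be turned into numbers with a small script. This is only a sketch under some assumptions: it reads the resident memory from `/proc`, so it works on Linux only (which covers lxplus), and the helper names `rss_mib` and `looks_like_leak` as well as the 50 MiB threshold are made up for illustration.

```python
import os


def rss_mib(pid="self"):
    """Resident set size of a process in MiB, read from /proc (Linux only).

    `pid` can be "self" or the numeric PID of a running job, e.g. the one
    shown by `top`.
    """
    with open(f"/proc/{pid}/statm") as statm:
        # The second field of statm is the number of resident pages.
        resident_pages = int(statm.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / 2**20


def looks_like_leak(samples, min_total_mib=50.0):
    """Heuristic: memory never goes down and grows by at least `min_total_mib`.

    `samples` is a list of RSS readings in MiB taken at regular intervals.
    The threshold is an arbitrary choice for illustration.
    """
    steps = [later - earlier for earlier, later in zip(samples, samples[1:])]
    if not steps:
        return False
    never_shrinks = all(step >= 0 for step in steps)
    return never_shrinks and (samples[-1] - samples[0]) >= min_total_mib
```

One way to use it is to call `rss_mib(pid)` every few seconds while the fit runs, collect the readings in a list, and compare the result of `looks_like_leak` between 6.24/08 and 6.26/08. The check is deliberately strict: a single small drop in memory makes it return `False`.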
I see, yes that makes sense. I thought it was because of NumCPU because the crash happens in RooFit::BidirMMapPipe, which is used when you set NumCPU(). But you’re right, since the problem comes from a memory leak that eventually crashes the process, it can also happen without multiprocessing.
Have you tried ROOT 6.28.04? We fixed tons of memory leaks in this release, so it is quite likely that this problem has already been fixed!
Thanks for the speedy reply! I haven’t tried ROOT 6.28.04 yet, because I would like to use the environment lb-conda Semilep/hammer/2022-10-28_13-29 bash. May I ask how I can use 6.28.04 with that environment?
The path can be found in the link above. For this check with the new ROOT version, I would only use the conda environment if you need it for other software dependencies that you can’t get otherwise.