I’m running a very complex fit in RooFit, but after running for a long time (roughly the same time in each of my few attempts), the fit crashes with the message
In standby(/build/jenkins/workspace/lcg_nightly_pipeline/build/projects/ROOT-v6-26-00-patches/src/ROOT/v6-26-00-patches/roofit/roofitcore/src/RooRealMPFE.cxx, 633): Server shutdown failed.
and then an exception is thrown, apparently with no message. I also cannot find where the exception comes from.
What does this mean precisely?
Also, when is RooRealMPFE::standby called? Is this an error at the end of the minimization, when all the worker processes are shutting down and somehow fail to do so?
Thank you in advance
RooRealMPFE::standby() is called in the destructor, so probably yes, it is at the end of the minimization.
Sorry, I don’t know much about the RooRealMPFE class. It is part of the old RooFit multiprocessing framework, whose code is no longer actively developed. As you know, nowadays I mostly try to make RooFit faster on a single thread with things like the BatchMode. I hope that with ROOT 6.28 you can use that too and don’t have to rely on multiprocessing to get acceptable speed.
Anyway, I would like to help you with this problem, but I’m afraid without being able to reproduce the problem myself I can’t help much. If you can share the code to reproduce this either here or in private, that would be nice.
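In the meantime, one way to learn more about an exception that carries no message is to catch it yourself and report its type. The following is only a minimal sketch in generic C++, not RooFit API: describeFailure is a hypothetical helper name, and abi::__cxa_current_exception_type is assumed to be available, which is the case with GCC and Clang only.

```cpp
#include <cxxabi.h>    // abi::__cxa_current_exception_type (GCC/Clang only)
#include <exception>
#include <stdexcept>
#include <string>
#include <typeinfo>

// Hypothetical helper: runs `work` and returns a description of whatever
// exception escaped from it, or an empty string if nothing was thrown.
template <typename Callable>
std::string describeFailure(Callable &&work)
{
   try {
      work();
      return "";
   } catch (const std::exception &e) {
      // typeid on a polymorphic reference yields the dynamic (most derived) type.
      return std::string(typeid(e).name()) + ": " + e.what();
   } catch (...) {
      // Not derived from std::exception: ask the C++ ABI for the thrown type.
      const std::type_info *type = abi::__cxa_current_exception_type();
      return std::string("non-standard exception of type ") +
             (type ? type->name() : "unknown");
   }
}
```

You would wrap the fit call in a lambda, for example describeFailure([&] { pdf.fitTo(data); }), where pdf and data stand for your own objects. The type names come out mangled; piping them through c++filt -t makes them readable. To find the throw site itself, running the program under gdb with "catch throw" set stops execution at the point where the exception is raised.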
I cannot share the required data but I’ll try generating an equivalent MC sample.
I do not think I will be able to use BatchMode for this analysis: the last time I tried, there were still issues in master that I could not debug in time.
While retrying the fit with different parameters I also noticed that there is a serious memory leak, so what may be happening is that everything crashes once the available memory is exhausted.
Any suggestions for debugging that?
Ok, I did not manage to make an MC sample in time (chasing other bugs), but apparently this problem was due to a RooAddPdf large enough to annoy RooFit.
Replacing it with a custom class seems to have solved the problem.