I’m running a very complex fit in RooFit, but after a long time running (roughly the same time in each of my few attempts), the fit crashes, giving
In standby(/build/jenkins/workspace/lcg_nightly_pipeline/build/projects/ROOT-v6-26-00-patches/src/ROOT/v6-26-00-patches/roofit/roofitcore/src/RooRealMPFE.cxx, 633): Server shutdown failed.
and then throwing an exception (apparently with no message; I also cannot find where the exception comes from).
What does this mean precisely?
Also, when is RooRealMPFE::standby called? Is this an error at the end of the minimization, when all the worker processes are shutting down and somehow fail to do so?
I think RooRealMPFE::standby() is called in the destructor, so probably yes, it is at the end of the minimization.
Sorry, I don’t know much about the RooRealMPFE class. It is part of the old RooFit multiprocessing framework, whose code is no longer actively worked on. As you know, nowadays I mostly try to make RooFit faster on a single thread with things like the BatchMode. I hope that with ROOT 6.28 you can also use that and don’t have to rely on multiprocessing to get acceptable speed.
Anyway, I would like to help you with this problem, but I’m afraid that without being able to reproduce it myself I can’t help much. If you can share the code to reproduce this, either here or in private, that would be nice.
I cannot share the required data but I’ll try generating an equivalent MC sample.
I do not think I will be able to use BatchMode for this analysis: last time I tried, there were still issues in the master, which I could not debug in time.
While retrying the fit with different parameters I also noticed that there is a serious memory leak, so what may be happening is that when the memory is exceeded everything crashes.
Any suggestion to debug that?
Ok, I did not manage to make an MC sample in time (chasing other bugs), but apparently this problem was due to a RooAddPdf large enough to annoy RooFit.
Replacing it with a custom class seems to have solved the problem.