RooFit: NumCPU crashes

I cannot get the NumCPU option of fitTo to work. I have tried many known-working scripts, each with multiple fits, on several platforms (2.4/SL3, 2.6/SL4, Ubuntu 2.6.28) and ROOT versions (5.20, 5.22), and it crashes every time.

It does seem to work in one rare case, when the script contains only a single fit. With multiple fits it never works: I invoke several fitTo calls with the NumCPU argument in the script, and this always crashes for me.

If I’m doing something wrong, please let me know. I’d like to speed up my fits with this feature if possible.

James

Hi James,

What p.d.f. are you fitting? I recently fixed some problems in RooFFTConvPdf.
These are now in the dev/roostats branch of RooFit and will be propagated
to the next 5.23.X release (this month). If you have other problems I’d like
to know about them so I can fix these too. Can you post (or mail me) an example
macro that reproduces the crash?

Wouter

Hello Wouter,

I sent my script to your email address (assuming you’re still at nikhef.nl).

Thank you for looking into this.
James

I just tried to use multi-CPU fitting and noticed the same problem. My (2D) PDF is the sum of several Gaussians, each with means and sigmas that are RooFormulaVars (for now).

When using multiple CPUs, my program can typically perform one fit (which looks OK) before crashing shortly afterwards. Running the program through valgrind shows numerous ‘invalid reads’ and ‘control flow depends on uninitialized data’ errors, and in that case not even a single fit completes. I suspect you will be able to debug this problem easily by fitting any old PDF on multiple CPUs and running it through valgrind or whatever other bounds checker you have handy.

Hi,

Oops, completely forgot about this one! I’ll try to process it right away so that any fix can still make it into ROOT 5.24 (due at the end of the month).

Wouter

Hi,

This one was quite hard and took me a full day to debug.

The SEGVs occur only if the likelihood being minimized has ‘issues’ (i.e. p.d.f.s becoming zero or negative during minimization). These trigger code that transfers the list of error messages from one process to another, and even then the crash happens only if this occurs more than once during a minimization session.
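
To illustrate what such an ‘issue’ looks like, here is a minimal sketch in plain C++. It is not RooFit code and not the actual fix: the two-Gaussian p.d.f., the names twoGaussPdf, NllResult and evalNll, and the message format are all hypothetical. It only shows how a negative-log-likelihood evaluation can end up with a list of error messages when a p.d.f. value is zero or negative, which is the kind of list that, per the description above, is passed between processes in multi-CPU mode.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Normalized Gaussian density.
double gauss(double x, double mean, double sigma) {
    const double pi = 3.14159265358979323846;
    const double z = (x - mean) / sigma;
    return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * pi));
}

// Hypothetical p.d.f.: narrow Gaussian plus a wide one with fraction f.
// If a minimizer pushes f below zero, the sum can go negative in the tails.
double twoGaussPdf(double x, double f) {
    return (1.0 - f) * gauss(x, 0.0, 1.0) + f * gauss(x, 0.0, 3.0);
}

// Result of one likelihood evaluation: the NLL over the valid events
// and one message per event with a non-positive p.d.f. value.
struct NllResult {
    double nll;
    std::vector<std::string> errors;
};

NllResult evalNll(const std::vector<double>& data, double f) {
    NllResult result{0.0, {}};
    for (std::size_t i = 0; i < data.size(); ++i) {
        const double p = twoGaussPdf(data[i], f);
        if (!(p > 0.0)) {
            // log(p) is undefined here, so record the problem and skip the event.
            result.errors.push_back("event " + std::to_string(i) +
                                    ": p.d.f. value " + std::to_string(p) +
                                    " is not positive");
            continue;
        }
        result.nll -= std::log(p);
    }
    return result;
}
```

With data points {0, 1, 5}, calling evalNll with f = 0.5 yields no error messages, while f = -0.5 makes the p.d.f. negative at x = 5 and yields one.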

I have a fix for this now that I will propagate to 5.24.

Wouter