Hello!
I am trying to write a multithreaded data analysis tool using ROOT. I am at a point where everything works fine when running as a single thread, but things break down when there are two or more threads running in parallel.
Each thread reads from and writes to different ROOT files, so in principle the threads should not interfere with each other. This does not seem to be the case, because the application crashes every time.
I was not able to single out the section of the code responsible for the crash, but I have written a minimal example that reproduces the issue (see post 3).
What I am asking for here is some feedback on how I could troubleshoot the problem and what the best way to deal with it is.
What I am trying to do:
First of all, I enable thread safety in ROOT by calling ROOT::EnableThreadSafety().
Then each thread opens a different ROOT file (TFile) containing many TH1I histograms. The names of the histograms are the same across all the files (this may be the root of the issue).
Then I get the histogram pointers using the Get method of the TFile class.
Then I fit the histograms using the Fit method of the TH1I class.
To fit the histograms I use TF1 objects.
Finally I close all the files.
I have already tried moving the histograms to gDirectory = 0 (globally AND/OR one by one) without any improvement.
but you are reading the histos from the file, instead of creating them and filling them with the contents of the tree.
Could you try to add the line that appears in the test above:
// Don't link histos to a particular TDirectory
TH1::AddDirectory(false);
but I fear the problem is related to that: you get a crash when the histograms are destroyed in a multi-threaded environment, since they belong to the files.
Thank you very much for your prompt reply: I will try your fix.
In the meantime, I was able to reproduce the issue in a minimal example. Notice that I use ctypes and a Python script to spawn the threads. This is not accidental: I use the very same architecture in my application because of our particular software framework.
test.zip (2.0 KB)
You can run the example simply by:
make
python
>>> import test
>>> test()
The code structure mimics that of the actual program. Of course it does not make sense for such a simple script, but it is just meant to reproduce my issue as accurately as I can.
I tried to add TH1::AddDirectory(false); just after ROOT::EnableThreadSafety(); but the application is still crashing (with a different but still cryptic segfault message).
you get a crash when the histograms are destroyed in a multi-threaded environment since they belong to the files.
I think you are on the right track because most of the time I get a “free invalid pointer” kind of error.
Ok, let’s try something in order to know if the histograms are the issue here: from your code, can you keep the opening of the file but comment out the getting of the histograms (and the fit, consequently)? So every thread opens a file and creates a TF1, but no histograms involved.
Do you still see any error? Can you share here the stack trace you get?
Thank you for the hint. I have tried commenting out the getting and fitting of the histograms, and the program does not crash. If I reintroduce the getting of the histograms, the program does not crash either. But if I try to fit the histograms
test_hist->Fit(gaussian);
I get a segmentation fault (without any stack trace):
*** Break *** segmentation violation
*** Break *** segmentation violation
Process Python exited abnormally with code 139
If I don’t free the TF1 object by commenting out the line
delete gaussian;
I get a crash with the following stack trace: stack trace.txt (21.5 KB)
The problem seems to be the fit itself. But the reason for the crash is still beyond me …
I think TMinuit is not thread safe. You should use Minuit2 instead. Build ROOT with Minuit2 support (-Dminuit2=On) and add this line in your program: ROOT::Math::MinimizerOptions::SetDefaultMinimizer("Minuit2");
Then in principle the fitting should be protected by locks and should work with multiple threads.
Dear Lorenzo,
thank you for the quick reply. I did as you suggested and compiled ROOT 6.18.00 with Minuit2 support. In a single-threaded application the fit correctly uses Minuit2, and this is the output of my test program:
>>> test()
Info in <TCanvas::MakeDefCanvas>: created default TCanvas with name c1
****************************************
Minimizer is Minuit2 / Migrad
Chi2 = 68.4773
NDf = 97
Edm = 4.6045e-09
NCalls = 63
p0 = 601.014 +/- 4.73414
p1 = 0.0001709 +/- 0.00637732
p2 = 0.992809 +/- 0.00475785
Peak = 601.014
Mean = 0.0001709
Sigma = 0.992809
But if I create more than one thread, the Minuit2 minimizer is no longer used and the fitter falls back to Minuit:
>>> test()
Warning in <ROOT::Math::FitConfig::CreateMinimizer>: Could not create the Minuit2 minimizer. Try using the minimizer Minuit
Error in <ROOT::Math::FitConfig::CreateMinimizer>: Could not create the Minuit2 minimizer
Error in <ROOT::Math::Fitter::FitFCN>: Minimizer cannot be created
Warning in <Fit>: Abnormal termination of minimization.
fatal error: malformed or corrupted AST file: 'AST record has invalid code'
terminate called after throwing an instance of 'std::runtime_error'
what(): >>> Interpreter compilation error:
Invalid abbrev number
Process Python terminated (core dumped)
I have tried to load libMinuit2 manually as suggested in one of the threads that you linked
@etejedor @moneta
Do you think this misbehavior is a ROOT bug, or am I missing something?
In case you think it is a ROOT bug, do you want me to write a smaller/simpler example and open a bug report on the ROOT bug tracker?
It may be that the 7173 bug is not completely fixed or not fixed for all the possible cases. By the way, the macro provided in the description of that bug report is crashing in my ROOT.
P.S. For the time being I am getting by with a mutex around the section where the fitting happens, so that only one thread accesses the Minuit2 fitter at a time.
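The mutex workaround can be sketched with the standard library alone. This is an illustrative sketch, not the code from my application: fitPlaceholder stands in for the real test_hist->Fit(gaussian) call, and the names g_fitMutex, g_fitsDone, worker and runWorkers are all hypothetical.

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::mutex g_fitMutex;  // guards the non-thread-safe fitting section
int g_fitsDone = 0;     // shared state touched only while the mutex is held

// Placeholder for the real fit call (e.g. test_hist->Fit(gaussian)).
// It mutates shared state, so it must not run concurrently.
void fitPlaceholder() { ++g_fitsDone; }

// Each worker does its own file handling without a lock (omitted here)
// and takes the mutex only around the fit itself.
void worker(int nFits) {
    for (int i = 0; i < nFits; ++i) {
        std::lock_guard<std::mutex> lock(g_fitMutex);  // released at end of scope
        fitPlaceholder();
    }
}

// Spawns nThreads workers, waits for all of them, and returns the fit count.
int runWorkers(int nThreads, int nFits) {
    g_fitsDone = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < nThreads; ++t) {
        threads.emplace_back(worker, nFits);
    }
    for (auto& th : threads) {
        th.join();
    }
    return g_fitsDone;
}
```

The obvious cost is that the fits are serialized, so only the file reading and the other per-thread work actually run in parallel.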
Hi Lorenzo,
I think that the crashes that I was experiencing until now were caused by something external to ROOT (maybe something wrong with my system).
As a matter of fact, today I tried to run your bug_7173.C again and I got the very same behavior as you describe (Minuit2 works fine and TMinuit sometimes crashes). Then I tried to run my sample code and my original application and they run fine too (when using Minuit2).
I feel a little embarrassed because I have no idea what was interfering with ROOT until now and what has changed since yesterday. If and when I have some clue, I will let you know.
Many thanks to you and etejedor for your help and patience.
Thanks again, and greetings from Japan