Crash when generating many RooDataSets

I get the crash below when I try to generate 579000 RooDataSets in a loop, even though I think I am deleting them properly. Here is the macro I am using:

[code]#include <iostream>
#include "RooRealVar.h"
#include "RooDataSet.h"
#include "RooExponential.h"

void cpp_version(){

RooRealVar x1("x1","x1",1,0,10);
RooRealVar c1("c1","c1",-1,-1,-1);
RooExponential rooexp1("rooexp1","rooexp1",x1,c1);

for (int i = 0; i < 1000000; i++){

if (i % 1000 == 0)
  std::cout << "i = " << i << std::endl;

RooDataSet *  data1=rooexp1.generate(RooArgSet(x1),1);

delete data1;

}

}[/code]

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHcub >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Error in <TFoamCell::GetHSize >: Something wrong with linked tree 

Info in <TFoam::CheckAll>: Check - found total 15  errors 

Error in <TRefArray::AddAtAndExpand>: The object at 0x12ed0840 is not registered in the process the TRefArray points to (pid = ProcessID0/824ed4f6-7cce-11e4-9717-cf658a89beef)
Error in <TRefArray::AddAtAndExpand>: The object at 0x12ed08e0 is not registered in the process the TRefArray points to (pid = ProcessID0/824ed4f6-7cce-11e4-9717-cf658a89beef)
Error in <TRefArray::AddAtAndExpand>: The object at 0x12ef7970 is not registered in the process the TRefArray points to (pid = ProcessID0/824ed4f6-7cce-11e4-9717-cf658a89beef)
Error in <TRefArray::AddAtAndExpand>: The object at 0x12ef4db0 is not registered in the process the TRefArray points to (pid = ProcessID0/824ed4f6-7cce-11e4-9717-cf658a89beef)
Error in <TRefArray::AddAtAndExpand>: The object at 0x12efdab0 is not registered in the process the TRefArray points to (pid = ProcessID0/824ed4f6-7cce-11e4-9717-cf658a89beef)
Error in <TRefArray::AddAtAndExpand>: The object at 0x12efdb50 is not registered in the process the TRefArray points to (pid = ProcessID0/824ed4f6-7cce-11e4-9717-cf658a89beef)

 *** Break *** segmentation violation



===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x0000003938099dd5 in waitpid () from /lib64/libc.so.6
#1  0x000000393803c4a1 in do_system () from /lib64/libc.so.6
#2  0x000000393ae79247 in TUnixSystem::StackTrace() ()
   from /usr/lib64/root/libCore.so.5.34
#3  0x000000393ae7608a in TUnixSystem::DispatchSignals(ESignals) ()
   from /usr/lib64/root/libCore.so.5.34
#4  <signal handler called>
#5  0x00002b7a292cef62 in TFoam::MakeActiveList() ()
   from /usr/lib64/root/libFoam.so
#6  0x00002b7a292ce22e in TFoam::Initialize() ()
   from /usr/lib64/root/libFoam.so
#7  0x00002b7a2ada59af in RooFoamGenerator::RooFoamGenerator(RooAbsReal const&, RooArgSet const&, RooNumGenConfig const&, bool, RooAbsReal const*) ()
   from /usr/lib64/root/libRooFitCore.so
#8  0x00002b7a2ada5c10 in RooFoamGenerator::clone(RooAbsReal const&, RooArgSet const&, RooArgSet const&, RooNumGenConfig const&, bool, RooAbsReal const*) const
    () from /usr/lib64/root/libRooFitCore.so
#9  0x00002b7a2ae0f7d1 in RooNumGenFactory::createSampler(RooAbsReal&, RooArgSet const&, RooArgSet const&, RooNumGenConfig const&, bool, RooAbsReal*) ()
   from /usr/lib64/root/libRooFitCore.so
#10 0x00002b7a2adafa5d in RooGenContext::RooGenContext(RooAbsPdf const&, RooArgSet const&, RooDataSet const*, RooArgSet const*, bool, RooArgSet const*) ()
   from /usr/lib64/root/libRooFitCore.so
#11 0x00002b7a2ace2c2e in RooAbsPdf::genContext(RooArgSet const&, RooDataSet const*, RooArgSet const*, bool) const () from /usr/lib64/root/libRooFitCore.so
#12 0x00002b7a2ace7d30 in RooAbsPdf::autoGenContext(RooArgSet const&, RooDataSet const*, RooArgSet const*, bool, bool, char const*) const ()
   from /usr/lib64/root/libRooFitCore.so
#13 0x00002b7a2ace296f in RooAbsPdf::generate(RooArgSet const&, double, bool, bool, char const*, bool, bool) const () from /usr/lib64/root/libRooFitCore.so
#14 0x00002b7a2c5e7a5b in cpp_version ()
    at /home/anlevin/anlevin-8p591/pset10/./cpp_version.C:17
#15 0x00002b7a2c5e7b89 in G__cpp_version_C_ACLiC_dict__0_3850 (
    result7=0x12f041a0, funcname=0x0, libp=0x0, hash=10)
    at /home/anlevin/anlevin-8p591/pset10/cpp_version_C_ACLiC_dict.cxx:82
#16 0x000000393985682d in Cint::G__ExceptionWrapper(int (*)(G__value*, char const*, G__param*, int), G__value*, char*, G__param*, int) ()
   from /usr/lib64/root/libCint.so.5.34
#17 0x000000393990dc60 in G__execute_call ()
   from /usr/lib64/root/libCint.so.5.34
#18 0x000000393990f73b in G__call_cppfunc ()
   from /usr/lib64/root/libCint.so.5.34
#19 0x00000039398eab0a in G__interpret_func ()
   from /usr/lib64/root/libCint.so.5.34
#20 0x00000039398da108 in G__getfunction ()
   from /usr/lib64/root/libCint.so.5.34
#21 0x00000039398ae13d in G__getitem () from /usr/lib64/root/libCint.so.5.34
#22 0x00000039398b8761 in G__getexpr () from /usr/lib64/root/libCint.so.5.34
#23 0x00000039398bfb9d in G__calc_internal ()
   from /usr/lib64/root/libCint.so.5.34
#24 0x000000393994be71 in G__process_cmd ()
   from /usr/lib64/root/libCint.so.5.34
#25 0x000000393ae2f7dd in TCint::ProcessLine(char const*, TInterpreter::EErrorCode*) () from /usr/lib64/root/libCore.so.5.34
#26 0x000000393ae2f2b3 in TCint::ProcessLineSynch(char const*, TInterpreter::EErrorCode*) () from /usr/lib64/root/libCore.so.5.34
#27 0x000000393ad8434e in TApplication::ExecuteFile(char const*, int*, bool) ()
   from /usr/lib64/root/libCore.so.5.34
#28 0x000000393ad837c8 in TApplication::ProcessLine(char const*, bool, int*) ()
   from /usr/lib64/root/libCore.so.5.34
#29 0x000000393a212811 in TRint::Run(bool) ()
   from /usr/lib64/root/libRint.so.5.34
#30 0x0000000000400fdd in main ()
===========================================================


The lines below might hint at the cause of the crash.
If they do not help you then please submit a bug report at
http://root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#5  0x00002b7a292cef62 in TFoam::MakeActiveList() ()
   from /usr/lib64/root/libFoam.so
#6  0x00002b7a292ce22e in TFoam::Initialize() ()
   from /usr/lib64/root/libFoam.so
#7  0x00002b7a2ada59af in RooFoamGenerator::RooFoamGenerator(RooAbsReal const&, RooArgSet const&, RooNumGenConfig const&, bool, RooAbsReal const*) ()
   from /usr/lib64/root/libRooFitCore.so
#8  0x00002b7a2ada5c10 in RooFoamGenerator::clone(RooAbsReal const&, RooArgSet const&, RooArgSet const&, RooNumGenConfig const&, bool, RooAbsReal const*) const
    () from /usr/lib64/root/libRooFitCore.so
#9  0x00002b7a2ae0f7d1 in RooNumGenFactory::createSampler(RooAbsReal&, RooArgSet const&, RooArgSet const&, RooNumGenConfig const&, bool, RooAbsReal*) ()
   from /usr/lib64/root/libRooFitCore.so
#10 0x00002b7a2adafa5d in RooGenContext::RooGenContext(RooAbsPdf const&, RooArgSet const&, RooDataSet const*, RooArgSet const*, bool, RooArgSet const*) ()
   from /usr/lib64/root/libRooFitCore.so
#11 0x00002b7a2ace2c2e in RooAbsPdf::genContext(RooArgSet const&, RooDataSet const*, RooArgSet const*, bool) const () from /usr/lib64/root/libRooFitCore.so
#12 0x00002b7a2ace7d30 in RooAbsPdf::autoGenContext(RooArgSet const&, RooDataSet const*, RooArgSet const*, bool, bool, char const*) const ()
   from /usr/lib64/root/libRooFitCore.so
#13 0x00002b7a2ace296f in RooAbsPdf::generate(RooArgSet const&, double, bool, bool, char const*, bool, bool) const () from /usr/lib64/root/libRooFitCore.so
#14 0x00002b7a2c5e7a5b in cpp_version ()
    at /home/anlevin/anlevin-8p591/pset10/./cpp_version.C:17
===========================================================

Do you still have this problem?

Dear all

I have also run into exactly the same problem recently when trying to generate many datasets. I am using ROOT version 6.02.12-x86_64-slc6-gcc48-opt.

Does anyone have an idea?

Thanks!
Javier

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Info in <TFoam::CheckAll>: Check - found total 15 errors

Error in <TRefArray::AddAtAndExpand>: The object at 0x543a820 is not registered in the process the TRefArray points to (pid = ProcessID0/e850f18e-5988-11e5-9717-104da983beef)
Error in <TRefArray::AddAtAndExpand>: The object at 0x452f9c0 is not registered in the process the TRefArray points to (pid = ProcessID0/e850f18e-5988-11e5-9717-104da983beef)
Error in <TRefArray::AddAtAndExpand>: The object at 0x524db30 is not registered in the process the TRefArray points to (pid = ProcessID0/e850f18e-5988-11e5-9717-104da983beef)
Error in <TRefArray::AddAtAndExpand>: The object at 0x456e5e0 is not registered in the process the TRefArray points to (pid = ProcessID0/e850f18e-5988-11e5-9717-104da983beef)
Error in <TRefArray::AddAtAndExpand>: The object at 0x4d6c910 is not registered in the process the TRefArray points to (pid = ProcessID0/e850f18e-5988-11e5-9717-104da983beef)
Error in <TRefArray::AddAtAndExpand>: The object at 0x53a15a0 is not registered in the process the TRefArray points to (pid = ProcessID0/e850f18e-5988-11e5-9717-104da983beef)

*** Break *** segmentation violation

===========================================================
There was a crash.
This is the entire stack trace of all threads:

#0 0x0000003128eac61e in waitpid () from /lib64/libc.so.6
#1 0x0000003128e3e609 in do_system () from /lib64/libc.so.6
#2 0x00002b64e6196b1f in TUnixSystem::StackTrace() () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/6.02.12-x86_64-slc6-gcc48-opt/lib/libCore.so
#3 0x00002b64e619869c in TUnixSystem::DispatchSignals(ESignals) () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/6.02.12-x86_64-slc6-gcc48-opt/lib/libCore.so
#4 <signal handler called>
#5 0x00002b64f2bba2c8 in TFoam::MakeActiveList() () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/6.02.12-x86_64-slc6-gcc48-opt/lib/libFoam.so
#6 0x00002b64f2bbe63f in TFoam::Initialize() () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/6.02.12-x86_64-slc6-gcc48-opt/lib/libFoam.so
#7 0x00002b64f16a969e in RooFoamGenerator::RooFoamGenerator(RooAbsReal const&, RooArgSet const&, RooNumGenConfig const&, bool, RooAbsReal const*) () from /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/root/6.02.12-x86_64-slc6-gcc48-opt/lib/libRooFitCore.so

I can confirm that I have the same issues with RooFit if I try to generate a lot of datasets

Hi,

Can you please post a macro reproducing the problem? This Foam error appears when one tries to generate from an ill-formed pdf (for example, one with negative values), so it could be caused by many different things.

Lorenzo

I added a minimal binary to reproduce the crash. Compile using something like this:

g++ -o CrashTest `root-config --cflags --libs` -lRooFitCore CrashTest.cc
CrashTest.cc (1.85 KB)

I’m currently trying this hack in my code and it looks like it goes further:

savedNumber = TProcessID::GetObjectCount();

TProcessID::SetObjectCount(savedNumber);

Update: Didn’t work, died as well

I am running your program and it seems to be working fine. After how many events does it crash?

Lorenzo

After something like 2.3 million events (ROOT 5.34.23)?

Rerun on my Mac (OS X El Capitan, ROOT 6.04.06 via Homebrew). Output below:

./CrashTest

RooFit v3.60 -- Developed by Wouter Verkerke and David Kirkby
Copyright (C) 2000-2013 NIKHEF, University of California & Stanford University
All rights reserved, please read http://roofit.sourceforge.net/license.txt

10000 of 1e+09
20000 of 1e+09
30000 of 1e+09
40000 of 1e+09
50000 of 1e+09
60000 of 1e+09
70000 of 1e+09
80000 of 1e+09
90000 of 1e+09
100000 of 1e+09
110000 of 1e+09
120000 of 1e+09
130000 of 1e+09
140000 of 1e+09
150000 of 1e+09
160000 of 1e+09
170000 of 1e+09
180000 of 1e+09
190000 of 1e+09
200000 of 1e+09
210000 of 1e+09
220000 of 1e+09
230000 of 1e+09
240000 of 1e+09
250000 of 1e+09
260000 of 1e+09
270000 of 1e+09
280000 of 1e+09
290000 of 1e+09
300000 of 1e+09
310000 of 1e+09
320000 of 1e+09
330000 of 1e+09
340000 of 1e+09
350000 of 1e+09
360000 of 1e+09
370000 of 1e+09
380000 of 1e+09
390000 of 1e+09
400000 of 1e+09
410000 of 1e+09
420000 of 1e+09
430000 of 1e+09
440000 of 1e+09
450000 of 1e+09
460000 of 1e+09
470000 of 1e+09
480000 of 1e+09
490000 of 1e+09
500000 of 1e+09
510000 of 1e+09
520000 of 1e+09
530000 of 1e+09
540000 of 1e+09
550000 of 1e+09
560000 of 1e+09
570000 of 1e+09
Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHcub >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Error in <TFoamCell::GetHSize >: Something wrong with linked tree

Info in <TFoam::CheckAll>: Check - found total 15 errors

Error in <TRefArray::AddAtAndExpand>: The object at 0x7fd9d0fcb720 is not registered in the process the TRefArray points to (pid = ProcessID0/8566ea7a-824a-11e5-9717-0202a8c0beef)
Error in <TRefArray::AddAtAndExpand>: The object at 0x7fd9d0fcb7c0 is not registered in the process the TRefArray points to (pid = ProcessID0/8566ea7a-824a-11e5-9717-0202a8c0beef)
Error in <TRefArray::AddAtAndExpand>: The object at 0x7fd9d0fcb900 is not registered in the process the TRefArray points to (pid = ProcessID0/8566ea7a-824a-11e5-9717-0202a8c0beef)
Error in <TRefArray::AddAtAndExpand>: The object at 0x7fd9d0fcb9a0 is not registered in the process the TRefArray points to (pid = ProcessID0/8566ea7a-824a-11e5-9717-0202a8c0beef)
Error in <TRefArray::AddAtAndExpand>: The object at 0x7fd9d0fca300 is not registered in the process the TRefArray points to (pid = ProcessID0/8566ea7a-824a-11e5-9717-0202a8c0beef)
Error in <TRefArray::AddAtAndExpand>: The object at 0x7fd9d0fca3a0 is not registered in the process the TRefArray points to (pid = ProcessID0/8566ea7a-824a-11e5-9717-0202a8c0beef)

*** Break *** segmentation violation

===========================================================
There was a crash.
This is the entire stack trace of all threads:

Thread 1 (Thread 0x1503 of process 65267):
#0 0x00007fff83847742 in wait4 () from /usr/lib/system/libsystem_kernel.dylib
#1 0x00007fff98c1fd3b in system () from /usr/lib/system/libsystem_c.dylib
#2 0x0000000109726a19 in TUnixSystem::StackTrace() () from /usr/local/opt/root6/lib/root/libCore.so
#3 0x0000000109729ac5 in TUnixSystem::DispatchSignals(ESignals) () from /usr/local/opt/root6/lib/root/libCore.so

Any idea what I can do to get around that? I looked at the HistPdf and DataHist objects, but there is no convenient way to change the underlying data, so I don’t see how to avoid recreating these objects all the time.

It is difficult to reproduce this problem. It could be caused by some rare, badly generated values.
Have you tried changing, for example, the random seed?
Do you still get the crash?

Lorenzo

Hi,

I observe exactly the same behaviour (ROOT 5.34.34) when generating more than 578000 datasets. It is independent of the total number of generated events, which differed by a factor of 10 in my tests (between 60M and 500M). Deleting and recreating the Roo…Pdf in between the iterations didn’t help.

The example macro posted by Andrew1 demonstrates the problem perfectly. I don’t believe that this is due to some random seed, since in my completely different context it crashes after the same number of generated datasets.

Best regards and thanks for any help on this issue,
Klaus

Hi,
Was this problem ever understood? I am encountering it myself at the moment with 6.04.18: after a great number of dataset generations, this error appears and crashes my program.

Thanks
Will

Disclaimer: I am not an expert concerning RooFit (even had to enable it and compile it before testing the code above).

The program listed in the first post still crashes after 578524 iterations. I’ve tried to debug this a little bit and ran it with the address sanitizer and leak sanitizer as I noticed increasing memory consumption during execution. So in addition to the crash, there is a memory leak.

My output looks like the following (piped through uniq -c as there’s added debug info, see below):

~/test % ASAN_OPTIONS=symbolize=1:detect_leaks=1 ./a.out | uniq -c                                                                                               
      1 
      1 RooFit v3.60 -- Developed by Wouter Verkerke and David Kirkby 
      1                 Copyright (C) 2000-2013 NIKHEF, University of California & Stanford University
      1                 All rights reserved, please read http://roofit.sourceforge.net/license.txt
      1 
      1 i = 0
  15000 2t7t
      1 i = 1000
  15000 2t7t
      1 i = 2000
(...)
  15000 2t7t
      1 i = 577000
  15000 2t7t
      1 i = 578000
Error in <TRefArray::AddAtAndExpand>: The object at 0x60e0005fb600 is not registered in the process the TRefArray points to (pid = ProcessID0/9f9f2e7a-bbff-11e7-beb2-0101007fbeef)
Error in <TRefArray::AddAtAndExpand>: The object at 0x60e0005fb6e0 is not registered in the process the TRefArray points to (pid = ProcessID0/9f9f2e7a-bbff-11e7-beb2-0101007fbeef)
(...)
Error in <TRefArray::AddAtAndExpand>: The object at 0x60e0005fbde0 is not registered in the process the TRefArray points to (pid = ProcessID0/9f9f2e7a-bbff-11e7-beb2-0101007fbeef)

 *** Break *** segmentation violation
   7860 2t7t
      1 2f3t7t
      4 2t7t
     10 2f7f



===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007f699597e07a in __GI___waitpid (pid=25737, stat_loc=stat_loc@entry=0x7fff3ff05240, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:29
#1  0x00007f69958f6fbb in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:148
#2  0x00007f699afbcf0d in TUnixSystem::Exec (shellcmd=<optimized out>, this=0x617000000080) at /home/wbehrenh/src/root/root-head/core/unix/src/TUnixSystem.cxx:2118
#3  TUnixSystem::StackTrace (this=0x617000000080) at /home/wbehrenh/src/root/root-head/core/unix/src/TUnixSystem.cxx:2412
#4  0x00007f699afbf4fc in TUnixSystem::DispatchSignals (this=0x617000000080, sig=kSigSegmentationViolation) at /home/wbehrenh/src/root/root-head/core/unix/src/TUnixSystem.cxx:3643
#5  <signal handler called>
#6  TFoam::MakeActiveList (this=0x6130000217c0) at /home/wbehrenh/src/root/root-head/math/foam/src/TFoam.cxx:1025
#7  0x00007f69941fdeda in TFoam::Initialize (this=0x6130000217c0) at /home/wbehrenh/src/root/root-head/math/foam/src/TFoam.cxx:449
#8  0x00007f699712f8af in RooFoamGenerator::RooFoamGenerator (this=0x615000a1e080, func=..., genVars=..., config=..., verbose=<optimized out>, maxFuncVal=<optimized out>) at /home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooFoamGenerator.cxx:99
#9  0x00007f6997130031 in RooFoamGenerator::clone (this=<optimized out>, func=..., genVars=..., config=..., verbose=<optimized out>, maxFuncVal=0x0) at /home/wbehrenh/src/root/root-head-build/include/RooFoamGenerator.h:37
#10 0x00007f699725243a in RooNumGenFactory::createSampler (this=0x6060001de040, func=..., genVars=..., condVars=..., config=..., verbose=verbose@entry=false, maxFuncVal=0x0) at /home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooNumGenFactory.cxx:227
#11 0x00007f69971af93f in RooGenContext::RooGenContext (this=0x619001eefb80, model=..., vars=..., prototype=<optimized out>, auxProto=<optimized out>, verbose=<optimized out>, forceDirect=0x0) at /home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooGenContext.cxx:301
#12 0x00007f69970c6449 in RooAbsPdf::genContext (this=0x7fff3ff08da0, vars=..., prototype=0x0, auxProto=0x0, verbose=<optimized out>) at /home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooAbsPdf.cxx:1641
#13 0x00007f69970c8535 in RooAbsPdf::autoGenContext (this=0x7fff3ff08da0, vars=..., prototype=<optimized out>, auxProto=<optimized out>, verbose=<optimized out>, autoBinned=<optimized out>, binnedTag=<optimized out>) at /home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooAbsPdf.cxx:1658
#14 0x00007f69970c8077 in RooAbsPdf::generate (this=0x7fff3ff08da0, whatVars=..., nEvents=1, verbose=<optimized out>, autoBinned=<optimized out>, binnedTag=0x52d5a0 <.str> "", expectedData=<optimized out>, extended=false) at /home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooAbsPdf.cxx:1966
#15 0x0000000000516369 in cpp_version() ()
#16 0x000000000051656e in main ()
===========================================================


The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum http://root.cern.ch/forum.
Only if you are really convinced it is a bug in ROOT then please submit a
report at http://root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#6  TFoam::MakeActiveList (this=0x6130000217c0) at /home/wbehrenh/src/root/root-head/math/foam/src/TFoam.cxx:1025
#7  0x00007f69941fdeda in TFoam::Initialize (this=0x6130000217c0) at /home/wbehrenh/src/root/root-head/math/foam/src/TFoam.cxx:449
#8  0x00007f699712f8af in RooFoamGenerator::RooFoamGenerator (this=0x615000a1e080, func=..., genVars=..., config=..., verbose=<optimized out>, maxFuncVal=<optimized out>) at /home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooFoamGenerator.cxx:99
#9  0x00007f6997130031 in RooFoamGenerator::clone (this=<optimized out>, func=..., genVars=..., config=..., verbose=<optimized out>, maxFuncVal=0x0) at /home/wbehrenh/src/root/root-head-build/include/RooFoamGenerator.h:37
#10 0x00007f699725243a in RooNumGenFactory::createSampler (this=0x6060001de040, func=..., genVars=..., condVars=..., config=..., verbose=verbose@entry=false, maxFuncVal=0x0) at /home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooNumGenFactory.cxx:227
#11 0x00007f69971af93f in RooGenContext::RooGenContext (this=0x619001eefb80, model=..., vars=..., prototype=<optimized out>, auxProto=<optimized out>, verbose=<optimized out>, forceDirect=0x0) at /home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooGenContext.cxx:301
#12 0x00007f69970c6449 in RooAbsPdf::genContext (this=0x7fff3ff08da0, vars=..., prototype=0x0, auxProto=0x0, verbose=<optimized out>) at /home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooAbsPdf.cxx:1641
#13 0x00007f69970c8535 in RooAbsPdf::autoGenContext (this=0x7fff3ff08da0, vars=..., prototype=<optimized out>, auxProto=<optimized out>, verbose=<optimized out>, autoBinned=<optimized out>, binnedTag=<optimized out>) at /home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooAbsPdf.cxx:1658
#14 0x00007f69970c8077 in RooAbsPdf::generate (this=0x7fff3ff08da0, whatVars=..., nEvents=1, verbose=<optimized out>, autoBinned=<optimized out>, binnedTag=0x52d5a0 <.str> "", expectedData=<optimized out>, extended=false) at /home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooAbsPdf.cxx:1966
#15 0x0000000000516369 in cpp_version() ()
#16 0x000000000051656e in main ()
===========================================================



=================================================================
==23700==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 3145728 byte(s) in 3 object(s) allocated from:
    #0 0x4db768 in __interceptor_malloc (/home/wbehrenh/test/a.out+0x4db768)
    #1 0x7f69971b96ef in RooDataSet::operator new(unsigned long) /home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooDataSet.cxx:135
    #2 0x7f699724d0d5 in RooAbsGenContext::createDataSet(char const*, char const*, RooArgSet const&) /home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooAbsGenContext.cxx:135
    #3 0x7f699724deb8 in RooAbsGenContext::generate(double, bool, bool) /home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooAbsGenContext.cxx:217
    #4 0x7f69970c80bb in RooAbsPdf::generate(RooArgSet const&, double, bool, bool, char const*, bool, bool) const /home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooAbsPdf.cxx:1973
    #5 0x516368 in cpp_version() (/home/wbehrenh/test/a.out+0x516368)
    #6 0x51656d in main (/home/wbehrenh/test/a.out+0x51656d)
    #7 0x7f69958d282f in __libc_start_main /build/glibc-bfm8X4/glibc-2.23/csu/../csu/libc-start.c:291

Direct leak of 64 byte(s) in 2 object(s) allocated from:
    #0 0x511f88 in operator new[](unsigned long) (/home/wbehrenh/test/a.out+0x511f88)
    #1 0x7f699aef4e90 in TString::Init(int, int) /home/wbehrenh/src/root/root-head/core/base/src/TString.cxx:243

SUMMARY: AddressSanitizer: 3145792 byte(s) leaked in 5 allocation(s).
ASAN_OPTIONS=symbolize=1:detect_leaks=1 ./a.out  3952,06s user 3,77s system 99% cpu 1:05:59,13 total

The output 7860 2t7t, 1 2f3t7t, 4 2t7t is debug info I added in Bool_t TRefArray::GetObjectUID (see below). The output here is piped through uniq -c. While the code path is 2t7t before the bug, it changes when the program crashes:
7860 2t7t
1 2f3t7t
4 2t7t
10 2f7f
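
For readers who do not want to trace the cout calls by hand, the labels can be decoded mechanically. The sketch below is a hypothetical helper (describeStep and decodeTrace are made-up names, not ROOT functions). The descriptions are a paraphrase of the branches in the instrumented TRefArray::GetObjectUID posted in this message, so treat them as a reading aid rather than an authoritative description of ROOT.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of ROOT: maps one two-character debug label
// to a short description of the branch that printed it. The wording is a
// paraphrase of the instrumented TRefArray::GetObjectUID in this post.
std::string describeStep(const std::string& step) {
    if (step == "1f") return "object has a UUID: rejected";
    if (step == "2t") return "referenced object, same ProcessID as the array";
    if (step == "2f") return "referenced object, different ProcessID than the array";
    if (step == "3t") return "array was empty: switched to the object's ProcessID";
    if (step == "4t") return "unreferenced object, array uses the session ProcessID";
    if (step == "4f") return "unreferenced object, array uses another ProcessID";
    if (step == "5t") return "ProcessID full, array empty: new ProcessID adopted";
    if (step == "6f") return "ProcessID full, array not empty: rejected";
    if (step == "7t") return "final result: valid";
    if (step == "7f") return "final result: invalid, error printed";
    return "unknown step";
}

// Splits a label such as "2f3t7t" into its two-character steps and
// returns one description per step, in order.
std::vector<std::string> decodeTrace(const std::string& label) {
    assert(label.size() % 2 == 0);  // every step is a digit plus 't' or 'f'
    std::vector<std::string> out;
    for (std::size_t i = 0; i < label.size(); i += 2) {
        out.push_back(describeStep(label.substr(i, 2)));
    }
    return out;
}
```

With this, decodeTrace("2f3t7t") yields three steps: the object belongs to a different ProcessID than the array, the array was empty so it switched to the object's ProcessID, and the final result is valid. decodeTrace("2f7f") is the failing case: different ProcessID, no switch, error printed.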

See the code path here (look for the cout’s to decode 2t7t, 2f3t7t, and 2f7f):

Bool_t TRefArray::GetObjectUID(Int_t &uid, TObject *obj, const char *methodname)
{
   // Check if the object can belong here.
   Bool_t valid = kTRUE;
   if (obj->TestBit(kHasUUID)) {
      valid = kFALSE;
      std::cout <<"1f";
   } else if (obj->TestBit(kIsReferenced)) {
      valid = (fPID == TProcessID::GetProcessWithUID(obj));
      std::cout << (valid ? "2t" : "2f");
      if (valid) {
         uid = obj->GetUniqueID();
      } else {
         if (GetAbsLast() < 0) {
            // The container is empty, we can switch the ProcessID.
            fPID = TProcessID::GetProcessWithUID(obj);
            valid = kTRUE;
            std::cout << "3t";
            if (gDebug > 3)
               Info(TString::Format("TRefArray::%s",methodname),"The ProcessID for the %p has been switched to %s/%s:%d.",
                    this,fPID->GetName(),fPID->GetTitle(),fPID->GetUniqueID());
        }
      }
   } else {
      // If we could, we would just add the object to the
      // TRefArray's ProcessID.  For now, just check the
      // ProcessID it would be added to, i.e the current one,
      // is not full.

      if (!(TProcessID::GetObjectCount() >= 16777215)) {
         valid = (fPID == TProcessID::GetSessionProcessID());
         std::cout << (valid ? "4t" : "4f");
         if (valid) {
            uid = TProcessID::AssignID(obj);
         }
      } else {
         // The AssignID will create a new TProcessID.
         if (GetAbsLast() < 0) {
            // If we are empty, we can handle it.
            uid = TProcessID::AssignID(obj);
            fPID = TProcessID::GetProcessWithUID(obj);
            Warning(TString::Format("TRefArray::%s",methodname),"The ProcessID for the %p has been switched to %s/%s:%d. There are too many referenced objects.",
                    this,fPID->GetName(),fPID->GetTitle(),fPID->GetUniqueID());
            std::cout << "5t\n";
            return kTRUE;
        } else {
            Error(TString::Format("TRefArray::%s",methodname),"The object at %p can not be registered in the process the TRefArray points to (pid = %s/%s) because the ProcessID has too many objects and the TRefArray already contains other objects.",obj,fPID->GetName(),fPID->GetTitle());
            std::cout << "6f\n";
            return kFALSE;
         }
      }
   }
   std::cout << (valid ? "7t\n" : "7f\n");
   if (!valid) {
      ::Error(TString::Format("TRefArray::%s",methodname),
              "The object at %p is not registered in the process the TRefArray points to (pid = %s/%s)",obj,fPID->GetName(),fPID->GetTitle());
   }
   return valid;
}

So valid = (fPID == TProcessID::GetProcessWithUID(obj)); suddenly becomes false. It would be great if someone with more knowledge of ROOT’s internals could follow up on this.
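
One observation that may help whoever picks this up: the constant 16777215 in the function above is 2^24 - 1, and several reports in this thread crash after nearly the same number of generated datasets (about 578000). A back-of-envelope sketch, not a diagnosis: if every generate() call used up a fixed number of object IDs and the count were never reset, the limit would be reached after a fixed number of iterations regardless of the random seed. The function name iterationsUntilFull is hypothetical, and the value 29 used in the note below is an assumption chosen only because it reproduces the observed count; nobody in this thread measured it.

```cpp
#include <cassert>
#include <cstdint>

// Limit that the instrumented TRefArray::GetObjectUID compares against.
constexpr std::uint64_t kMaxObjectCount = 16777215;  // 2^24 - 1

// Number of complete loop iterations that fit before a counter starting at 0
// and growing by idsPerIteration would pass kMaxObjectCount.
// idsPerIteration is an ASSUMPTION supplied by the caller, not a measured value.
std::uint64_t iterationsUntilFull(std::uint64_t idsPerIteration) {
    assert(idsPerIteration > 0);
    return kMaxObjectCount / idsPerIteration;  // integer division rounds down
}
```

iterationsUntilFull(29) gives 578524, the iteration count reported above. This is consistent with an ID counter filling up, but it does not prove that this is the cause.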

About the memory leak:
Here I also see a lot of strange code where I cannot reason about the correctness… clang-tidy points me here:

/home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooAbsPdf.cxx:419:3: error: Potential leak of memory pointed to by 'cache' [clang-analyzer-cplusplus.NewDeleteLeaks,-warnings-as-errors]
  return norm ;
  ^
/home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooAbsPdf.cxx:405:7: note: Assuming 'cache' is null
  if (cache) {
      ^
/home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooAbsPdf.cxx:405:3: note: Taking false branch
  if (cache) {
  ^
/home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooAbsPdf.cxx:415:11: note: Memory is allocated
  cache = new CacheElem(*norm) ;
          ^
/home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooAbsPdf.cxx:419:3: note: Potential leak of memory pointed to by 'cache'
  return norm ;
  ^
/home/wbehrenh/src/root/root-head/roofit/roofitcore/src/RooAbsPdf.cxx:525:3: error: Potential leak of memory pointed to by 'cache' [clang-analyzer-cplusplus.NewDeleteLeaks,-warnings-as-errors]
  delete depList ;
  ^

It is really hard to reason about who the owner of all the memory allocated here is supposed to be. Finally I came to RooDataSet which is malloc’ing memory in its own new operator. That’s where the leak sanitizer points to. I don’t quite understand the comment:

/// Overloaded new operator guarantees that all RooDataSets allocated with new
/// have a unique address, a property that is exploited in several places
/// in roofit to quickly index contents on normalization set pointers.

Isn’t that the case also with a “normal” new? Or is it meant to always give a new address even when older objects have been deleted? Anyway, it is the weekend and it is not my problem :wink:
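
To make the two readings of that question concrete, here is a small standalone C++ sketch (no ROOT involved; Dummy and both function names are made up for illustration). Two objects that are alive at the same time always have different addresses with a plain new. What a plain new does not promise is that the address of a deleted object stays unused: the allocator is free to hand it out again, and whether it does is implementation-dependent. The sketch says nothing about what RooDataSet's overloaded operator actually does.

```cpp
#include <cassert>
#include <cstdint>

struct Dummy { int payload[4]; };

// Two objects that exist at the same time never share an address.
bool liveObjectsAreDistinct() {
    Dummy* a = new Dummy{};
    Dummy* b = new Dummy{};
    const bool distinct = (a != b);
    assert(distinct);  // guaranteed while both objects are alive
    delete a;
    delete b;
    return distinct;
}

// Reports whether the allocator gave the address of a deleted object to the
// next allocation. The address is saved as an integer before the delete so
// that no dangling pointer is compared. The result is implementation-dependent.
bool addressWasReused() {
    Dummy* first = new Dummy{};
    const auto firstAddr = reinterpret_cast<std::uintptr_t>(first);
    delete first;
    Dummy* second = new Dummy{};
    const bool reused = (reinterpret_cast<std::uintptr_t>(second) == firstAddr);
    delete second;
    return reused;
}
```

liveObjectsAreDistinct() always returns true. addressWasReused() may return true or false depending on the allocator, so its result should only be printed, not relied on.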

Hi All,
Was this ever understood? I’m running into the same problem. As in the earlier posts, generating a large number of datasets leads to this AddAtAndExpand error. I’m using ROOT 6.14. Any ideas?

Thanks in advance.

I created a bug report such that this issue can be looked at in detail:
https://sft.its.cern.ch/jira/browse/ROOT-10277