0: caught exception triggered by signal '1' while merging ob

ganis · December 9, 2012, 10:51am

Hi,

I think the problem is an inconsistency in the TTreeCache settings. PROOF makes these settings itself, and needs to. Making them outside PROOF may have weird side effects, because SetCacheSize destroys the previous cache.
At which level are you making these settings

   chain.SetCacheSize(TREE_CACHE_SIZE);
   chain.AddBranchToCache("*",kTRUE);

?
Can you try to remove them?

G. Ganis

EugenyBoger · December 9, 2012, 7:44pm

Hello Gerri,

As I already said, I tried to remove these two lines and I also tried to call SetCacheRead(0) - neither seems to help.

Anyway, regardless of what part of my program triggers it, don’t you think this is a bug in ROOT? The segfault (Invalid read) happens at

  TFileCacheRead *cache = dynamic_cast<TFileCacheRead *>(fCacheReadMap->GetValue(key));

where the value returned by fCacheReadMap->GetValue(key) appears to be a dangling pointer that was freed earlier.

ganis · December 9, 2012, 8:33pm

Sorry for overlooking the already tried test.

Bugs are always possible, but since this is something that works in other cases (and the user who opened this thread no longer has the problem), it may be due to some mis-action. That pointer is invalid in your case, but in general it is valid.

The only way to proceed is to look at your code, or at a minimal version of it that reproduces the problem.

Cheers, Gerri

EugenyBoger · December 10, 2012, 1:29am

Hi,

The old version of our code, which also triggers the error, is available here.

I’ll also try to find the exact revision where this error occurred for the first time.

UPDATE: The error occurred in the revision 44846:

Index: tree/tree/src/TTree.cxx
===================================================================
--- tree/tree/src/TTree.cxx     (revision 44845)
+++ tree/tree/src/TTree.cxx     (revision 44846)
@@ -7420,6 +7420,19 @@
    }
    if (fDirectory) {
       fDirectory->Remove(this);
+
+      // Delete or move the file cache if it points to this Tree
+      TFile *file = fDirectory->GetFile();
+      if (file) {
+         TFileCacheRead *pf = file->GetCacheRead(this);
+         file->SetCacheRead(0,this);
+         TFile *newfile = dir ? dir->GetFile() : 0;
+         if (newfile) {
+            newfile->SetCacheRead(pf,this);
+         } else {
+            delete pf;
+         }
+      }
    }
    fDirectory = dir;
    if (fDirectory) {

EugenyBoger · December 20, 2012, 4:45am

Hello,

I believe it’s a bug in ROOT TTreeCache handling code.
Here is the minimal code which reproduces the problem.

It has nothing to do with PROOF except the fact that PROOF implicitly enables the cache for input files.

The exact portion of code which triggers the bug was:

     T_select = fChain->CloneTree(0);
     T_select->SetDirectory(f_select);

I.e. clone the input tree, then write the clone to the TProofOutputFile (f_select). Calling TTree::SetDirectory() on the cloned tree somehow makes the input and output files share the same TTreeCache. Eventually the destructors (or Close() methods) of both files try to remove the same pointer, which leads to a crash.

P.S.
I think it’s worth mentioning here because the “pre-selection” approach with CloneTree() might be a common scenario in PROOF analyses.

ganis · December 20, 2012, 10:43am

Hi,

Yes, that is it. In my first quick look at your code I missed the CloneTree in Notify.
The Clone call in CloneTree brings the existing cache to the cloned tree.

You don’t need the cache for the output file, so you have to reset the cache in the output tree without destroying the one used by the input tree. To do that you need to detach the tree from the input file and call SetCacheSize(0), i.e.

  T_select = fChain->CloneTree(0);
  T_select->SetDirectory(0);
  T_select->SetCacheSize(0); // Sets no cache for the output tree without touching the existing one
  T_select->SetDirectory(f_select);
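
For readers who want to picture the sharing problem and the effect of the reset, here is a minimal toy model in plain C++. It is only a sketch and does not use ROOT: ToyCache, ToyFile and simulate are hypothetical names made up for illustration, and the model assumes nothing more than "each file releases every cache registered in its map when it is closed". Release attempts are counted rather than performed, so the double release can be observed safely.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for a read cache. Hypothetical type, not ROOT's TFileCacheRead.
struct ToyCache {
    int id;
};

// Toy stand-in for a file holding a tree-name -> cache map.
// Close() only counts release attempts instead of calling delete,
// so a double release can be observed without undefined behaviour.
struct ToyFile {
    std::map<std::string, ToyCache*> cacheMap;

    // Register a cache for a tree, or drop the entry if cache is null.
    void SetCache(const std::string& tree, ToyCache* cache) {
        if (cache) {
            cacheMap[tree] = cache;
        } else {
            cacheMap.erase(tree);
        }
    }

    // Record one release attempt for every cache this file knows about.
    void Close(std::map<ToyCache*, int>& releases) {
        for (const auto& entry : cacheMap) {
            ++releases[entry.second];
        }
        cacheMap.clear();
    }
};

// Returns how many times the single cache would be released when both
// files are closed. resetCloneCache models dropping the cache on the
// cloned tree before attaching it to the output file (an assumption of
// this toy model, not a description of ROOT internals).
int simulate(bool resetCloneCache) {
    ToyCache cache{1};
    ToyFile input;
    ToyFile output;

    // The input tree has a cache registered with the input file.
    input.SetCache("T", &cache);

    // The clone starts out pointing at the same cache as the input tree.
    ToyCache* cloneCache = &cache;
    if (resetCloneCache) {
        cloneCache = nullptr;
    }
    output.SetCache("T_select", cloneCache);

    std::map<ToyCache*, int> releases;
    input.Close(releases);
    output.Close(releases);

    // Only one cache object exists in this model.
    assert(releases.size() == 1);
    return releases[&cache];
}
```

In this model simulate(false) counts two release attempts on the same cache (the analogue of the crash), while simulate(true) counts one.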

I have asked the expert (P. Canal) to comment.

Cheers, Gerri

pcanal · December 20, 2012, 1:52pm

Hi,

The issue was not that the cache settings were copied by CloneTree, but that calling SetDirectory on a tree being ‘moved’ out of a file with a cached TTree led to some confusion. This was fixed yesterday in the trunk and in the v5-34 patch branch as of revision 48153.

Cheers,
Philippe.