Processing a tree with a large number of events eventually stalls

I’m using this script to process a tree that contains a bit over 56 million events (merged from a TChain built from a number of files). It works just fine, but it will eventually grind to a halt with a large number of events, as it has now done with about 6.6 million events processed. (That is, it’s showing the progress counter, but the counter doesn’t increment.)

I can see the process in htop but it’s listed as using 0.0 CPU time and 0.3% of RAM, and has been running for 115 hours.

How can I modify this script so that it will write to disk as it goes instead of holding things in memory and then dumping periodically?

As a bonus, can the file be examined as it is being created? Right now, trying to open it gives:

Attaching file Run0654-0667.root.banana-tree-test-100-entries.root as _file0…
Error in TKey::ReadObjWithBuffer: Unknown class
Info in TFile::GetStreamerInfoList: cannot find the StreamerInfo record in file Run0654-0667.root.banana-tree-test-100-entries.root

Hi,

Which version of ROOT? Which OS/compiler? How do you run your script? Could you try to use ACLiC and attach gdb to the running session when stalled?

Cheers Bertrand.

It’s ROOT 5.34 under CentOS 6 and I do have gdb installed.

Is there a way to attach to the session without having to restart it? It took a while to get to where it is now. If not, how do I go about hooking gdb to it? I haven’t tried to do it before.

I don’t use gdb very often (I’m working mostly on Windows), but I’m fairly sure you can. Just look at the gdb documentation, or google for it (https://www.google.ch/search?q=gdb+attach+process).

Cheers, Bertrand.

Hi,

The loop ends with

which never seems to be used. Similarly, the loop seems to open new files with

          TFile *thisFile = TFile::Open(segmentName[r][iExtra].c_str(),"READ");

but it never deletes any of them. Most likely, the problem is that you are running out of resources (file descriptors and memory).
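To make the resource point concrete, here is a minimal sketch in plain C++. It does not use ROOT: `FakeFile` and `processSegments` are hypothetical stand-ins invented for illustration, and `FakeFile` only counts how many handles are open. The idea is that each handle opened inside the loop is released before the next iteration starts.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a file handle such as TFile. It only counts
// how many handles are currently open, so a leak would be visible.
struct FakeFile {
    static int openCount;
    std::string name;
    explicit FakeFile(const std::string& n) : name(n) { ++openCount; }
    ~FakeFile() { --openCount; }  // "closing" happens in the destructor
};
int FakeFile::openCount = 0;

// Visits every segment and returns the largest number of handles that
// were open at the same time.
int processSegments(const std::vector<std::string>& segmentNames) {
    int maxOpen = 0;
    for (const std::string& name : segmentNames) {
        // Owned by the smart pointer: destroyed at the end of each iteration.
        std::unique_ptr<FakeFile> thisFile(new FakeFile(name));
        assert(FakeFile::openCount == 1);  // never more than one open at once
        if (FakeFile::openCount > maxOpen) maxOpen = FakeFile::openCount;
        // ... read from thisFile here ...
    }
    return maxOpen;
}
```

In the actual macro, the equivalent would be to `delete thisFile;` once you are done with each segment, which also closes the file.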

          bool declared;
          if (!declared) {
            gROOT->ProcessLine(".L /data/macros/treesort/gtree.C");
            gtree ttree; /* load class into memory (not doing this causes crash) */
            declared = true; /* value will persist between runs in this session */
          }

is doubly odd. ‘declared’ is never initialized, so the outcome of the if statement is effectively ‘random’. ‘declared’ is also not a function-static variable, so the comment "value will persist between runs in this session" is technically wrong (the same stack memory ‘might’ be re-used without being re-initialized, so it ‘might’ appear to work sometimes, but it will fail at other times).

As for the "not doing this causes crash" comment: this should not be the case, unless the constructor of ‘gtree’ does something ‘important’.
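For the ‘declared’ flag, a function-static variable initialized to false is one way to get the "run once" behaviour the comment describes. The sketch below is plain C++ without ROOT; `ensureLoaded` and `loadCount` are hypothetical names, and the counter merely stands in for the `.L` call.

```cpp
#include <cassert>

int loadCount = 0;  // hypothetical counter standing in for the .L call

void ensureLoaded() {
    // Initialized exactly once; keeps its value between calls.
    static bool declared = false;
    if (!declared) {
        // In the real macro this is where
        // gROOT->ProcessLine(".L /data/macros/treesort/gtree.C") would go.
        ++loadCount;
        declared = true;
    }
    assert(declared);  // post-condition: the setup has been done
}
```

Calling `ensureLoaded()` several times performs the setup only on the first call. Whether the flag survives between separate executions of a macro in one ROOT session depends on how the macro is loaded, so treat that part as an assumption to verify.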

You can simplify this code:

          TKey *tebKey = gDirectory->FindKey("teb");
          if (tebKey ==  0) {
            cout << "Can't find tree 'teb'!" << endl;
            return;
          }
          TTree *thistree = (TTree*)gROOT->FindObject("teb");

with just

          TTree *thistree = nullptr;
          thisFile->GetObject("teb",thistree);
          if (thistree == nullptr) {
            cout << "Can't find tree 'teb'!" << endl;
            return;
          }

Note: this pattern is used multiple times in your script.

Which part is ‘held in memory and dumped periodically’ (besides the ‘entry’ of the output TTree)?

Yes. Just make sure that either AutoSave is set to the frequency you want (SetAutoSave(frequencyInNumberOfEntries) for example) or call it explicitly. In addition you need to call SaveSelf on the TFile ptr.

        ntuple->SetAutoSave(numberOfEntries);
        .....
        if (readyToSnapshot) outputfile->SaveSelf();
or
        if (readyToSnapshot) {
            ntuple->AutoSave();
            outputfile->SaveSelf();
        };

Note: don’t do that ‘too often’, as this is a slow operation and doing it too often will reduce the efficiency of both writing and reading the output file.

On the reading side, you can monitor this file then with something like:

          while (1) {
            f->ReadKeys();
            TTree *ntuple = nullptr;
            f->GetObject("ntuple",ntuple);
            if (first == 0) ntuple->Draw("px>>hpx","","",10000000,first);
            else ntuple->Draw("px>>+hpx","","",10000000,first);
            first = (Int_t)ntuple->GetEntries();
            delete ntuple; // Since it will change the next time we loop around, we don't want to keep it.
            c1.Update();
            gSystem->Sleep(1000); // sleep 1 second
          }

Cheers,
Philippe.

I rewrote the script from scratch and split the output (every X entries) into multiple files, which I then combined into a TChain and merged with TChain::Merge. The problem seems to have been that the process eventually hangs, perhaps because it runs out of available memory.
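For anyone finding this later, the "new file every X entries" bookkeeping can be sketched roughly as below. This is plain C++ without ROOT and is not the actual script: `chunkFileName`, `countChunks`, and the `.partNNNN.root` naming scheme are all made up for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical naming scheme: base name plus a zero-padded chunk index.
std::string chunkFileName(const std::string& base, long chunk) {
    char suffix[32];
    std::snprintf(suffix, sizeof(suffix), ".part%04ld.root", chunk);
    return base + suffix;
}

// Walks over nEntries and returns how many output files would be produced
// when a new file is started every entriesPerFile entries.
long countChunks(long nEntries, long entriesPerFile) {
    assert(entriesPerFile > 0);
    long chunks = 0;
    for (long i = 0; i < nEntries; ++i) {
        if (i % entriesPerFile == 0) {
            // Here a real script would close the previous output file
            // and open chunkFileName(base, chunks) for writing.
            ++chunks;
        }
        // ... fill the current output tree with entry i ...
    }
    return chunks;
}
```

With 10 entries and 4 entries per file this gives three output files, the last one only partly filled. The resulting files can then be added to a TChain and merged, as described above.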

Thank you all for the suggestions, which were useful for the revised script!