I’ve got a TPySelector which runs great, but recently it’s been segfaulting in Terminate() while writing some TTrees to file. I’ve noticed that this only happens if I process a certain number of events, so I assume it has something to do with the size of the trees being written (which may be quite large). Any help would be appreciated!
Here’s the offending code, and a stack trace is attached.
[code]
        # ... end of the preceding method (its def line is not shown)
        for h in self.cutflows.values():
            self.GetOutputList().Add(h)
        for t in self.outTrees.values():
            self.GetOutputList().Add(t.tree)
        print self.nevents, "processed."
        print float(self.nevents)/(time() - self.begin), "events processed per", \
              "second, excluding startup and shutdown overhead."

    def Terminate(self):
        print "ttSelector.py: terminating."
        of = TFile("out.root", "RECREATE")
        for item in self.GetOutputList():
            print "ttSelector.py: writing", item.GetName(), "to file."
            stdout.flush()
            item.Write()
[/code]
This error message, “std::bad_alloc (C++ exception)”, indicates that the job ran out of memory. Is it possible to avoid keeping all the information in memory, and instead write portions out to disk during the course of the job?
Is there a prescription for doing this on the fly?
I’d still like to be able to run on a PROOF cluster, but I’m not sure at which step in the process all the trees from the slaves are added together. I assume it’s at Terminate(), but I suppose it could be in some method that can’t be overridden in Python.
OK, well I can wait on the PROOF info for now. There should be a merging of the TSelectorLists from all the slaves on the master node at some point, but I can’t find the method in which that happens. I have the feeling that just understanding at which stage this happens would be helpful.
But I’m interested to know if there’s an automatic way to store chunks of the tree to file when they’re already Fill()'d? That would be extremely handy. Or is there a ‘standard’ manual way to do the same thing? That’d be helpful as well.
I don’t think that there’s a memory growth problem in the Python bindings. I’ve tested a little toy version in C++ and I get the same problem: the trees are just too big.
You could reduce the maximum amount of memory-resident data of each TTree by setting a lower value with SetMaxVirtualSize() (get the current value with GetMaxVirtualSize()). Actually, maybe first have a look at what GetMaxVirtualSize() gives and multiply it by the total number of trees that you have, to see whether that is indeed a number large enough to cause memory problems on the machine that you are running on.
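To make that check concrete, here is a rough sketch of the arithmetic suggested above. It only relies on the two TTree methods already mentioned, GetMaxVirtualSize() and SetMaxVirtualSize(). Everything else is an assumption for illustration: StandInTree is a hypothetical stand-in so the snippet runs without ROOT, the helper names are made up, and the 64 MiB / 16 MiB / 20 trees figures are invented numbers, not values taken from the job in question.

```python
class StandInTree(object):
    """Hypothetical stand-in exposing only the two TTree accessors used below."""

    def __init__(self, max_virtual_size):
        self._max_virtual_size = max_virtual_size

    def GetMaxVirtualSize(self):
        return self._max_virtual_size

    def SetMaxVirtualSize(self, nbytes):
        self._max_virtual_size = nbytes


def estimate_resident_bytes(trees):
    """Sum GetMaxVirtualSize() over all trees (the 'multiply by N trees' check)."""
    return sum(t.GetMaxVirtualSize() for t in trees)


def cap_virtual_size(trees, nbytes):
    """Lower the limit on every tree whose current limit exceeds nbytes."""
    for t in trees:
        if t.GetMaxVirtualSize() > nbytes:
            t.SetMaxVirtualSize(nbytes)


MIB = 1024 * 1024

# Illustrative numbers only: 20 trees, each with an assumed 64 MiB limit.
trees = [StandInTree(64 * MIB) for _ in range(20)]

before = estimate_resident_bytes(trees)
cap_virtual_size(trees, 16 * MIB)
after = estimate_resident_bytes(trees)

print("before: %d MiB, after: %d MiB" % (before // MIB, after // MIB))
# prints "before: 1280 MiB, after: 320 MiB"
```

In the selector above, the list passed in would presumably be something like [t.tree for t in self.outTrees.values()], and the resulting total can then be compared with the memory available on the machine.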