TKey deletion is very slow

I have a custom histogram merging script, because hadd doesn’t quite cut it, for a variety of reasons*. I found that my script was extremely slow, and I believe I have located the reason:

In [1]: import ROOT as R

In [2]: lots_of_keys = [R.TKey() for i in xrange(1000)]
In [3]: time del lots_of_keys
CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
Wall time: 0.01 s

In [5]: lots_of_keys = [R.TKey() for i in xrange(5000)]
In [6]: time del lots_of_keys
CPU times: user 0.21 s, sys: 0.00 s, total: 0.21 s
Wall time: 0.21 s

In [8]: lots_of_keys = [R.TKey() for i in xrange(10000)]
In [9]: time del lots_of_keys
CPU times: user 1.21 s, sys: 0.00 s, total: 1.21 s
Wall time: 1.21 s

In [11]: lots_of_keys = [R.TKey() for i in xrange(20000)]
In [12]: time del lots_of_keys
CPU times: user 5.82 s, sys: 0.00 s, total: 5.82 s
Wall time: 5.82 s

In [14]: lots_of_keys = [R.TKey() for i in xrange(40000)]
In [15]: time del lots_of_keys
CPU times: user 37.73 s, sys: 0.01 s, total: 37.74 s
Wall time: 37.75 s

So it seems that when keys get garbage collected, we spend lots of time deleting them!

Looking at:
root.cern.ch/root/html/src/TKey.cxx.html#gCb_UB

I see that TKey chose to redefine it’s Delete method to do the key deletion from the current file. As such, the TKey::Delete method calls “fMotherDir->GetListOfKeys()->Remove(this);”, which I believe could be the origin of the quadratic behaviour for deleting lots of TKeys. If this hypothesis is true, then I need a way to correctly delete these keys without the undesired behaviour. If it is false, then does anyone else have any ideas what I’m doing wrong?

I have tried experimenting with ways of deleting the keys manually but I haven’t found anything that works yet**. I was tempted to SetOwnership(False) on the keys and then delete them on the C++ side, but I would rather not have extra C++ code floating around. I would prefer a self-contained script. Is there any way to delete these objects properly from pure python?

  • In particular, it doesn’t support merging lots of different types, like TParameter and THnSparse. Also, it assumes that all of the histograms you want to merge exist on the first input file, which is not true for my set of files.

** I tried using TClass::GetDelete, then trying to obtain a pointer to the buffer (which I think one has to do by parsing the repr?!) then calling it with ctypes, but I couldn’t get this to do anything other than segfault. I also wondered about trying to use the TObject::Delete method on my TKey object but don’t know whether this makes sense or is possible from PyROOT.

Hi,

how about storing the keys in a TList and calling TList::Delete() once done with them?

Cheers,
Wim

Great, this works a treat: (not useful in itself, but just for testing)

def make(n): lots_of_keys = [R.TKey() for i in xrange(n)] lst = R.TList() lst.SetOwner() [(lst.Add(k), R.SetOwnership(k, False)) for k in lots_of_keys] return lst, lots_of_keys
There seems to be some interesting behaviour where if you delete the second return value first, you get the slowdown, but not if you delete the TList first. So this method still needs lots of babysitting.

Next question: How to fix ROOT so that this isn’t broken?

Hi Wim, Thanks very much for your fast response before, it is much appreciated.

Oops, another interesting observation: My keys come from TFiles, which actually delete the TKeys themselves. Therefore, only SetOwnership() is necessary, or presumably by closing the TFile before the keys have a chance to reach the garbage collector.

This is still a problem for me.

Reading TKeys from a file, the TList method doesn’t work. I still spend lots of time in the garbage collector.

If I try to save the TKeys from many files (in a dictionary or list), then the time taken to iterate (!) over TDirectory::GetListOfKeys() appears to depend on the number of keys seen so far. #-o (To be clear: Iterating over a constant number of keys seems to depend on the total number of keys that exist in memory.)

Thanks in advance for any help.

Here is some more minimal code to reproduce my situation:

It creates a temporary directory containing a reasonable number of ROOT files, each with 1000 histograms. Then it demonstrates what happens if you try to garbage collect the list of keys, and the effect of closing the file before doing so.

I would like to keep the TFile objects alive whilst collection of the TKeys happens.

[code]#! /usr/bin/env python

from contextlib import contextmanager
from gc import collect, get_objects
from tempfile import mkdtemp
from os import unlink
from os.path import join as pjoin
from shutil import rmtree
from time import time

def alive_objects():
return len(get_objects())

@contextmanager
def timer(what):
start = time()
alive_before = alive_objects()
try: yield
finally:
alive_after = alive_objects()
print “Took %.2f to %s” % (time()-start, what),
print “(alive objects: pre: %i post: %i)” % (alive_before, alive_after)

with timer(“Import ROOT”):
import ROOT as R; R.kTRUE

def make_lots_of_histograms(nkeys):
for i in xrange(nkeys):
h = R.TH1F(“h%05i” % i, “”, 1, 0, 1)
h.Write()

@contextmanager
def lots_of_temporary_root_files(nfiles, nkeys_per_file):
tmpdir = mkdtemp(“tkey_slowness_mcr”)
try:
filenames = []
with timer(“build %i root files” % nfiles):
for i in xrange(nfiles):
filename = pjoin(tmpdir, “%04i.root” % i)
f = R.TFile(filename, “recreate”)
make_lots_of_histograms(nkeys_per_file)
filenames.append(filename)
f.Close()
yield filenames
finally:
rmtree(tmpdir)

def read_all_keys(files):
all_keys = []
for f in files:
all_keys.extend(f.GetListOfKeys())

with timer("GC all_keys"):
    del all_keys
    collect()

def read_all_keys_close_before_gc(files):
all_keys = []
for f in files:
all_keys.extend(f.GetListOfKeys())

for f in files:
    f.Close()   

with timer("GC all_keys (closed files)"):
    del all_keys
    collect()

def tkey_slowness_test(files):
root_files = [R.TFile(f) for f in files]

with timer("--read all keys"):
    read_all_keys(root_files)

with timer("--read all keys (gc after closed file)"):
    read_all_keys_close_before_gc(root_files)

def main():
with timer(“Complete”):
with lots_of_temporary_root_files(nfiles=40, nkeys_per_file=1000) as files:
with timer(“Perform tkey_slowness_test”):
tkey_slowness_test(files)

if name == “main”:
main()[/code]

And the output for 20 files:

Took 2.12 to Import ROOT (alive objects: pre: 5472 post: 10390) Took 2.88 to build 20 root files (alive objects: pre: 10417 post: 11116) Took 2.64 to GC all_keys (alive objects: pre: 51646 post: 11647) Took 2.86 to --read all keys (alive objects: pre: 11159 post: 11641) Took 0.02 to GC all_keys (closed files) (alive objects: pre: 31646 post: 11647) Took 0.19 to --read all keys (gc after closed file) (alive objects: pre: 11639 post: 11641) Took 3.10 to Perform tkey_slowness_test (alive objects: pre: 11112 post: 11594) Took 5.98 to Complete (alive objects: pre: 10400 post: 11582)
And the output for 40 files:

Took 2.15 to Import ROOT (alive objects: pre: 5472 post: 10390) Took 5.33 to build 40 root files (alive objects: pre: 10417 post: 11116) Took 12.27 to GC all_keys (alive objects: pre: 91686 post: 11687) Took 12.61 to --read all keys (alive objects: pre: 11199 post: 11681) Took 0.03 to GC all_keys (closed files) (alive objects: pre: 51686 post: 11687) Took 0.46 to --read all keys (gc after closed file) (alive objects: pre: 11679 post: 11681) Took 13.15 to Perform tkey_slowness_test (alive objects: pre: 11112 post: 11594) Took 18.50 to Complete (alive objects: pre: 10400 post: 11582)

(Beside the slowness issues), the code:[code]for f in files:
all_keys.extend(f.GetListOfKeys())

for f in files:
    f.Close()   

with timer("GC all_keys (closed files)"):
    del all_keys
    collect()

[/code]is in my opinion not supported as the key have no meaning without their TFile and are owned by the TFile and thus the TKey object should not be accessed after the TFile have been closed. (I.e. What were you trying to achieve in the first place?)

Cheers,
Philippe.

It wasn’t my intention to Close the files and then use the TKeys. I was hoping to get the most recent cycle TKey for lots of different files, and hang on to them until I wanted to use them later. The main observation here is that having a large number of TKeys around seems to scale very badly when it comes to the garbage collection, which is quite odd and unexpected behaviour.

The demonstration above with Closing() the file followed by deleting the keys was just to show that the speed issue goes away if the keys are deleted (by the close) before they are garbage collected.

In the end, I believe I have solved my particular problem by only dealing with one file at a time with a different approach. However, it still seems that ROOT’s behaviour is broken here if one loads many TKeys. You can load a million of them in a few seconds from a TFile, but then the garbage collection can take many hours, for no reason. This is very bad and unexpected, and also hard to debug. (how often do people doing timing measurements end up suspecting the garbage collection?!)

Cheers for the help,

  • Peter

Hi Peter,

ROOT files are not designed to support millions of TKeys, in the same way that you will have a performance penalty if you have one million files in one directory. This is done on purpose and if you hit the problem, it is a sign of a wrong data model. Having so many TKeys mean probably that they correspond to objects of the same type. In this case, you better use a TTree to store your data. Not only the access time will be far more efficient, but you will also decrease substantially the file size.

Rene

Hi Rene,

I didn’t mean that one would (or should be able to) load millions of TKeys from a single file. But loading a million across lots of files for the purposes of aggregation doesn’t seem like it should be impossible? I’m not sure this is a ROOT problem since if I did it in C++, the TKey::Delete* method would never called, so far as I am aware.

  • (Speculation: I could be wrong, and/or I could be wrong that this is the problem here, it just seems like a likely candidate because it explains the rate of slowdown).

The problem as I understand it so far is that something is happening when these TKey objects are being garbage collected which is causing a quadratic slowdown. I don’t see why this should be the behaviour. SetOwnership(k, False) for all keys doesn’t fix it either, for some reason.

If you look at the numbers in the first post, 1000 TKeys going out of (python!) scope takes 0.01s. 20x times that takes 5 seconds, and 40x that takes 40 seconds. On the other hand, reading a million TKeys takes only a few seconds from a few hundred TFiles. The problem here is that I can’t control when the TKeys go out of scope on the python side to ensure that this happens before the TFile. My point above was that if I Close() the file before allowing the TKey deletion to happen, the slowdown goes away. Unfortunately I can’t easily ensure that happens everywhere, and I’m sure other people haven’t thought about this level of detail in their code which reads the contents from lots of files.

For me, this problem doesn’t matter much immediately since I solved it with a completely different approach. However I believe I might not be the only person bitten by this behaviour. My new approach ran in 20 minutes compared to 17 hours with the TKey-slowdown code, and I believe a fair portion of this can be attributed to this quadratic behaviour.

Regards,

  • Peter

Just on the off-chance anyone ever finds themselves reading this wanting to know what I’m doing differently:

For my problem I can consider one file at a time, so I read all of the TKeys from a single file then Close it. The largest number of keys around at a given moment is ~10k, which is in the “acceptably-slow” regime from above. (Although still slow! It could be ~0.1s (if the slowdown was linear in the number of keys, or the deletion mechanism in python wasn’t doing something funny) and instead it is 1.2s. This is a penalty paid for every TFile I consider).

Peter,

Thanks for this clarification. I assume that it is a pyroot specific problem and assume that Wim will follow.

Rene

Hi,

It is plausible that the slowdown will be reduced if you use:gROOT->GetListOfFiles()->Remove(file);
However, you also need to make absolutely sure that you have no objects/pointers shared amongst any of the files and you need to explicitly delete the file objects.

Cheers,
Philippe.

In my mind there is still a fundamental question:

Is it true that TKey::Delete is being called? Looking at the destructor, I see no reason that this should happen on TKey object deletion. Yet from the quadratic slowdown, it seems like this (or something like this) is what is happening during the python object proxy deletion.

Hi,

PyROOT calls TClass::Destructor() on the object, not the object destructor, which makes the difference.

Btw., this is the third topic on this (and memory regulator tracking of objects), so I really do want to find something better, but haven’t gotten around to it, though.

Cheers,
Wim

[quote]PyROOT calls TClass::Destructor() on the object, not the object destructor, which makes the difference.[/quote]It should not. In case the class has a dictionary, it is only a think wrapper around the object destructor.

Cheers,
Philippe.

Just calling the dtor isn’t enough, as the memory needs releasing as well. From reading TClass::Destructor, it calls “fDelete” but I can’t quite follow where it comes from. Monkeying a bit, I find that simple code such as:from ROOT import TKey k = TKey() del kdoes indeed not call TKey::Delete(). Okay …

Then I’ll go back to blame the MemoryRegulator … yes, monkeying that shows that “del lots_of_keys” even for the large amount takes no additional time.

That I can most probably fix (or at least make it linear, rather than quadratic).

Cheers,
Wim

[quote]From reading TClass::Destructor, it calls “fDelete” but I can’t quite follow where it comes from.[/quote]It goes into the dictionary and call the operator delete and the destructor (i.e. delete obj;) ; (fDestructor calls only the destructor (i.e. obj->~classname() ).

Cheers,
Philippe.

Hi,

okay, my fix for MemoryRegulator is in trunk. The example at the beginning of the thread now runs linearly.

Cheers,
Wim