Change in behaviour of TObjArray::Delete()?


after installing the newest ROOT version I observed a significant memory leak in our analysis. After trying a few versions, I found the change happing somewhere between v5-34-02 (ok) and v5-34-06 (not ok). As far as I can see from valgrind, this is connected with objects stored in a TObjArray, which are supposed to be deleted by calling TObjArray::Delete() and this happens in a part of our code, which hasn’t been changed since years. It seems that the objects are deleted at least until v5-34-02, but are are no longer in v5-34-06. I didn’t find anything in the release note, which I could connect to that. It also does not seem to be happen for all TObjArrays in the code. Was there any change that prevents deleting the objects in a TObjArray under some circumstances when calling Delete()? If there is no obvious change, I have to dig further, but first I want to make sure that I didn’t miss anything changes.

Best regards,

Can you, please, show the minimal code reproducing the behaviour you described (leaking now, not leaking with older version)? It should something simple I guess?

Unfortunately, that is not as simple as it sounds as this concerns only some TObjArrays in our code. At the moment it is not clear what makes the difference. Therefore I hoped to get some hint on possible obvious changes that could lead the way. If there are none, I will keep digging.

I have identified the change that causes/triggers the problem.

First of all, the change was not between 34-02 and 34-06, but 34-02 and 34-09. There was a typo in my setup script selecting 09 instead of 06. I finally pinned it down to a single commit between 34-08 and 34-09. That commit changed in core/cont/src/TObjArray.cxx the function TObjArray::Delete() from

for (Int_t i = 0; i < fSize; i++) { if (fCont[i] && fCont[i]->IsOnHeap()) { TCollection::GarbageCollect(fCont[i]); fCont[i] = 0; } }

[code] bool needRegister = fSize && TROOT::Initialized() &&
if (needRegister) gROOT->GetListOfCleanups()->Add(this);

for (Int_t i = 0; i < fSize; i++) {
if (fCont[i] && fCont[i]->IsOnHeap()) {
fCont[i] = 0;
if (needRegister) ROOT::GetROOT()->GetListOfCleanups()->Remove(this);[/code]
If I undo the change in later versions the memory leak is gone.

I still don’t know what exactly triggers the leak. I figured out, that I only see the problem in cases where subsets of the objects in a TObjArray are also stored in TRefArrays in other objects. However, I didn’t manage to reproduce this in a simple example.

As I see it, this change should not have such an effect as the array is first added to the list of cleanups and later removed. However, uncommenting only the line where the array is added keeps the memory consumption stable. The additional memory consumption is about 700 Byte per TObjArray::Delete() call, so it’s not just the pointer to the array, but the stored objects.

Any hint what I am missing here?

Best regards,

Now I also managed to identify the actual cause of the memory leak.

The coincident use of a TRefArray in parallel was misleading and indeed only a coincidence. The reason was an reimplementation of the IsEqual() member function for the stored objects. That was also the reason why no “simple” example reproduced the error.

The problem is that the current coding of TObjArray::Delete() is bound to produce a memory leak if the IsEqual() function of the stored objects allows objects to be equal although the memory address is different - which is actually the only reason to change the default which tests the memory address. Adding a TObjArray to the ListOfCleanups has the effect that TCollection::GarbageCollect(obj) (which uses RecursiveRemove) removes all objects from the array which are considered equal, i.e. all “equal” objects with a different memory addresses are removed but never deleted with the result of a memory leak.

Thus, in my opinion, allowing a reimplementation of IsEqual() to cover cases for objects being equal without sharing the same memory address and relying on the fact that they have in TObjArray::Delete() is a design flaw.