Memory leak in TTree::Fill() on OS X?

Geert-Jan · March 16, 2016, 2:01pm

Hi,

I’m noticing a memory leak on OS X that I don’t see on Linux running the same code. I have a list of trees in an std::vector (these correspond to systematic variations) and call TTree::Fill() on each of them.

In a test where I have no systematics and only the nominal tree, I see a leak that is not big but still significant. Since the leak is reported as 31.50 kB, I suspect that it has to do with the way the baskets are treated.

Attached is the relevant snippet of the output of Apple’s Instruments, which I attached to a running process over several thousand events. Has anyone observed this before? The memory usage as reported by top or ps also clearly increases over time. The output on disk is 48 MB, so I clearly need the baskets to be flushed every once in a while. The function RegionTreeList::fill() is simply a loop over all the attached trees that calls tree->Fill() for each of them.

On Linux, valgrind reports no such issues. The branches of the tree I am writing out are either vectors of floats, or plain floats. No special types are written to the tree.

Cheers,
Geert-Jan

pcanal · March 16, 2016, 2:17pm

Hi Geert-Jan,

Usually when we see this kind of memory growth, the issue is that the data being filled in is growing (i.e. I have often seen the call to std::vector::clear missing).
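To illustrate that failure mode, here is a minimal sketch in plain C++. It is not taken from the code under discussion and does not use ROOT; the names `sizeAtLastFill` and `kValuesPerEvent` are made up for the example, and the value 1500 is only a placeholder. It shows how a buffer that is reused for every event keeps growing when the clear() call is missing:

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <cstddef>
#include <vector>

// Hypothetical number of values written per event (an assumption for this sketch).
constexpr std::size_t kValuesPerEvent = 1500;

// Simulates an event loop that reuses one buffer for every "fill".
// Returns the buffer size seen at the last fill.
std::size_t sizeAtLastFill(std::size_t nEvents, bool clearEachEvent)
{
    std::vector<float> pt;               // buffer that a branch would point at
    std::size_t lastSize = 0;
    for (std::size_t event = 0; event < nEvents; ++event) {
        if (clearEachEvent)
            pt.clear();                  // the call that is easy to forget
        for (std::size_t i = 0; i < kValuesPerEvent; ++i)
            pt.push_back(static_cast<float>(i));
        lastSize = pt.size();            // what a fill would serialise at this point
    }
    return lastSize;
}
```

With clearing, `sizeAtLastFill(10, true)` stays at `kValuesPerEvent`; without it, `sizeAtLastFill(10, false)` is ten times larger, so both the memory use and the amount written per entry grow with every event.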

From just this trace it is hard to know whether this is the case or if something else is going on. Would you be able to provide a running example demonstrating the problem?

Thanks,
Philippe.

Geert-Jan · March 16, 2016, 2:27pm

Hi Philippe,

I will try to make a minimal example (this is in code analysing ATLAS xAODs, so it’s hard to provide the entire code). I don’t think an improper clearing of vectors is it - we would then see the same on Linux, would we not?

Cheers,
Geert-Jan

pcanal · March 16, 2016, 2:33pm

Hi Geert-Jan,

[quote="Geert-Jan"]I don’t think an improper clearing of vectors is it - we would then see the same on Linux, would we not?[/quote]

Yes, if it is ‘unconditional’ … and a similar argument can be made that any of the ‘obvious’ problems in TTree::Fill itself would also be seen on Linux.

Either way, a reproducer, minimal or not, will be necessary to debug this.

Thanks,
Philippe.

Geert-Jan · March 18, 2016, 2:08pm

Hi Philippe,

Apologies for the slow reply in following up on this: it turned out to be a little hard to trace this leak’s origin. I’ve isolated the culprit in one of the ATLAS EventLoop functions, but am struggling to understand why this could cause the leak. The code uses so-called workers to which one can add output that will be bookkept into a file, e.g. a TH1 cutflow histogram. They can be added in the following way:

[code]
void Worker ::
addOutput (TObject *output_swallow)
{
    std::auto_ptr<TObject> output (output_swallow);

    RCU_CHANGE_INVARIANT (this);
    RCU_REQUIRE_SOFT (output_swallow != 0);

    RCU::SetDirectory (output_swallow, 0);
    m_output->Add (output.release());

    TH1 *hist = dynamic_cast<TH1*> (output_swallow);
    if (hist)
        m_outputHistMap[hist->GetName()] = hist;
}
[/code]

Ultimately this worker class writes out its data like this:

[code]
TList output;
DirectWorker worker (*sample, &output, job, location, &meta);

std::cout << "Running sample: " << (*sample)->name() << std::endl;
worker.run ();

saveOutput (location, (*sample)->name(), output);
[/code]

And saveOutput loops over the output container. It even calls output.Clear("nodelete"), so it won’t destroy any objects.

In other words: what I believe happens is that this internally just knows about a pointer to my TTree and that’s it. It doesn’t delete the TTree at the end, nor does it ever explicitly clear this output TList with the delete flag (there is also no call to SetOwner).
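To make the ownership picture concrete, here is a small sketch in plain C++ with no ROOT or EventLoop types. `Payload`, `Registry` and `liveAfterRegistryDies` are hypothetical stand-ins invented for this example, and std::unique_ptr is swapped in for std::auto_ptr, which was removed in C++17. It only illustrates that two non-owning pointers to one object neither delete it nor copy it; it does not explain the basket growth itself:

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a TObject-like payload; counts live instances.
struct Payload {
    static int alive;
    std::string name;
    explicit Payload(std::string n) : name(std::move(n)) { ++alive; }
    ~Payload() { --alive; }
};
int Payload::alive = 0;

// Non-owning bookkeeping, playing the role of a list that is never made owner.
struct Registry {
    std::vector<Payload*> output;             // first raw pointer to the object
    std::map<std::string, Payload*> byName;   // second raw pointer to the same object

    void add(Payload* swallowed)
    {
        std::unique_ptr<Payload> guard(swallowed);  // guards against an early exit
        byName[guard->name] = guard.get();
        output.push_back(guard.release());          // guard gives up ownership here
    }
};

// Returns how many payloads are still alive after the registry is destroyed.
int liveAfterRegistryDies()
{
    Payload* raw = new Payload("cutflow");
    {
        Registry registry;
        registry.add(raw);
    }                                        // containers drop their pointers, nothing is deleted
    const int stillAlive = Payload::alive;
    delete raw;                              // explicit cleanup so the sketch itself does not leak
    return stillAlive;
}
```

In this sketch `liveAfterRegistryDies()` returns 1: the object outlives the registry, and whoever holds the original pointer remains responsible for deleting it.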

Do you have any idea why this construction, where effectively two pointers to the TTree exist, could cause the baskets to leak? Without the addOutput call, I don’t see any leaks - so it’s really this function that must be the culprit.

Geert-Jan

P.S. Of course I do realise that I could change the internal TList to an std::vector<std::shared_ptr> or something like that, but that is a pain and would require a major rewrite of a big chunk of code for a simple test!

pcanal · March 18, 2016, 2:20pm

Hi,

Do you also call Clear on m_outputHistMap?

Philippe.

Geert-Jan · March 18, 2016, 2:23pm

Hi Philippe,

No, that is not done as far as I can tell. (However, a TTree shouldn’t even have ended up in it, should it?)

Cheers,
Geert-Jan

pcanal · March 18, 2016, 2:29pm

Hi Geert-Jan,

[quote="Geert-Jan"](However, a TTree shouldn’t even have ended up in it, should it?)[/quote]

Correct, but we are actually wondering about the reverse, i.e. is this map stored in a TTree? The name of the map (m_outputHistMap) seems to indicate that it might be stored somewhere … and it could be in the TTree.

Cheers,
Philippe.

Geert-Jan · March 18, 2016, 2:39pm

Hi Philippe,

They don’t end up in my tree. The map holds some basic information (for example running time per algorithm etc.) that goes into a separate file meant for histograms. The tree only has the branches that I defined for it.

Cheers,
Geert-Jan

pcanal · March 18, 2016, 3:01pm

Hi,

Fair enough. What are those branches and their types? If I remember correctly, your process does not leak on Linux but leaks on OS X. Can you send me the output of TTree::Print on the final tree in both cases?

Thanks,
Philippe.

Geert-Jan · March 18, 2016, 3:25pm

Hi Philippe,

Actually, it turns out the leak occurs in both cases. My demo to test had exactly one branch: an std::vector called ‘pt’:

[code]root [2] foo_NOMINAL->Print()

*Tree    :foo_NOMINAL: NOMINAL                                               *
*Entries :     5000 : Total =        60330692 bytes  File  Size =    8654561 *
*        :          : Tree compression factor =   6.97                       *

*Br    0 :pt        : vector                                                 *
*Entries :     5000 : Total  Size=   60330286 bytes  File Size  =    8642500 *
*Baskets :     2500 : Basket Size=      32000 bytes  Compression=   6.97     *
…[/code]

I just wrote the numbers 0 to 1500 into this vector repeatedly, in a loop of 5000 steps (with 25 iterations per step, as otherwise the program finished too quickly to attach a leak checker).

Cheers,
Geert-Jan

ferhue · March 18, 2016, 4:17pm

Do you delete the vector each time after you call Fill()? Note that if you have a pointer to a vector (pt) linked to the branch, you should do that.

pcanal · March 18, 2016, 4:37pm

Hi Ferhue,

[quote="ferhue"]Do you delete the vector each time after you call Fill()? Note that if you have a pointer to a vector (pt) linked to the branch, you should do that.[/quote]

In the general case you should not delete the vector between each Fill but rather re-use it. This saves (a lot of) time otherwise wasted in memory (re)allocation.
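As a rough illustration of the difference, here is a sketch in plain C++ without ROOT. The function `countReallocations` is a made-up helper for this example. It counts how often the buffer has to grow when one vector is cleared and re-used versus when a fresh vector is allocated for every event:

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <cstddef>
#include <memory>
#include <vector>

// Counts how often the buffer's capacity changed over nEvents events.
// reuseBuffer == true : one vector, clear()ed before each event.
// reuseBuffer == false: the vector is deleted and re-created for each event.
std::size_t countReallocations(std::size_t nEvents, std::size_t nValues, bool reuseBuffer)
{
    std::size_t reallocations = 0;
    std::unique_ptr<std::vector<float>> pt(new std::vector<float>());
    for (std::size_t event = 0; event < nEvents; ++event) {
        if (reuseBuffer)
            pt->clear();                          // keeps the already allocated storage
        else
            pt.reset(new std::vector<float>());   // throws the storage away
        for (std::size_t i = 0; i < nValues; ++i) {
            const std::size_t before = pt->capacity();
            pt->push_back(static_cast<float>(i));
            if (pt->capacity() != before)
                ++reallocations;                  // the vector had to grow
        }
    }
    return reallocations;
}
```

With re-use, the buffer only grows during the first event, so `countReallocations(5, 1500, true)` equals `countReallocations(1, 1500, true)`; with a fresh vector per event, the same growth is paid again every time.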

Cheers,
Philippe.

pcanal · March 18, 2016, 4:38pm

Hi Geert-Jan,

What is the result (or a summary of) tree->Scan("pt@.size()");?

Philippe.

ferhue · March 18, 2016, 8:19pm

Thanks for the clarification.

Geert-Jan · March 19, 2016, 8:56am

[quote="pcanal"]Hi Geert-Jan,

What is the result (or a summary of) tree->Scan("pt@.size()");?

Philippe.[/quote]

Hi Philippe,

Here it is:

[code]root [1] foo_NOMINAL->Scan("pt@.size()");

 Row * pt@.size( *
   0 *      1500 *
   1 *      1500 *
   2 *      1500 *
   3 *      1500 *
   4 *      1500 *
   5 *      1500 *
   6 *      1500 *
   7 *      1500 *
   8 *      1500 *
   9 *      1500 *
  10 *      1500 *
  11 *      1500 *
  12 *      1500 *
  13 *      1500 *
  14 *      1500 *
  15 *      1500 *
  16 *      1500 *[/code]

It’s 1500 all the way up to row 4999.

Cheers,
Geert-Jan