Should the result of TDirectory::GetDirectory() be deleted?

Hello,
I am writing an application which accesses ROOT files (browsing, reading, writing). To minimize memory usage, I want to avoid memory leaks and any unnecessary heap objects while the application is idle.

I am wondering whether the result of TDirectory::GetDirectory() should be deleted, or whether it is just an observer pointer. I could not find any relevant information in the documentation of this method. From experimenting with it, I think I see higher memory usage when it is not deleted, but the usage does not increase further when I call the method multiple times on the same directory, so it does not seem to be leaking.

The same question actually applies to TDirectoryFile::mkdir.

Many thanks for your help!



_ROOT Version: v6-26-02
Platform: Ubuntu
Compiler: GCC


After playing with it a little more, it seems that deleting a TDirectory actually removes it from the file as well, which would explain why the memory usage went down when I deleted it, and why nothing was leaking when I did not.

What is still unclear to me is why the memory usage of the application is still higher when I store bigger objects even when the file I access is closed. Does ROOT cache files in memory regardless of their open/closed status?

The result of mkdir and GetDirectory is owned by its TFile, but it can indeed be safely deleted to reclaim some memory (the TFile will be informed of the deletion and will re-read the directory from the physical file if it is requested again).

Thanks! But is it then expected that the created directory in the file is actually empty once I reopen it? See the output below (“file.root” did not exist before):

root [0] auto f = new TFile("file.root", "UPDATE")
(TFile *) @0x7fff6fdee368
root [1] auto histo = new TH1F("asdf", "asdf", 1000, 0, 10)
(TH1F *) @0x7fff6fdee368
root [2] histo->FillRandom("gaus")
root [3] auto dir = f->mkdir("A")
(TDirectory *) @0x7fff6fdee368
root [4] dir->WriteObject(histo, "asdf", "Overwrite")
(int) 755
root [5] f->cd("A")
(bool) true
root [6] f->ls()
TFile**		file.root	
 TFile*		file.root	
  OBJ: TH1F	asdf	asdf : 0 at: 0x3fd24a0
  TDirectoryFile*		A	A
   KEY: TH1F	asdf;1	asdf
  KEY: TDirectoryFile	A;1	A
root [7] delete dir
root [8] f->Close()
root [9] delete f
root [10] auto f = new TFile("file.root", "UPDATE")
(TFile *) @0x7fff6fdee368
root [11] f->cd("A")
(bool) true
root [12] f->ls()
TFile**		file.root	
 TFile*		file.root	
  TDirectoryFile*		A	A
  KEY: TDirectoryFile	A;1	A

If I do not delete dir, the histo remains in the file. Using f->Flush() does not seem to help here.

Right. If you delete the directory explicitly (during writing), you need to explicitly write it first.
I think:

dir->Write();

should work or you can use

f->Write();

(Actually, either way, the f->Write() is missing before calling Close().)

Thanks, indeed I was missing a Write(). I also noticed that calling Close() before deleting the TDirectory was enough as well, although perhaps not in every possible case.

However, aggressively closing and deleting every possible pointer does not seem to decrease the memory usage much. I have prepared a reproducer that can be run with just ROOT. I am sorry it is a bit long, but it reflects what I am actually trying to achieve in my application. It basically creates some histograms and stores them in a file; if they already exist there, it merges them.

#include <iostream>
#include <iomanip>
#include <string>      // std::string, std::to_string
#include <unistd.h>    // sleep()
#include <TH1F.h>
#include <TFile.h>
#include <TObjArray.h>
#include <TSystem.h>

void reportMemory(std::string prefix) {
  sleep(2);
  ProcInfo_t pi;
  gSystem->GetProcInfo(&pi);
  std::cout << std::setw(40) << prefix << ": resident " << std::setw(6) << pi.fMemResident / 1000 << "MB, virtual " << std::setw(6) << pi.fMemVirtual / 1000 << "MB" << std::endl;
}

void test2()
{
  reportMemory("test2 start");
  {
    auto file = new TFile("out.root", "UPDATE");
    if (file->IsZombie() || !file->IsOpen() || !file->IsWritable()) {
      std::cout << "failed to open the file" << std::endl;
      return;
    }

    TObjArray* arr = new TObjArray();
    arr->SetName("array");
    arr->SetOwner(false);
    reportMemory("created array");

    // this aims to create histograms worth 1000 * 25000 * 4B = 100MB of memory  
    for (auto i = 0; i < 1000; i++) {
      auto name = std::string("histo") + std::to_string(i);
      TH1F* histo = new TH1F(name.c_str(), name.c_str(), 25000, 0, 10000);
      histo->FillRandom("gaus");
      arr->Add(histo);
    }
    reportMemory("filled array");
    
    auto stored = file->Get<TObjArray>("array");
    if (stored != nullptr) {
      reportMemory("read another array from file");

      TObjArray* coll = new TObjArray();
      coll->SetOwner(false);
      coll->Add(stored);
      reportMemory("prepared a collection for merging");

      arr->Merge(coll);
      reportMemory("merged");

      delete coll;
      reportMemory("deleted wrapper collection");

      stored->SetOwner(true);
      delete stored;
      reportMemory("deleted the stored array");
    } else {
      reportMemory("did not find another array in file");
    }

    file->WriteObject(arr, arr->GetName(), "Overwrite");
    reportMemory("wrote array to file");

    arr->SetOwner(true);
    delete arr;
    reportMemory("deleted array");

    file->Write();
    reportMemory("did file->write");

    file->Close();
    reportMemory("did file->close");

    delete file;
    reportMemory("deleted file");
  }
  reportMemory("test2 end");
}

int main(int argc, char **argv)
{
   test2();
   return 0;
}

What I observed is that writing, closing and deleting do not decrease the memory usage, as you can see in the output below:

$> root -l
root [0] .x test2.C 
                             test2 start: resident    245MB, virtual    406MB
                           created array: resident    245MB, virtual    406MB
                            filled array: resident    347MB, virtual    505MB
      did not find another array in file: resident    347MB, virtual    505MB
                     wrote array to file: resident    363MB, virtual    520MB
                           deleted array: resident    363MB, virtual    520MB
                         did file->write: resident    363MB, virtual    520MB
                         did file->close: resident    363MB, virtual    520MB
                            deleted file: resident    363MB, virtual    520MB
                               test2 end: resident    363MB, virtual    520MB
root [1] .x test2.C 
                             test2 start: resident    363MB, virtual    520MB
                           created array: resident    363MB, virtual    520MB
                            filled array: resident    363MB, virtual    520MB
            read another array from file: resident    450MB, virtual    607MB
       prepared a collection for merging: resident    450MB, virtual    607MB
                                  merged: resident    450MB, virtual    607MB
              deleted wrapper collection: resident    450MB, virtual    607MB
                deleted the stored array: resident    450MB, virtual    607MB
                     wrote array to file: resident    451MB, virtual    607MB
                           deleted array: resident    451MB, virtual    607MB
                         did file->write: resident    451MB, virtual    607MB
                         did file->close: resident    451MB, virtual    607MB
                            deleted file: resident    451MB, virtual    607MB
                               test2 end: resident    451MB, virtual    607MB

The memory usage stays the same with later executions of the same script. Also, it depends on the object size, so it cannot be just the libraries. I performed the same test by compiling this macro, and the results are roughly the same: the initial usage is much lower, but it then reaches similar values.

Would you know why the memory usage does not fall despite nicely deleting and closing the objects and the file? Is there a way to avoid it?

It does not appear to be a leak, but rather the system not releasing the process's memory even though it could. I am concluding this from the fact that running the function test2() many times in a row does not lead to any further memory increase:

root [0] .L /var/tmp/test2.C+
root [1] test2()
                             test2 start: resident    137MB, virtual    392MB
                           created array: resident    138MB, virtual    392MB
                            filled array: resident    241MB, virtual    493MB
      did not find another array in file: resident    242MB, virtual    493MB
                     wrote array to file: resident    248MB, virtual    497MB
                           deleted array: resident    248MB, virtual    497MB
                         did file->write: resident    248MB, virtual    497MB
                         did file->close: resident    248MB, virtual    497MB
                            deleted file: resident    248MB, virtual    497MB
                               test2 end: resident    248MB, virtual    497MB
root [2] test2()
                             test2 start: resident    248MB, virtual    497MB
                           created array: resident    248MB, virtual    497MB
                            filled array: resident    248MB, virtual    497MB
            read another array from file: resident    347MB, virtual    596MB
       prepared a collection for merging: resident    347MB, virtual    596MB
                                  merged: resident    347MB, virtual    596MB
              deleted wrapper collection: resident    347MB, virtual    596MB
                deleted the stored array: resident    347MB, virtual    596MB
                     wrote array to file: resident    347MB, virtual    596MB
                           deleted array: resident    347MB, virtual    596MB
                         did file->write: resident    347MB, virtual    596MB
                         did file->close: resident    347MB, virtual    596MB
                            deleted file: resident    347MB, virtual    596MB
                               test2 end: resident    347MB, virtual    596MB
root [3] test2()
                             test2 start: resident    347MB, virtual    596MB
                           created array: resident    347MB, virtual    596MB
                            filled array: resident    347MB, virtual    596MB
            read another array from file: resident    348MB, virtual    596MB
       prepared a collection for merging: resident    348MB, virtual    596MB
                                  merged: resident    348MB, virtual    596MB
              deleted wrapper collection: resident    348MB, virtual    596MB
                deleted the stored array: resident    348MB, virtual    596MB
                     wrote array to file: resident    348MB, virtual    596MB
                           deleted array: resident    348MB, virtual    596MB
                         did file->write: resident    348MB, virtual    596MB
                         did file->close: resident    348MB, virtual    596MB
                            deleted file: resident    348MB, virtual    596MB
                               test2 end: resident    348MB, virtual    596MB
root [4] test2()
                             test2 start: resident    348MB, virtual    596MB
                           created array: resident    348MB, virtual    596MB
                            filled array: resident    348MB, virtual    596MB
            read another array from file: resident    348MB, virtual    596MB
       prepared a collection for merging: resident    348MB, virtual    596MB
                                  merged: resident    348MB, virtual    596MB
              deleted wrapper collection: resident    348MB, virtual    596MB
                deleted the stored array: resident    348MB, virtual    596MB
                     wrote array to file: resident    348MB, virtual    596MB
TFile**         out.root
 TFile*         out.root
  KEY: TObjArray        array;1 An array of objects
                           deleted array: resident    348MB, virtual    596MB
                         did file->write: resident    348MB, virtual    596MB
                         did file->close: resident    348MB, virtual    596MB
                            deleted file: resident    348MB, virtual    596MB
                               test2 end: resident    348MB, virtual    596MB

And (at least in my case), if you increase the number of histograms (10,000 here), we can start to see fluctuations (i.e. some memory being released):

root [3] test2()
                             test2 start: resident   1755MB, virtual   2003MB
                           created array: resident   1755MB, virtual   2003MB
                            filled array: resident   1755MB, virtual   2003MB
            read another array from file: resident   2142MB, virtual   2390MB
       prepared a collection for merging: resident   2142MB, virtual   2390MB
                                  merged: resident   2142MB, virtual   2390MB
              deleted wrapper collection: resident   2142MB, virtual   2390MB
                deleted the stored array: resident   1755MB, virtual   2003MB
                     wrote array to file: resident   1755MB, virtual   2003MB
TFile**         out.root
 TFile*         out.root
  KEY: TObjArray        array;1 An array of objects
                           deleted array: resident   1755MB, virtual   2003MB
                         did file->write: resident   1755MB, virtual   2003MB
                         did file->close: resident   1755MB, virtual   2003MB
                            deleted file: resident   1755MB, virtual   2003MB
                               test2 end: resident   1755MB, virtual   2003MB

Indeed, I also doubt it is a leak; rather, something is not releasing the memory.

I did a couple more experiments; it seems that the act of writing the array to the file prevents the memory from being released. If we take this:

#include <iostream>
#include <iomanip>
#include <string>      // std::string, std::to_string
#include <unistd.h>    // sleep()
#include <TH1F.h>
#include <TFile.h>
#include <TObjArray.h>
#include <TSystem.h>

void reportMemory(std::string prefix) {
  sleep(2);
  ProcInfo_t pi;
  gSystem->GetProcInfo(&pi);
  std::cout << std::setw(40) << prefix << ": resident " << std::setw(6) << pi.fMemResident / 1000 << "MB, virtual " << std::setw(6) << pi.fMemVirtual / 1000 << "MB" << std::endl;
}

void test4()
{
  reportMemory("test4 start");
  {
    auto file = new TFile("out.root", "UPDATE");
    if (file->IsZombie() || !file->IsOpen() || !file->IsWritable()) {
      std::cout << "failed to open the file" << std::endl;
      return;
    }

    TObjArray* arr = new TObjArray();
    arr->SetName("array");
    arr->SetOwner(false);
    reportMemory("created array");

    // this aims to create histograms worth 1000 * 25000 * 4B = 100MB of memory  
    for (auto i = 0; i < 1000; i++) {
      auto name = std::string("histo") + std::to_string(i);
      TH1F* histo = new TH1F(name.c_str(), name.c_str(), 25000, 0, 10000);
      histo->FillRandom("gaus");
      arr->Add(histo);
    }
    reportMemory("filled array");

    file->WriteObject(arr, arr->GetName(), "Overwrite");
    reportMemory("wrote array to file");

    arr->SetOwner(true);
    delete arr;
    reportMemory("deleted array");

    file->Write();
    reportMemory("did file->write");

    file->Close();
    reportMemory("did file->close");

    delete file;
    reportMemory("deleted file");
  }
  reportMemory("test4 end");
}

int main(int argc, char **argv)
{
   test4();
   return 0;
}

It gives us the following output:

$> root -l
root [0] .x test4.C
                             test4 start: resident    242MB, virtual    405MB
                           created array: resident    242MB, virtual    405MB
                            filled array: resident    345MB, virtual    505MB
                     wrote array to file: resident    360MB, virtual    519MB
                           deleted array: resident    360MB, virtual    519MB
                         did file->write: resident    360MB, virtual    519MB
                         did file->close: resident    360MB, virtual    519MB
                            deleted file: resident    360MB, virtual    519MB
                               test4 end: resident    360MB, virtual    519MB

but if I comment out only the WriteObject call and its associated log line, we get:

$> root -l
root [0] .x test4.C
                             test4 start: resident    243MB, virtual    405MB
                           created array: resident    247MB, virtual    406MB
                            filled array: resident    349MB, virtual    507MB
                           deleted array: resident    251MB, virtual    408MB
                         did file->write: resident    251MB, virtual    408MB
                         did file->close: resident    251MB, virtual    408MB
                            deleted file: resident    251MB, virtual    408MB
                               test4 end: resident    251MB, virtual    408MB

I am not very proficient in Linux I/O, but perhaps there is some memory mapping between the array and the file which is not fully “unlinked” once finished?

This ‘hoarding’ is part of the overall Linux optimization of how processes acquire and release memory (i.e. avoiding or reducing unnecessary churn and inefficiency). If you need to force the process to release the memory (in the absence of other pressure on the system, which might have provoked the release earlier), you can use:

#include <malloc.h>
// ...
malloc_trim(0);

Hi,

Thank you for all the interesting and helpful discussion here.

A few comments from my side:

  1. I see the same behaviour on Mac. Actually it takes a lot more iterations to plateau.
  2. stdlib.h should be preferred to malloc.h
  3. malloc_trim only exists on Linux, not on other platforms such as Mac.

Cheers,
Barth

Thanks! Indeed malloc_trim does help, as can be seen in the log:

                             test5 start: resident    242MB, virtual    406MB
                           created array: resident    244MB, virtual    407MB
                            filled array: resident    346MB, virtual    507MB
                     wrote array to file: resident    360MB, virtual    521MB
                           deleted array: resident    360MB, virtual    521MB
                         did file->write: resident    360MB, virtual    521MB
                         did file->close: resident    360MB, virtual    521MB
                            deleted file: resident    360MB, virtual    521MB
                        malloc_trim done: resident    249MB, virtual    508MB
                               test5 end: resident    249MB, virtual    508MB

It is interesting, though, that the virtual memory does not decrease as much as the resident memory.
I understand, then, that there is nothing more to be done aside from using malloc_trim in such cases.

Yes, this is out of our control and is really now a question of Linux memory management.

A priori, there should be no real-world consequence to the virtual memory not decreasing. Do you have a case where it is indeed an issue (i.e. preventing or slowing things down)?

Decreasing the memory usage here was part of a larger endeavour to reduce the memory usage of our jobs on the grid. This process did not cause any particular problems; it just stood out. I assume that the OS would eventually reclaim this heap memory if other processes needed it. So any reasonable way to trim the statistic down is still helpful, to avoid questions and discussions.

I was wondering about the virtual memory because I understood it as “how much memory a process thinks it has”, so I would assume that once the process frees some, this should be reflected in the virtual memory count. That could be my misunderstanding, though…

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.