Large memory usage when filling TTree

BirgitZatschler · June 15, 2022, 3:53pm

Dear ROOT-experts,

my collaboration has a Geant4 application that writes its output to a ROOT TFile. We have 24 detectors and create a TTree for each of them. Each TTree contains several branches, 4 of them are of type char and 30 are of type double. I wrote a minimal example to show this structure and fill the TTrees. It also outputs the res and virt memory. The plots below only show the res memory monitored with top.
FillTree.C (2.5 KB)

If we use the default settings for AutoFlush and AutoSave (comment out lines 90-91) and create 5e6 events, the memory stays constant at ~300 MB for the first 3.5M events, then suddenly jumps to 5 GB and later to almost 8 GB.

mem_24TTree_5e6_default_AutoFlush.txt (5.5 KB)

We tried to decrease this high memory usage by setting AutoFlush to 5 MB and AutoSave to 50 MB. With that, the memory jump happens much earlier, but the memory only goes up to 6 GB for 2e6 events.

mem_24TTree_2e6_5MB_AutoFlush.txt (3.8 KB)

On the other hand, I also created only 1 TTree in the minimal example (line 26) while increasing the number of events by a factor of 50 (line 27, 1e8 events). This should produce about a factor of 2 more data filled into 1 TTree compared to the previous 2e6 events in 24 TTrees. Indeed, the ROOT TFile size increased by more than a factor of 2 (347 MB → 928 MB). Nevertheless, the memory consumption stayed constant at ~500 MB.

mem_1TTree_1e8_5MB_AutoFlush.txt (56.7 KB)

Is there something we are doing wrong when filling 24 TTrees compared to just 1 TTree? Is there an explanation for this large memory usage?

I have run the attached minimal example with:
ROOT Version: 6.26.04 and 6.24.06
Platform: Ubuntu 20.04.4
Compiler: gcc 9.4.0

I compiled ROOT with the following flags:
-DCMAKE_BUILD_TYPE=RelWithDebInfo
-Dmysql=OFF
-Dodbc=OFF
-Doracle=OFF
-Dpgsql=OFF
-Dpyroot=ON
-Dxrootd=OFF
-Dpython3=ON
-Dunuran=OFF
-Dminuit2=ON
-Droofit=ON
-Dfftw3=ON
-Dgsl=ON
-Ddavix=OFF
-Dfortran=ON
-Dbuiltin_fftw3=ON
-Dbuiltin_gsl=ON
-Dbuiltin_glew=ON
-Dbuiltin_cfitsio=ON
-Dbuiltin_vdt=ON

Edit: Minimal example and plots attached.

eguiraud · June 15, 2022, 4:03pm

Hi @BirgitZatschler ,

and welcome to the ROOT forum! I just manually raised your trust level, you should be able to post attachments now.

As far as I know, the memory usage should not have these jumps; once all branches of all trees are humming along writing their data, the memory usage should be somewhat stable. @pcanal is the expert here, let’s ping him.

Cheers,
Enrico

BirgitZatschler · June 15, 2022, 4:06pm

Hi @eguiraud ,

thanks a lot. I have updated the initial post and attached the minimal example, the plots and the memory in text files (res and virt).

Cheers,
Birgit

pcanal · June 15, 2022, 4:20pm

Hi Birgit,

What is your compression ratio? For example, could you copy/paste here the output of tree->Print() for one of the TTrees in the worst-case scenario?

Cheers,
Philippe.

pcanal · June 15, 2022, 4:51pm

Note that the setting for the autoflush is expressed (when negative) in compressed data size. Upon reaching that value for the first time, the TTree adjusts the in-memory buffer to fit enough data to produce that amount of compressed data.

For example, with the default value (32MB) if your average compression ratio is, let’s say, 3, then the TTree will allocate 32*3 = 96MB and will essentially use constant memory after that.

From the mem_1TTree_1e8_5MB_AutoFlush histogram we can infer that it takes the TTree about 0.25 seconds to reach 5 MB of compressed data. From the mem_24TTree_2e6_5MB_AutoFlush histogram, we see that the ramp-up starts around 0.9 seconds and then increases progressively. I am guessing this is due to the various TTrees having different effective compression ratios and thus allocating their buffers at different points in time.

The base memory use in the 1 TTree case seems to be around 0.2 GB, so the tree (likely) uses around 0.3 GB of memory. For 24 trees, the memory usage should then be 0.3 GB * 24 = 7.2 GB; since it only gets to 6 GB, I guess the average size per TTree is actually around 0.25 GB. From the sizes I mentioned, I am guesstimating that the compression ratio is around 0.25 / 0.005 = 50 (which is pretty high). If we take the default value, then 32 MB * 50 * 24 = 38 GB, which is much more than the original 8 GB, so I might be missing something.

For the timing though, getting to 5 MB of compressed data for all 24 TTrees takes 0.9 seconds, so getting to 32 MB ‘should’ take 32/5 * 0.9 = 5.76 seconds, which is pretty much what we see on the first graph.
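
The back-of-envelope estimates above can be written down as a small sketch. This is plain C++ with no ROOT dependency; the function names are hypothetical and only reproduce the arithmetic from this post, assuming that the in-memory buffer per TTree is roughly the AutoFlush threshold times the compression ratio and that the time to the first flush scales linearly with the threshold.

```cpp
#include <cassert>

// Hypothetical helpers reproducing the estimates in the post; not ROOT API.

// Compression ratio implied by an observed per-tree buffer size and the
// AutoFlush threshold (compressed size). Both arguments use the same unit.
double impliedCompressionRatio(double bufferPerTree, double autoFlushCompressed) {
    assert(autoFlushCompressed > 0.0);
    return bufferPerTree / autoFlushCompressed;
}

// Expected total buffer memory for nTrees trees, in the unit of the threshold.
double expectedTotalBuffer(double autoFlushCompressed, double ratio, int nTrees) {
    assert(nTrees > 0);
    return autoFlushCompressed * ratio * nTrees;
}

// Time until the threshold is reached, assuming (as in the post) that it
// scales linearly with the threshold size.
double scaledRampTime(double observedSeconds, double observedThresholdMB,
                      double newThresholdMB) {
    assert(observedThresholdMB > 0.0);
    return observedSeconds * newThresholdMB / observedThresholdMB;
}
```

With the numbers quoted above, impliedCompressionRatio(0.25, 0.005) gives about 50, expectedTotalBuffer(32.0, 50.0, 24) gives 38400 MB (about 38 GB), and scaledRampTime(0.9, 5.0, 32.0) gives about 5.76 seconds.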

In conclusion, there are two major issues at play here. One is the high compression ratio and the other is the ‘large’ number of TTrees.

Let’s start with the high compression ratio. In the example I see:

  for(int i=0; i<30; i++) theTree->Branch("Energy", &energy, "Energy/D");
  for(int i=0; i<4; i++) theTree->Branch("Volume", volume, "Volume/C");

(In this example, the same data (from the 2 variables energy and volume) is stored in all the branches; I am guessing that this is an artifact of the simplification of the example. However, errors in setting up the branches could lead to duplicates and ‘random’ data being stored, so you should double-check that the data you intend to store is actually stored in the file.)

The second set of branches stores a string. Is the volume name changing every single entry? How many distinct values are there? If there are only a few volumes, the same strings will be stored many times in the basket, leading to good compression, which could explain the high compression ratio (in the minimal example, there is only one string, so the compression ratio will be very high).

One way to reduce both the data size and the memory usage (at the cost of making the data a little harder to use) is to store an index (e.g. an integer value) to the volume rather than the volume name.
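
As an illustration of the index idea, here is a minimal sketch of a name-to-index table in plain C++. The class NameIndex and its methods are hypothetical names, not part of ROOT or of the original FillTree.C. The assumption is that the returned integer would be filled into an integer branch instead of the string, and that the table of names would be written once alongside the data so the names can be recovered when reading.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: assigns a stable integer index to each distinct name
// (volume, particle, process) the first time it is seen.
class NameIndex {
public:
    // Returns the index of `name`, registering it if it is new.
    int indexOf(const std::string& name) {
        auto it = lookup_.find(name);
        if (it != lookup_.end()) return it->second;
        const int idx = static_cast<int>(names_.size());
        names_.push_back(name);
        lookup_.emplace(name, idx);
        return idx;
    }

    // Reverse lookup, for use when reading the data back.
    const std::string& nameOf(int idx) const {
        assert(idx >= 0 && idx < static_cast<int>(names_.size()));
        return names_[idx];
    }

    // Number of distinct names registered so far.
    std::size_t size() const { return names_.size(); }

private:
    std::unordered_map<std::string, int> lookup_;  // name -> index
    std::vector<std::string> names_;               // index -> name
};
```

Each entry then costs a fixed 4 bytes per name-like quantity before compression rather than a variable-length string, which is the saving this suggestion is after.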

The second issue/question is: in your use case, how is having 24 TTrees better than having a single TTree with 24 times more branches?

Cheers,
Philippe.

BirgitZatschler · June 15, 2022, 8:40pm

Hi @pcanal ,

the compression ratio is as follows.
24TTree, 5e6 events, default AutoFlush:

******************************************************************************
*Tree    :detector0 : G4 data for detector0                                  *
*Entries :  5000000 : Total =      1383278855 bytes  File  Size =   40326533 *
*        :          : Tree compression factor =  34.43                       *
******************************************************************************
*Br    0 :Energy    : Energy/D                                               *
*Entries :  5000000 : Total  Size=   40088615 bytes  File Size  =     323902 *
*Baskets :      901 : Basket Size=    3153408 bytes  Compression= 123.71     *
*............................................................................*
*Br   33 :Volume    : Volume/C                                               *
*Entries :  5000000 : Total  Size=   45154963 bytes  File Size  =    7609952 *
*Baskets :     1459 : Basket Size=   25600000 bytes  Compression=   5.93     *
*............................................................................*

24TTree, 2e6 events, 5MB AutoFlush:

******************************************************************************
*Tree    :detector0 : G4 data for detector0                                  *
*Entries :  2000000 : Total =       552571727 bytes  File  Size =   15071192 *
*        :          : Tree compression factor =  36.72                       *
******************************************************************************
*Br    0 :Energy    : Energy/D                                               *
*Entries :  2000000 : Total  Size=   16015507 bytes  File Size  =     104321 *
*Baskets :      155 : Basket Size=    3153408 bytes  Compression= 153.49     *
*............................................................................*
*Br   33 :Volume    : Volume/C                                               *
*Entries :  2000000 : Total  Size=   18026491 bytes  File Size  =    2978768 *
*Baskets :      247 : Basket Size=    8340480 bytes  Compression=   6.05     *
*............................................................................*

1TTree, 1e8 events, 5MB AutoFlush:

******************************************************************************
*Tree    :detector0 : G4 data for detector0                                  *
*Entries : 100000000 : Total =     27601605583 bytes  File  Size =  715748091 *
*        :          : Tree compression factor =  38.57                       *
******************************************************************************
*Br    0 :Energy    : Energy/D                                               *
*Entries :100000000 : Total  Size=  800047651 bytes  File Size  =    4323216 *
*Baskets :      483 : Basket Size=    3153408 bytes  Compression= 185.06     *
*............................................................................*
*Br    1 :Energy    : Energy/D                                               *
*Entries :100000000 : Total  Size=  800047651 bytes  File Size  =    4323216 *
*Baskets :      483 : Basket Size=    3153408 bytes  Compression= 185.06     *
*............................................................................*
*Br   33 :Volume    : Volume/C                                               *
*Entries :100000000 : Total  Size=  900043875 bytes  File Size  =  146492636 *
*Baskets :      411 : Basket Size=    8340480 bytes  Compression=   6.14     *
*............................................................................*

I truncated the output; the values are almost the same for branches of the same type, and they are almost the same for each tree. Branches of different types have very different compression ratios.

If I now try to follow your calculation with those values, the default AutoFlush gives a compression factor of 34.43 multiplied with 32 MB = 1102 MB, i.e. for 24 trees it is 26 GB? The AutoFlush(5MB) gives 36.72 * 5MB = 184 MB for each of the 24 trees, i.e. in total 4.4 GB. For the 1 TTree case it’s 38.57 * 5MB = 193 MB. Is that right?
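
The calculation above can be sketched as follows. This is plain C++ with hypothetical names; it assumes the simple model from earlier in the thread (buffer per tree ≈ AutoFlush threshold in compressed bytes × tree compression factor) and uses 1 GB = 1000 MB. It is an estimate only, not a statement of what ROOT actually allocates.

```cpp
#include <cassert>

// Hypothetical result type for the estimate; not part of ROOT.
struct BufferEstimate {
    double perTreeMB;  // estimated uncompressed buffer per tree
    double totalGB;    // estimated sum over all trees
};

// Estimate buffer sizes from the AutoFlush threshold (compressed MB),
// the tree compression factor reported by TTree::Print, and the tree count.
BufferEstimate estimateBuffers(double autoFlushMB, double compressionFactor,
                               int nTrees) {
    assert(autoFlushMB > 0.0 && compressionFactor > 0.0 && nTrees > 0);
    const double perTreeMB = autoFlushMB * compressionFactor;
    return {perTreeMB, perTreeMB * nTrees / 1000.0};
}
```

For the three cases above, estimateBuffers(32.0, 34.43, 24) gives roughly 1102 MB per tree and 26 GB in total, estimateBuffers(5.0, 36.72, 24) gives roughly 184 MB and 4.4 GB, and estimateBuffers(5.0, 38.57, 1) gives roughly 193 MB.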

I’m unsure about your statement on the base memory per tree. To me it looks like the memory does not significantly increase when creating the trees. Before and after creating the 24 trees, there is only an increase of about 4 MB in res memory.

I also tried to use a different compression algorithm and compression level (ROOT::CompressionSettings(ROOT::kLZMA, 1)), which was consuming 1.5 GB of memory for a long time, but after 5e7 events it increased within seconds to 16 GB and the process was killed.

Indeed, for the simplification of this example I just created the branches in loops. The data written to the ROOT file looks as expected, the same number and the same string is stored as intended. In our Geant4 application those branches are not duplicated. The double type branch represents the branches for the x,y,z position, the energy deposition, the event number, etc. The char type branch represents the branches for the volume name, the particle name and the name of the physical process. There are only a few different names for those, which supports your suggestion concerning the high compression ratios.
The compression ratio in a real example from our Geant4 application, e.g. for a detector tree, is 19.11, with the individual branches varying between 3 and 170. But we are also storing other information in trees, such as the details of the primary particle, which has a compression factor of only 3.

I also checked whether the issue happens when there are no branches of type char, i.e. all branches are of type double. With AutoFlush(5MB), the memory jumps from 300 MB to 2.5 GB after 2.2M events and to 4.9 GB after 2.7M events. So this doesn’t really help.

Concerning your question of why having 24 TTrees is better: that was a historical choice, and people don’t want to change it after using this scheme for many years.

BirgitZatschler · June 16, 2022, 3:26pm

I thought a bit more about this statement regarding the compressed data size:

This would mean that one needs to know the compression beforehand to estimate an appropriate value for the autoflush, right? I guess it is then more useful to use a positive value and calculate how much memory a tree needs for a certain number of entries. Following this approach, I made some tests.

I can calculate the number of bytes a single tree occupies in raw memory in my simple ROOT example: 276 bytes per event. For 24 trees, that’s 276 bytes * 24 = 6.6 kB per event. If I use SetAutoFlush with a positive value of 100k, I see an increase in memory after 100k events of about 6.6 kB * 100k = 660 MB. After 200k events there is another increase of the same size. Then the memory stays stable. I did the same for 200k to see if it is consistent. The same happens there, with all values twice the size.
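
This sizing can be sketched in plain C++. The function names are hypothetical; the figure of 276 bytes per entry is taken from the post, and the parameter residentFlushes reflects only the observation above that memory grows by about two flush-sized steps before it stabilizes. It is an assumption from that measurement, not a documented property of ROOT.

```cpp
#include <cassert>
#include <cstdint>

// Uncompressed data (in MB, 1 MB = 1e6 bytes) accumulated by all trees
// between two flushes, for a positive AutoFlush value given in entries.
double flushStepMB(std::int64_t bytesPerEntryPerTree, int nTrees,
                   std::int64_t autoFlushEntries) {
    assert(bytesPerEntryPerTree > 0 && nTrees > 0 && autoFlushEntries > 0);
    return static_cast<double>(bytesPerEntryPerTree) * nTrees *
           static_cast<double>(autoFlushEntries) / 1e6;
}

// Largest AutoFlush entry count that keeps the buffers within budgetBytes,
// assuming `residentFlushes` flush-sized blocks stay resident (assumption).
std::int64_t entriesForBudget(std::int64_t budgetBytes,
                              std::int64_t bytesPerEntryPerTree, int nTrees,
                              int residentFlushes) {
    assert(budgetBytes > 0 && bytesPerEntryPerTree > 0);
    assert(nTrees > 0 && residentFlushes > 0);
    return budgetBytes / (bytesPerEntryPerTree * nTrees * residentFlushes);
}
```

With the numbers from the post, flushStepMB(276, 24, 100000) is about 662 MB, in line with the observed ~660 MB step, and doubling the entry count doubles the step.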

SetAutoFlush(100000)

mem_24TTree_2e6_AutoFlush100k.txt (3.8 KB)

SetAutoFlush(200000)

mem_24TTree_2e6_AutoFlush200k.txt (3.8 KB)

Naively, I would have expected the memory to be cleared after one flush, but instead it increases by this value. I’m also confused about why the same memory jump is observed again after 2 * the AutoFlush value. Finally, I wonder why, when the trees are deleted, it looks like one of those memory jumps is cleared but the other one is not.

Addressing the question:

I followed this proposal and with 24 times more branches I observe the exact same memory jumps.

Axel · June 20, 2022, 1:08pm

Hi Birgit,
@pcanal will be gone for a long trip - how urgently do you need an answer here, can this wait a couple of weeks? If it cannot wait for him I will see what I can do to help.
Cheers, Axel

BirgitZatschler · June 22, 2022, 8:30pm

Hi @Axel ,

I was finally able to calculate the memory usage per TTree for our Geant4 application. Now I’m using SetAutoFlush() with a positive value. With that I am able to keep the memory usage at a stable level below 4 GB. On my laptop there is no significant increase in runtime, even though the flush to the file happens more often than with the default settings. I still need to test this on an HPC system to be sure.
Anyway, I don’t need an urgent answer, as I can move on with the setting as described. I am, however, curious about what is happening behind the scenes, and I would be really happy if @pcanal could give us some more insight into it once he is back.

I wonder if the documentation could be improved a bit. In TTree::SetAutoFlush, for autof < 0 (“When filling the Tree the branch buffers will be flushed to disk when more than autof bytes have been written to the file.”), it could be made clear that this is expressed in compressed data size. In TTree::Fill, I think the statement “The committed baskets are then immediately removed from memory.” is misleading. I guess the memory is still reserved, and that’s why the memory is not actually freed.

Cheers,
Birgit
