Files, Trees and Memory

Dear Rooters

I have some questions regarding the handling of (temporary) large trees.
Assume that my computer has 512 MB RAM and a tree has a size of 1 GB.

As far as I understand when filling a tree the tree (buffer) is written to the currently open file
whenever the treebuffer reaches fMaxVirtualSize.

1, What happens when I fill a tree but no TFile is open? Where will the tree be saved?

2, Does a temporary tree increase the final size of TFile when it is temporarily stored in the file
only, but not written permanently to TFile with tree->Write()?

3, Since I am storing my trees in different subdirectories of TFile:
What happens when TFile reaches its maximum file size?

Best regards
Christian

[quote]Assume that my computer has 512 MB RAM and a tree has a size of 1 GB.

As far as I understand when filling a tree the tree (buffer) is written to the currently open file
whenever the treebuffer reaches fMaxVirtualSize. [/quote]
NO. Each branch has its own buffer with a buffersize specified in TTree::Branch. When the buffer is full, it is written to the file and refilled again.

If no TFile is open (your question 1), a full branch buffer is kept in memory and a new buffer is created.

[quote]2, Does a temporary tree increase the final size of TFile when it is temporarily stored in the file
only, but not written permanently to TFile with tree->Write()? [/quote]As said above, branch buffers are written to the file when they are full.
TTree::Write writes the tree header together with all branch buffers still in memory.
Just run $ROOTSYS/test/Event to see the behaviour.

[quote]3, Since I am storing my trees in different subdirectories of TFile:
What happens when TFile reaches its maximum file size? [/quote]
A new file is automatically created.
see Users Guide and
root.cern.ch/root/htmldoc//TTree … ChangeFile
root.cern.ch/root/htmldoc///TTre … axTreeSize

Rene

Dear Rene

Thank you for your fast reply and the explanations.
However, I still have questions about these points:

ad 1, When I have 512 MB RAM and the tree will be filled to a size of 1 GB, then the branch buffer
cannot be kept in memory (or if the tree size is larger than virtual memory). What happens?

ad 2, When the branch buffers are written to file temporarily only, and I open the file with TBrowser
later, then the tree is no longer available since I did not write it permanently to TFile.
Did it increase the file size on the hard disk? see also my earlier question:
root.cern.ch/phpBB2/viewtopic.ph … 64f3d13b10

ad 3, The document for TTree::ChangeFile() says: The file should not contain sub-directories.
However, my trees are stored in subdirectories. What happens in this case?

Thank you
Christian

[quote]ad 1, When I have 512 MB RAM and the tree will be filled to a size of 1 GB, then the branch buffer
cannot be kept in memory (or if the tree size is larger than virtual memory). What happens?
[/quote]
Just try. You will get an error message. The info from TTree::Fill will not be stored in the buffers since there is no space to store it.

[quote]ad 2, When the branch buffers are written to file temporarily only, and I open the file with TBrowser
later, then the tree is no longer available since I did not write it permanently to TFile.
Did it increase the file size on the hard disk? see also my earlier question:
root.cern.ch/phpBB2/viewtopic.ph … 64f3d13b10
[/quote]

What do you mean by “written to file temporarily only”? It is either written to disk or not.

[quote]ad 3, The document for TTree::ChangeFile() says: The file should not contain sub-directories.
However, my trees are stored in subdirectories. What happens in this case?
[/quote]
As the doc says, this is a restriction. It is up to you to detect when
you are approaching the max Tree size and implement the file switch over.

Rene

Dear Rene

Thank you for the clarification.

What do you mean by “written to file temporarily only”: This is my most important question.

During my calculation I create sometimes hundreds of temporary trees, which I pass from one
method to the next. However, when I do not write them explicitly to disk with tree->Write(),
they should not be stored in TFile permanently.
Nevertheless, as far as I understand you, the branchbuffers of these trees will be written
to file when they are full, causing the file size to grow. Is this correct?

Best regards
Christian

Dear Rene

Sorry to bother you again.

Meanwhile I have tested my program with large data:

a, I am writing the large temporary trees to the root file using tree->Write().
The file size is 84.8 MB and contains 40 relatively small trees and 60 large trees.

b, I am disabling tree->Write(), so that the temporary trees are not stored permanently
in the root file. Nevertheless, the file size is 82.4 MB although only the 40 small
trees are stored permanently in the file resulting in a theoretical size of about 10 MB.

This means that the temporarily stored branch buffers of the large trees increase the
final size of the root file even though the trees are not stored. The only option
for me seems to be to create a second (temporary) root file to store the temporary
trees when I want to keep the original file size small. Do you know another option?

Best regards
Christian

Christian,

Let me repeat:
There is no such thing as "buffers written temporarily to a file".
Either you write the buffers or not. It is like eating or not eating.

[quote]a, I am writing the large temporary trees to the root file using tree->Write().
The file size is 84.8 MB and contains 40 relatively small trees and 60 large trees.
[/quote]
If the file size is only 84.8 MB your 60 large trees must be very small trees.

[quote]b, I am disabling tree->Write(), so that the temporary trees are not stored permanently
in the root file. Nevertheless, the file size is 82.4 MB although only the 40 small
trees are stored permanently in the file resulting in a theoretical size of about 10 MB.
[/quote]
Please read carefully my previous posting. Branch buffers are written
to the file when they are full. At the end of your job, you write
the tree header that contains the general book-keeping information
and also the current buffers in memory.
I suggest looking at the printout of

mytree.Print(); // to see the number of buffers written
myfile.Map();   // to see the file layout

Rene
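
The mechanism described above can be sketched with a toy model in plain C++. This is not ROOT code: ToyFile, ToyTree, RunJob and FillWithoutFile are hypothetical names, and all byte counts are made up for illustration. The model only mimics what this thread states: a full branch buffer goes to the file immediately, Write() adds the header plus whatever is still buffered, and without an open file full buffers stay in memory.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a file (NOT ROOT's TFile): counts bytes and records headers.
struct ToyFile {
    std::size_t bytes = 0;             // total bytes occupying the file
    std::vector<std::string> headers;  // trees that can be found again later
};

// Toy stand-in for a tree with a single branch buffer (NOT ROOT's TTree).
class ToyTree {
public:
    ToyTree(std::string name, ToyFile* file, std::size_t bufferSize)
        : name_(std::move(name)), file_(file), bufferSize_(bufferSize) {}

    // Add one entry; the buffer is flushed as soon as it is full.
    void Fill(std::size_t entryBytes) {
        buffered_ += entryBytes;
        if (buffered_ >= bufferSize_) Flush();
    }

    // Write the header together with the data still sitting in the buffer.
    void Write(std::size_t headerBytes) {
        if (!file_) return;  // nothing to write to
        file_->bytes += buffered_ + headerBytes;
        buffered_ = 0;
        file_->headers.push_back(name_);
    }

    // Bytes currently held in memory (current buffer plus retained full buffers).
    std::size_t InMemory() const { return buffered_ + retained_; }

private:
    void Flush() {
        if (file_) file_->bytes += buffered_;  // full buffer goes to the file now
        else       retained_ += buffered_;     // no file: full buffer stays in memory
        buffered_ = 0;
    }

    std::string name_;
    ToyFile* file_;
    std::size_t bufferSize_;
    std::size_t buffered_ = 0;  // bytes in the buffer being filled
    std::size_t retained_ = 0;  // bytes of full buffers kept in memory
};

// Fill 250 entries of 10 bytes into a 1000-byte buffer; optionally call Write().
ToyFile RunJob(bool callWrite) {
    ToyFile file;
    ToyTree tree("interim", &file, 1000);
    for (int i = 0; i < 250; ++i) tree.Fill(10);
    if (callWrite) tree.Write(100);  // 100-byte header is an assumed size
    return file;
}

// Same filling pattern, but with no file attached.
std::size_t FillWithoutFile() {
    ToyTree tree("in_memory", nullptr, 1000);
    for (int i = 0; i < 250; ++i) tree.Fill(10);
    return tree.InMemory();
}
```

With these made-up numbers, RunJob(false) leaves 2000 bytes in the file and no header, while RunJob(true) leaves 2600 bytes and one header. The two flushed buffers take up space either way, which is the same pattern as the 82.4 MB versus 84.8 MB reported earlier in the thread. FillWithoutFile() ends with all 2500 bytes held in memory.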

Dear Rene

Now I understand (at least I hope so):
Once a TBasket is full it is written to my file. When I do tree->Write() the tree headers
are also written to the file, which allows me to access the tree later.
If I do not tree->Write() then I cannot access the tree but all TBaskets are still
saved in my file. Is this correct?

With my new understanding I thought that when I do not tree->Write() then the
baskets belonging to the tree would be automatically deleted when e.g. closing the file.
Since these TBaskets are dead data: Is there a way to delete all TBaskets which do not
belong to any tree stored in the file?

BTW, since I have too many baskets I cannot compare my two files. Is there a way
to save the output of file.Map() to a text file?

P.S.: Compared to HEP data my trees are surely pretty small, for biology data they
are already pretty large and the sizes are increasing at a steady rate.
Furthermore, I am still testing and have not used the largest datasets yet.

Best regards
Christian

[quote]If I do not tree->Write() then I cannot access the tree but all TBaskets are still
saved in my file. Is this correct? [/quote]

yes

[quote]With my new understanding I thought that when I do not tree->Write() then the
baskets belonging to the tree would be automatically deleted when e.g. closing the file.
Since these TBaskets are dead data: Is there a way to delete all TBaskets which do not
belong to any tree stored in the file? [/quote]

But what is the point in filling the Tree in this case?

[quote]BTW, since I have too many baskets I cannot compare my two files. Is there a way
to save the output of file.Map() to a text file? [/quote]

root > TFile f("myfile.root");
root > f.Map(); > f.map

f.map contains the output of f.Map()

Rene

Dear Rene

Currently, the largest experiments consist of about 700 data tables, each with about 1.4 million rows. A certain algorithm needs to have access to all these data at once. The R package which we use keeps all these data in memory as a single table and creates two more tables during the calculation to hold the interim data. We therefore need to run it on a 64 bit Opteron with 16 GB RAM, and we use almost all of the RAM.
Instead of keeping all data in memory as tables, my implementation stores the interim data as trees, which has the advantage that only the currently used baskets are held in memory. Since these trees hold the interim data only, I do not need them afterwards.
This reminds me that one solution is to write the trees to the file and then delete them from the file again. However, the better way seems to be to create a temporary file for these trees.

Thank you for your help.

Best regards
Christian
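
The separate-temporary-file idea from the last post can be illustrated with a minimal sketch in plain C++. It uses ordinary file streams as a stand-in for TFile, so it shows only the pattern, not ROOT's API. The function names, file names and byte counts are all hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>

// Size of a file in bytes, or -1 if it cannot be opened.
long FileSize(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    return in ? static_cast<long>(in.tellg()) : -1;
}

// Append n filler bytes to the file at path.
void AppendBytes(const std::string& path, std::size_t n) {
    std::ofstream out(path, std::ios::binary | std::ios::app);
    out << std::string(n, 'x');
}

// Interim data goes to a scratch file; only the final result goes to the
// result file. Returns the final size of the result file.
long RunWithScratchFile(const std::string& resultPath,
                        const std::string& scratchPath) {
    std::remove(resultPath.c_str());    // start from a clean state
    std::remove(scratchPath.c_str());
    AppendBytes(scratchPath, 5000);     // interim data (assumed size)
    AppendBytes(resultPath, 100);       // final result (assumed size)
    std::remove(scratchPath.c_str());   // interim data is no longer needed
    return FileSize(resultPath);
}
```

Because the interim bytes never touch the result file, its size reflects only the data that is meant to be kept. In ROOT terms the scratch file would be a second TFile that holds the temporary trees and is closed and removed once they are no longer needed.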