Why does PROOF use all my memory (during Process)?

I’m using ROOT 5.28 with PROOF-Lite (but I have similar problems with PROOF). I’m running a trivial selector with ~300 input files in a single TChain, for a total of ~40 GB. As you can see in the figure, the code is trivial: I put return kTrue at the beginning of Process, so the code is really doing nothing. So why does it use all the memory (~6 GB on a 4-core machine)?

I also tried putting the return kTrue in the correct position, at the end of the Process method, but the behaviour is the same.

The files are locally on the hard disk.


I’ve found that the problem is dataset-dependent. I tried a different dataset, whose TTrees have the same skeleton as in the old dataset.

With 287 files for a total of 1.5 GB, the memory usage is very low, less than 500 MB during the process.

I also tried the old dataset again, now with only 3 files for a total of 1 GB, and the problem is still there: within a few seconds PROOF uses all my memory.

If you want to try, I’ve put an example of the problematic ntuples (~300 MB) here: precision-turra.mi.infn.it/outpu … 00370.root

Hi,

I had a quick look at your file. Just opening it in a simple ROOT session and calling GetEntries() on the tree ends up using 1.5 GB of memory. However, when the tree is deleted and the file closed (which is what PROOF does during validation and processing), the memory is correctly released.
I think the problem is somewhat related to the tree. I will continue to investigate and let you know.

Gerri

Hi,

The issue stems from the way the file was written, as can be seen in this:

        root [0] TFile *file0 = TFile::Open("output_skimmed_00370.root");
        root [1] file0->Map();
        20101221/213839  At:100  N=146  TFile
        20101221/213839  At:246  N=289  TH1F  CX = 2.07
        20101221/213915  At:535  N=330436238  TTree  CX = 3.20
        20101221/213921  At:330436773  N=6348  StreamerInfo  CX = 3.42
        20101221/213921  At:330443121  N=166  KeysList
        20101221/213921  At:330443287  N=69  FreeSegments
        20101221/213921  At:330443356  N=1  END

What is missing is the information about baskets. In other words, in this file all the baskets have been stored with the TTree object. As a consequence, whenever you load the TTree you have to load all the data, even just to get the number of entries (hence the huge peaks of memory use).

How did you write those files? With older versions of ROOT, you could create this type of file by creating a TTree ‘in memory’ and then opening a new file and storing the TTree. A priori in v5.28, this is no longer possible.

Once we manage to avoid this unusual type of file layout, the memory peak will disappear.

Cheers,
Philippe.

PS. I was able to re-create the file with the usual layout by simply doing:

        hadd -f output_skimmed_00370_fixed.root output_skimmed_00370.root
        Target file: output_skimmed_00370_fixed.root
        Source file 1: output_skimmed_00370.root
        Target path: output_skimmed_00370_fixed.root:/
        output_skimmed_00370.root tree:PAUReco entries=1234567890

and getting:

        20110202/223316  At:100  N=158  TFile
        20110202/223316  At:258  N=289  TH1F  CX = 2.07
        20110202/223331  At:547  N=267  TBasket  CX = 119.84
        20110202/223331  At:814  N=25330  TBasket  CX = 1.26
        20110202/223331  At:26144  N=340  TBasket  CX = 94.11
        ....
        20110202/223350  At:343493364  N=17699  TBasket  CX = 1.73
        20110202/223350  At:343511063  N=368238  TTree  CX = 3.76
        20110202/223352  At:343879301  N=172  KeysList
        20110202/223352  At:343879473  N=6658  StreamerInfo  CX = 3.42
        20110202/223352  At:343886131  N=75  FreeSegments
        20110202/223352  At:343886206  N=1  END

where I estimate the memory peak should now be less than 2 MB …
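The comparison between the two Map() listings can be automated. Below is a minimal sketch in plain C++ (no ROOT dependency), assuming the listing has been saved as text with one record per line in the `date/time At:offset N=bytes ClassName` form. The names `MapSummary`, `summarize_map` and `baskets_stored_in_tree` are hypothetical helpers, and the "more than half of the file" threshold is an arbitrary assumption, not a ROOT rule.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <sstream>
#include <string>

// Summary of a TFile::Map() listing that was saved as plain text.
struct MapSummary {
    int basket_records = 0;         // number of standalone TBasket records
    std::uint64_t tree_bytes = 0;   // N= of the largest TTree record
    std::uint64_t total_bytes = 0;  // sum of N= over all parsed records
};

// True if s is a non-empty string of decimal digits.
static bool all_digits(const std::string& s) {
    if (s.empty()) return false;
    for (unsigned char c : s) {
        if (!std::isdigit(c)) return false;
    }
    return true;
}

// Parse lines such as "20101221/213915 At:535 N=330436238 TTree CX = 3.20".
// Lines that do not match this shape (prompts, "....") are skipped.
MapSummary summarize_map(const std::string& text) {
    MapSummary s;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string stamp, at, n, cls;
        if (!(fields >> stamp >> at >> n >> cls)) continue;
        if (at.rfind("At:", 0) != 0 || n.rfind("N=", 0) != 0) continue;
        const std::string digits = n.substr(2);
        if (!all_digits(digits)) continue;
        const std::uint64_t bytes = std::stoull(digits);
        s.total_bytes += bytes;
        if (cls == "TBasket") {
            ++s.basket_records;
        } else if (cls == "TTree" && bytes > s.tree_bytes) {
            s.tree_bytes = bytes;
        }
    }
    return s;
}

// Heuristic (assumption): no standalone TBasket records and a TTree record
// that makes up more than half of the listed bytes suggests that the baskets
// were stored together with the TTree object.
bool baskets_stored_in_tree(const MapSummary& s) {
    return s.basket_records == 0 && s.tree_bytes > 0 &&
           2 * s.tree_bytes > s.total_bytes;
}
```

Feeding the first listing to `summarize_map` and passing the result to `baskets_stored_in_tree` flags the file, because the single TTree record accounts for nearly all the bytes. For the listing obtained after hadd the check returns false, since TBasket records are present.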

I don’t know which version I used, because it was some time ago and the admin changed the version at some point; maybe 5.21. I created the file in this way inside a PROOF session. During Process:

        if (first_event)
        {
            // Clone the structure only (0 entries copied)
            output_tree = fChain->CloneTree(0);
            fChain->GetTree()->CopyAddresses(output_tree);
        }
        if (pass_selection)
        {
            output_tree->Fill();
        }

and at the end, in Terminate with:

        TFile f(output_filename, "RECREATE");
        output_tree = static_cast<TTree*>(fOutput->FindObject("PAUReco"));
        output_tree->Write();

Thank you very much

Hi,

[quote]I don’t know which version I used, because it was some time ago and the admin changed the version at some point; maybe 5.21. I created the file in this way inside a PROOF session. During Process:[/quote]
With versions of ROOT older than 5.20 (so I am guessing that you were using 5.18), your code would indeed lead to that kind of ROOT file. If you use v5.28 to write the TTree part of your file, the problem should be gone.

Cheers,
Philippe.

PS.

        output_tree = fChain->CloneTree(0);
        fChain->GetTree()->CopyAddresses(output_tree);

The 2nd line is/should be redundant, as it is executed inside CloneTree.