File size with a CloneTree

Hi,

I'd like to modify a tree and overwrite it in the same file, and do this several times. I have an issue with the file size. To summarise the problem, I reduced it to this piece of code, which I would expect to do nothing, but unfortunately it increases the file size.

void example(void) {
  string filename   = "example.root";
  string treename   = "tagsDumper/trees/gjet_13TeV_TTHHadronicTag";
  string writingDir = "tagsDumper/trees/";
  
  TFile *file = new TFile( filename.c_str(), "update" );
  file->cd(writingDir.c_str());

  cout << " ---- just opening the file" << endl;
  gDirectory->ls();
  cout << " ---- " << endl;
  
  TTree *tree = (TTree*) file->Get( treename.c_str()  );
  TTree *newtree = tree->CloneTree();
  newtree->SetName("toto-mon-toto-name");
  newtree->SetTitle("toto-mon-toto-title");

  //newtree->Write("",TObject::kOverwrite);
  file->Close();
}

If I do that several times, the file size increases each time by the size of the tree I am cloning.
When I check what is inside the file, though, I don’t see anything new, and the cloned tree is not there, as expected. The original tree size is also unchanged, so I cannot find anything new in the file.

What should I do to avoid that? I tried deleting the cloned tree (newtree), but it does not help.

I'd like to avoid posting the example.root file, which contains private CMS data, but I could if needed.

Thanks,

Fabrice

Try to end your macro with:

newtree->Write();
file->Delete((treename + ";*").c_str());
delete file; // automatically deals with "tree" and "newtree"

Thanks Wile. I tried your suggestion, but it does not change anything: the file keeps growing after each iteration.

I also tried:

file->Delete( "toto-mon-toto-name;*");
//  file->Delete((treename + ";*").c_str());
delete file;

but that does not help either.

Here is a reproducible example: it creates a simple random tree, then clones it without saving the clone. The file size grows while the file content remains identical.

#include <TH1.h>
#include <TTree.h>
#include <TFile.h>
#include <TRandom.h>

#include <iostream>
#include <string>
using namespace std;

string treename   = "aRandomTree";
string filename   = "example.root";
string writingDir = "tagsDumper/trees/";

void create_file(void) {
  TFile *file = new TFile( filename.c_str(), "recreate" );

  TTree *mytree = new TTree( treename.c_str(), treename.c_str() );
  Float_t px,py,pz;
  mytree->Branch("px",&px,"px/F");
  mytree->Branch("py",&py,"py/F");
  mytree->Branch("pz",&pz,"pz/F");

  TRandom gen;
  int nentries = 1000000;
  for( int ie = 0; ie < nentries; ie++ ) {
    px = gen.Uniform(0,1000);
    py = gen.Uniform(0,1000);
    pz = gen.Uniform(0,1000);
    mytree->Fill();
  }
  mytree->Write();
  file->Close();
}


void cloner(void) {
  TFile *file = new TFile( filename.c_str(), "update" );
  //  file->cd(writingDir.c_str());

  cout << "     ---- content of file" << endl;
  gDirectory->ls();
  
  TTree *tree = (TTree*) file->Get( treename.c_str()  );
  TTree *newtree = tree->CloneTree();
  newtree->SetName("toto-mon-toto-name");
  newtree->SetTitle("toto-mon-toto-title");

  //newtree->Write("",TObject::kOverwrite);
  // file->Delete( "toto-mon-toto-name;*");
  // file->Delete((treename + ";*").c_str());

  file->Close();
   //   delete file;
}

void example(void) {
  cout << " ======= creating file ======== " << endl;
  create_file();
  cout << endl;
  
  int nClone = 4;
  for( int itclone = 0 ; itclone < nClone; itclone++ ) {
    TFile *f = TFile::Open( filename.c_str(),"read" );
    cout << " file - size, iteration " << itclone <<  " : " << f->GetSize() << " bytes." << endl; 
    f->Close();
    delete f;
    cloner();
    cout << "      ==> cloning "
	 << endl << endl << endl;
  }
	
}

Here is the output of root -l example.C+:

======= creating file ========

file - size, iteration 0 : 10856292 bytes.
---- content of file
TFile** example.root
TFile* example.root
KEY: TTree aRandomTree;1 aRandomTree
==> cloning

file - size, iteration 1 : 21677294 bytes.
---- content of file
TFile** example.root
TFile* example.root
KEY: TTree aRandomTree;1 aRandomTree
==> cloning

file - size, iteration 2 : 32498306 bytes.
---- content of file
TFile** example.root
TFile* example.root
KEY: TTree aRandomTree;1 aRandomTree
==> cloning

file - size, iteration 3 : 43319328 bytes.
---- content of file
TFile** example.root
TFile* example.root
KEY: TTree aRandomTree;1 aRandomTree
==> cloning

TTree *newtree = tree->CloneTree();

This step copies all the baskets of the input TTree to the output file. The only way to then free the space they occupy inside the file is to call TTree::Delete:

newtree->Delete("all");

Note that this will not reduce the size of the file on disk; it just marks the space as reusable for future writes.
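
To illustrate the difference between shrinking a file and marking space as reusable, here is a toy model in plain C++ (no ROOT). It is only a sketch: ToyFile, write and release are hypothetical names, and the first-fit policy is an assumption, not a description of how ROOT actually manages free segments.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a file with a free list. NOT ROOT's implementation.
struct ToyFile {
  struct Gap { std::size_t offset; std::size_t length; };

  std::size_t size = 0;   // bytes on disk; never shrinks in this model
  std::vector<Gap> gaps;  // regions marked as reusable

  // First-fit write: reuse a gap if one is large enough, otherwise append.
  // Returns the offset at which the data was placed.
  std::size_t write(std::size_t nbytes) {
    for (Gap &g : gaps) {
      if (g.length >= nbytes) {
        const std::size_t at = g.offset;
        g.offset += nbytes;
        g.length -= nbytes;
        return at;
      }
    }
    const std::size_t at = size;
    size += nbytes;
    return at;
  }

  // Mark a region as reusable. The file size is left unchanged.
  void release(std::size_t offset, std::size_t nbytes) {
    assert(offset + nbytes <= size);
    gaps.push_back({offset, nbytes});
  }
};
```

In this model, releasing a region leaves size untouched, but a later write that fits into the released region does not grow the file.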

Cheers,
Philippe.

What you say is that one MUST execute (which indeed works):

TFile *file = new TFile( filename.c_str(), "update" );
TTree *t; file->GetObject( "toto-mon-toto-name", t );
if (t) t->Delete("all");

However, the TFile::Delete method description says “foo;* delete all cycles of foo on disk and also from memory” and then “T*;* delete all objects from memory and file and all subdirectories”.
Indeed, the TTree disappears from the TFile after doing:

TFile *file = new TFile( filename.c_str(), "update" );
file->Delete("toto-mon-toto-name;*");

It seems, however, that the baskets on disk are not freed (i.e. they are NOT marked as available for future writes). How can one “free” all unused TFile baskets?

@pcanal, thanks a lot Philippe. I added this line right before closing the file, so the code above now reads:

  TTree *tree    = (TTree*) file->Get( treename.c_str()  );
  TTree *newtree = tree->CloneTree();
  
  //  newtree->Write("",TObject::kOverwrite);
  // file->Delete((treename + ";*").c_str());
  newtree->Delete("all");
  file->Close();

but this crashes (malloc error). Should I do that somewhere else?
I also listed the file's memory map, and indeed the growth comes from the baskets of the clone, which are stored in the file.

I am kind of stuck here. My goal is eventually to update a weight in the tree, and I have to do that about 20 times. The file size goes from 220M to 4.5G and I cannot do the final step.

So it is when closing the file that the TTree destructor crashes…

@pcanal, the crash is due to the fact that the clone is still connected to the original tree.

Using Philippe’s Delete function (with “all”), I found a way to do precisely what I want, i.e. modify a branch several times and overwrite the original tree. The cost in size is “only” a factor 2, since the extra space is then re-used at each iteration.

Here is the snippet of code that changes the value of px in the original tree and rewrites it. Instead of deleting the clone, I delete the original tree and then write the cloned tree.


void cloner(void) {
  TFile *file = new TFile( filename.c_str(), "update" );
  
  cout << "     ---- content of file" << endl;
  gDirectory->ls();

  TTree *tree    = (TTree*) file->Get( treename.c_str()  );
  tree->SetBranchStatus("px",0);
  
  TTree *newtree = tree->CloneTree(0);
  Float_t px;
  newtree->Branch( "px",&px,"px/F");
  
  for( Long64_t ievt = 0 ; ievt < tree->GetEntries(); ievt++ ) {
    tree->GetEntry(ievt);
    px = 1;
    newtree->Fill();
  }  
  tree   ->Delete("all");
  newtree->Write("",TObject::kOverwrite);
  file->Close();

}
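
As a rough check of the sizes quoted in this thread, here is a toy calculation in plain C++ (no ROOT). It is only a sketch: sizeAfter and checkThreadNumbers are hypothetical helpers, sizes are counted in units of one tree, and the model assumes that every pass writes one full tree's worth of baskets and that freed space is reused completely.

```cpp
#include <cassert>
#include <cstddef>

// File size, in units of one tree, after nIter rewrite passes.
// freeOld = true models freeing the previous tree's baskets on each pass
// (as tree->Delete("all") is meant to do); false models leaving them behind.
std::size_t sizeAfter(std::size_t nIter, bool freeOld) {
  std::size_t fileSize = 1;   // the original tree
  std::size_t freeSpace = 0;  // space marked as reusable
  for (std::size_t i = 0; i < nIter; ++i) {
    // The new tree is written while the old one still occupies its space.
    if (freeSpace >= 1) {
      freeSpace -= 1;         // reuse previously freed space
    } else {
      fileSize += 1;          // nothing to reuse: the file grows
    }
    if (freeOld) freeSpace += 1;  // old baskets become reusable
  }
  return fileSize;
}

// Usage example with the numbers from this thread (20 passes).
void checkThreadNumbers() {
  assert(sizeAfter(20, false) == 21);  // 21 x 220M is about 4.6G
  assert(sizeAfter(20, true) == 2);    // plateau at a factor 2
}
```

Under these assumptions, 20 passes without freeing give 21 times the original size, which is close to the 220M to 4.5G growth reported above, while freeing the old tree on each pass keeps the file at twice the original size.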

Thanks a lot to Philippe and Wile

Indeed, TFile::Delete does not know about the object type and does not know that there could be more to free in the file. In addition, the baskets are ‘shared’ between all the cycles, and thus in almost all cases, when the TTree object is asked to be removed from the disk, the baskets are intended to stay (for the other cycles).

This is why one must use TTree::Delete to remove the baskets. Once the baskets have been orphaned (the corresponding TTree has been deleted), there is no way (that has already been coded) to clean them up (such a way would require scanning the whole file directory structure to see if a basket might belong to any TTree).
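
The distinction can be sketched with a toy model in plain C++ (no ROOT). Everything here is hypothetical (ToyStore, deleteKey, deleteTree, the 1000-byte header size); it only illustrates the idea that a type-unaware delete frees the record the key points to, while a tree-aware delete also frees the baskets.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy model of keys, header records and basket records. NOT ROOT's layout.
struct ToyStore {
  struct Record { std::size_t nbytes; bool inUse; };

  std::vector<Record> records;                 // every record in the file
  std::map<std::string, std::size_t> keys;     // key name -> header record
  // header record -> its basket records; in a real file this knowledge
  // lives inside the tree object, not in the directory of keys
  std::map<std::size_t, std::vector<std::size_t>> basketsOf;

  std::size_t addRecord(std::size_t nbytes) {
    records.push_back({nbytes, true});
    return records.size() - 1;
  }

  void addTree(const std::string &name, std::size_t nBaskets, std::size_t basketBytes) {
    const std::size_t header = addRecord(1000);  // assumed header size
    keys[name] = header;
    for (std::size_t i = 0; i < nBaskets; ++i)
      basketsOf[header].push_back(addRecord(basketBytes));
  }

  // Type-unaware delete: frees only the record the key points to.
  void deleteKey(const std::string &name) {
    auto it = keys.find(name);
    if (it == keys.end()) return;
    records[it->second].inUse = false;
    keys.erase(it);  // the baskets are now unreachable: orphaned
  }

  // Tree-aware delete: frees the baskets first, then the header and key.
  void deleteTree(const std::string &name) {
    auto it = keys.find(name);
    if (it == keys.end()) return;
    for (std::size_t b : basketsOf[it->second]) records[b].inUse = false;
    basketsOf.erase(it->second);
    records[it->second].inUse = false;
    keys.erase(it);
  }

  std::size_t bytesInUse() const {
    std::size_t total = 0;
    for (const Record &r : records)
      if (r.inUse) total += r.nbytes;
    return total;
  }
};

// Usage example: two identical trees, deleted in the two different ways.
void demoOrphans() {
  ToyStore s;
  s.addTree("a", 10, 500);
  s.addTree("b", 10, 500);
  assert(s.bytesInUse() == 12000);
  s.deleteKey("a");                // only the 1000-byte header is freed
  assert(s.bytesInUse() == 11000);
  s.deleteTree("b");               // header and all baskets are freed
  assert(s.bytesInUse() == 5000);  // what remains: orphaned baskets of "a"
}
```

After deleteKey, nothing in the key directory leads to the baskets of "a" any more, which is the situation described above as orphaned.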

In general, any single object of any class can have multiple “cycles” present / stored in a TFile.
Let’s take some “TH1” histogram, for example, and assume there are several “cycles” stored.
Should I expect that SomeTFile->Delete("SomeTH1;*") will properly delete all its cycles and “free” all the corresponding data records or not?
I don’t even dare to ask what would happen if these histograms were present / stored in some TDirectoryFile and I executed SomeTFile->Delete("SomeTDirectoryFile*;*") (well, I don’t even dare to dare to think about multiple TDirectoryFile “cycles”, of course).
Moreover, one can also save some TFolder hierarchy to a TFile. Will SomeTFile->Delete("SomeTFolder*;*") properly (recursively) delete all its cycles, objects that it keeps and “free” all the corresponding data records or not?

In any case, could you please add clear notes in the TFile::Delete method description:

  1. give a full explicit list of (known) classes / objects for which this method does not “free” data records for future reuse (which actually “orphans” all related data records),

  2. for all classes / objects mentioned in point 1. above, give explicit “instructions” how to properly delete them from a TFile,

  3. give an explicit warning that there is no way to “free” data records once they are “orphaned”, so that users pay attention to the instructions in point 2. above.

BTW. You seem to suggest that the TDirectoryFile::Purge will also leave “orphaned” data records (in some cases only?).

Last but not least, does the “rootrm” utility properly “free” data records?

It will properly delete all its cycles, but it will not free any ‘dependent’ data record unless the object is a TDirectoryFile.

BTW. You seem to suggest that the TDirectoryFile::Purge will also leave “orphaned” data records (in some cases only?).

It should not in the normal case. In the case of a TTree, the dependent data records of the ‘removed old cycle’ are also dependencies of the newest cycle. [The user can make this untrue if they re-use the concept of cycle for something other than ‘backing up’ the TTree meta-data.]

Will SomeTFile->Delete("SomeTFolder*;*") properly (recursively) delete all its cycles, objects that it keeps and “free” all the corresponding data records or not?

If I recall correctly, a TFolder is stored in a single data record.

Last but not least, does the “rootrm” utility properly “free” data records?

It does not. (It currently uses ‘just’ TFile::Delete.)

In any case, could you please add clear notes in the TFile::Delete method description:

Indeed this is needed.
