Splitting a TTree into multiple files

Hi all,

I wanted to split a TTree in a ROOT file into different files according to a certain selection criterion. I know how to do it with CopyTree by specifying the cut, but that means looping over the tree once for each file I want to split it into. To give an idea, my tree has 1100M entries and I want to split it into 6000 files according to that criterion, so performing 6000 CopyTrees on a 1100M-entry tree would take ages. I was hoping there could be a way to do the splitting in just one loop.

I have tried a very simple test case but it doesn’t seem to work:

[code]int test() {
   TFile* origin = new TFile("origin.root");
   TFile* dest = new TFile("dest.root", "RECREATE");

   TTree* tree = (TTree*) origin->Get("Pi0-Tuple");
   TTree* newtree = tree->CloneTree(0);

   float m_mass;
   tree->SetBranchAddress( "m12" , &m_mass );

   Int_t bsize = 64000;

   struct TUPLE {
      float mass;
   } mystruct;

   TBranch *branch1 = newtree->Branch("Test", &mystruct, "m12/F");
   branch1->SetFile("upMass.root");
   TBranch *branch2 = newtree->Branch("Test2", &mystruct, "m12/F");
   branch2->SetFile("downMass.root");

   for (int i=0; i<100; i++){
      tree->GetEntry(i);
      if (m_mass>200.0) {
         mystruct.mass = m_mass;
         branch1->Fill();
         branch1->Write();
      } else {
         mystruct.mass = m_mass;
         branch2->Fill();
         branch2->Write();
      }
   }
   return 0;
}[/code]

In fact I don’t know if it is the way to go… Any hints?

Thanks a lot,
Albert

Hi,

The snippet in your post creates one tree whose branches are spread over several files, so reading one entry of that TTree means reading from ALL those files at the same time. Is that really what you want? It seems that the following code is what you are looking for:[code]int test() {
   TFile* origin = new TFile("origin.root");
   TTree *tree; origin->GetObject("Pi0-Tuple",tree);

   // CloneTree(0) creates an empty clone in the current directory,
   // i.e. in the file opened just before it.
   TFile* downMassFile = new TFile("downMass.root", "RECREATE");
   TTree *downMass = tree->CloneTree(0);

   TFile* upMassFile = new TFile("upMass.root", "RECREATE");
   TTree *upMass = tree->CloneTree(0);

   float m_mass;
   tree->SetBranchAddress( "m12" , &m_mass );

   for (Long64_t i=0; i<100; i++){
      tree->GetEntry(i);
      if (m_mass>200.0) {
         upMass->Fill();
      } else {
         downMass->Fill();
      }
   }
   upMassFile->Write();
   downMassFile->Write();
   return 0;
}[/code]

Cheers,
Philippe.

Yes, that is what I meant. Thanks, now I understand how to do it. The problem is that you cannot have 6000 files open at the same time:

SysError in <TFile::TFile>: file 771_3324.root can not be opened (Too many open files)

So I will need to be a little more clever to overcome that.
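The "Too many open files" message comes from the operating system's per-process limit on open file descriptors rather than from ROOT; each open local TFile uses one descriptor. On POSIX systems (Linux, macOS) the limit can be queried with `getrlimit(RLIMIT_NOFILE, ...)`, the same value `ulimit -n` reports in the shell. A minimal sketch, assuming a POSIX platform; the function names are illustrative only:

```cpp
#include <sys/resource.h>  // getrlimit, struct rlimit, RLIMIT_NOFILE (POSIX)

// Current soft limit on open file descriptors for this process.
// Opening a file beyond it fails with EMFILE ("Too many open files").
// Returns 0 if the limit cannot be queried.
unsigned long long openFileSoftLimit() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return 0;
    return static_cast<unsigned long long>(rl.rlim_cur);
}

// Hard limit: the ceiling up to which an unprivileged process
// may raise its soft limit.
unsigned long long openFileHardLimit() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return 0;
    return static_cast<unsigned long long>(rl.rlim_max);
}
```

Raising the soft limit with `ulimit -n` before starting the job helps only up to the hard limit, and default soft limits are commonly far below 6000, so keeping a bounded number of files open at once is the more portable approach.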

Cheers,
Albert

Hi,

I have “overcome” my problem by doing the following:

[code]#include "TFile.h"
#include "TTree.h"
#include "TString.h"
#include <iostream>

int cutter(TString filename) {
   TFile* origin = new TFile(filename);
   TTree* tree = (TTree*) origin->Get("Pi0-Tuple");

   int cell1, cell2;
   tree->SetBranchAddress("ind1", &cell1);
   tree->SetBranchAddress("ind2", &cell2);

   int n = tree->GetEntries();
   for (int i=0; i<n; i++){
      std::cout << i << std::endl;
      tree->GetEntry(i);
      if (0) {
         continue;
      }
      else if (cell1==1368 || cell2==1368 || cell1==2727 || cell2==2727)
      {
         TFile* newfile = new TFile("1368_2727.root", "UPDATE");
         TTree* newtree = (TTree*) newfile->Get("Pi0-Tuple");
         if (!newtree) newtree = tree->CloneTree(0);
         newtree->Fill();
         newtree->Write();
         newfile->Close();
      }
      else if (cell1==1910 || cell2==1910 || cell1==2185 || cell2==2185)
      {
         TFile* newfile = new TFile("1910_2185.root", "UPDATE");
         TTree* newtree = (TTree*) newfile->Get("Pi0-Tuple");
         if (!newtree) newtree = tree->CloneTree(0);
         newtree->Fill();
         newtree->Write();
         newfile->Close();
      }
      else if (cell1==2735 || cell2==2735 || cell1==1360 || cell2==1360)
      {
         TFile* newfile = new TFile("2735_1360.root", "UPDATE");
         TTree* newtree = (TTree*) newfile->Get("Pi0-Tuple");
         if (!newtree) newtree = tree->CloneTree(0);
         newtree->Fill();
         newtree->Write();
         newfile->Close();
      }
      else
      {
         TFile* newfile = new TFile("2736_1359.root", "UPDATE");
         TTree* newtree = (TTree*) newfile->Get("Pi0-Tuple");
         if (!newtree) newtree = tree->CloneTree(0);
         newtree->Fill();
         newtree->Write();
         newfile->Close();
      }
   }
   return 0;
}
[/code]

But after running over 74k entries I find that each ROOT file contains more than one “Pi0-Tuple”, in the form “Pi0-Tuple;XX”, and I don’t understand why. I understand that what I am doing here is a little bit hacky, but since the “real” if has 6000 else ifs and I cannot have that many files open, it’s the only solution I could come up with.

There is something about the way this works that I am not getting; can anybody enlighten me?
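With thousands of groups, the chain of `else if` branches in `cutter` can be replaced by a table built once before the loop. This sketch assumes each group is a set of cell indices mapped to one output file name, and that when `cell1` and `cell2` fall into different groups the group registered first wins, mirroring the order of the `else if` chain. It uses only the standard library; `CellRouter` and its members are hypothetical names.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical replacement for a long else-if chain: maps each cell index to
// the output file of the first group (in registration order) containing it.
class CellRouter {
public:
    explicit CellRouter(std::string fallback) : fallback_(std::move(fallback)) {}

    // Register a group of cells. Groups registered earlier win, like earlier
    // branches of an else-if chain.
    void addGroup(const std::vector<int>& cells, const std::string& fileName) {
        const std::size_t rank = names_.size();
        names_.push_back(fileName);
        for (int cell : cells) {
            cellToRank_.emplace(cell, rank);  // keeps the earliest rank for a cell
        }
    }

    // Output file for an entry: the earliest group containing cell1 or cell2,
    // or the fallback (the final "else") if neither cell is registered.
    const std::string& route(int cell1, int cell2) const {
        std::size_t best = names_.size();  // means "no group matched"
        for (int cell : {cell1, cell2}) {
            const auto it = cellToRank_.find(cell);
            if (it != cellToRank_.end() && it->second < best) best = it->second;
        }
        return best < names_.size() ? names_[best] : fallback_;
    }

private:
    std::string fallback_;
    std::vector<std::string> names_;                   // rank -> output file name
    std::unordered_map<int, std::size_t> cellToRank_;  // cell -> earliest rank
};
```

With the groups from `cutter` registered as `router.addGroup({1368, 2727}, "1368_2727.root")` and so on, and `"2736_1359.root"` as the fallback, the loop body reduces to one lookup, `router.route(cell1, cell2)`, followed by filling the tree for that file.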

Thanks,
Albert

Hi,

[quote]I find that each root file has more than one “Pi0-Tuple” with the form “Pi0-Tuple;XX”[/quote]Those are called ‘cycles’ (see the User’s Guide for details) and are backup copies of the TTree metadata. To avoid them, you can call newtree->Write("",TObject::kOverwrite) or newtree->AutoSave().

To avoid having to open and close the TTree and TFile all the time (which are expensive operations), consider using something like this:

[code]void fillTTree(const char *filename, TTree *original)
{
   static TList files;
   TFile *input = (TFile*)files.FindObject( filename );
   TTree *newtree;
   if (input == 0) {
      // Check if we have space.
      int alreadyOpened = files.GetEntries();
      if (alreadyOpened > 500) {
         // Close one of the files
         TFile *toclose = (TFile*)files.First();
         files.RemoveFirst();
         toclose->Write("",TObject::kOverwrite);
         delete toclose;
      }
      input = TFile::Open(filename,"UPDATE");
      input->GetObject("Pi0-Tuple", newtree);
      if (!newtree) newtree = original->CloneTree(0);
      else {
         // Reconnect the TTree.
         original->AddClone( newtree );
         original->CopyAddresses( newtree );
      }
   } else {
      // Assumes we already connected the new tree.
      input->GetObject("PiO-Tuple", newtree);
   }
   newtree->Fill();
}[/code]

NOTE that your code is missing the lines:[code]// Reconnect the TTree.
original->AddClone( newtree );
original->CopyAddresses( newtree );[/code]without which the ‘reloaded’ TTree will NOT copy any actual data …
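The bounded-pool idea behind the TList-based function can be tried out without ROOT. The following stand-alone sketch is a model, not ROOT code: plain text files stand in for the TFiles, `std::ios::app` stands in for the "UPDATE" mode, and the class name `BoundedFilePool` is hypothetical. It keeps at most `maxOpen` (assumed to be at least 1) files open, closes the oldest one (first in, first out, as with `files.First()`) when the limit is reached, registers every newly opened file, and closes whatever is still open when the pool is destroyed, the counterpart of writing the remaining TFiles at the end of the job.

```cpp
#include <cstddef>
#include <fstream>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical model of a bounded pool of output files (maxOpen >= 1).
class BoundedFilePool {
public:
    explicit BoundedFilePool(std::size_t maxOpen) : maxOpen_(maxOpen) {}
    ~BoundedFilePool() { closeAll(); }

    // Return an open stream for fileName, opening (or reopening) it if needed.
    std::ofstream& get(const std::string& fileName) {
        auto it = open_.find(fileName);
        if (it != open_.end()) return *it->second;

        if (open_.size() >= maxOpen_ && !order_.empty()) {
            // Evict the file that was opened first; destroying its
            // ofstream flushes and closes it.
            open_.erase(order_.front());
            order_.pop_front();
        }
        // Append mode keeps what earlier visits to this file wrote.
        auto stream = std::make_unique<std::ofstream>(fileName, std::ios::app);
        std::ofstream& ref = *stream;
        // Registering the new file is what keeps the pool bounded; without
        // this step nothing would ever be evicted.
        open_.emplace(fileName, std::move(stream));
        order_.push_back(fileName);
        return ref;
    }

    std::size_t openCount() const { return open_.size(); }

    // Flush and close every file that is still open.
    void closeAll() {
        open_.clear();
        order_.clear();
    }

private:
    std::size_t maxOpen_;
    std::list<std::string> order_;  // opening order, oldest first
    std::unordered_map<std::string, std::unique_ptr<std::ofstream>> open_;
};
```

Evicting the least recently used file instead of the oldest one would be a natural variant when a few outputs receive most of the entries, at the cost of an extra index from file name to list position.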

Cheers,
Philippe.

Hi Phillipe,

with a slightly modified version of your function:

[code]void fillTTree(const char *filename, TTree *original)
{
   static TList files;
   TFile *input = (TFile*)files.FindObject( filename );
   TTree *newtree;
   if (input == 0) {
      // Check if we have space.
      int alreadyOpened = files.GetEntries();
      if (alreadyOpened > 10) {
         // Close one of the files
         TFile *toclose = (TFile*) files.First();
         files.RemoveFirst();
         toclose->Write("",TObject::kOverwrite);
         delete toclose;
      }
      input = TFile::Open(filename,"UPDATE");
      input->GetObject("Pi0-Tuple", newtree);
      if (!newtree) newtree = original->CloneTree(0);
      else {
         // Reconnect the TTree.
         original->AddClone( newtree );
         original->CopyAddresses( newtree );
      }
   } else {
      // Assumes we already connected the new tree.
      input->GetObject("PiO-Tuple", newtree);
   }
   newtree->Fill();
}[/code]

With a limit above 100 I get "Too many open files" errors, and with the value of 10 shown above I get the following errors (the numbers are the entry index in the loop):

[code]0
1
Error in <TFile::ReadBuffer>: error reading all requested bytes from file 406_3689.root, got 222 of 300
Warning in <TFile::Init>: file 406_3689.root probably not closed, cannot read free segments
Warning in <TFile::Init>: file 406_3689.root has no keys
2
Error in <TFile::ReadBuffer>: error reading all requested bytes from file 406_3689.root, got 222 of 300
Warning in <TFile::Init>: file 406_3689.root probably not closed, cannot read free segments
Warning in <TFile::Init>: file 406_3689.root has no keys
...
110
Error in <TFile::ReadBuffer>: error reading all requested bytes from file 406_3689.root, got 222 of 300
Warning in <TFile::Init>: file 406_3689.root probably not closed, cannot read free segments
Warning in <TFile::Init>: file 406_3689.root has no keys
Error in <TFile::ReadBuffer>: error reading all requested bytes from file 3241_854.root, got 222 of 300
Warning in <TFile::Init>: file 3241_854.root probably not closed, cannot read free segments
Warning in <TFile::Init>: file 3241_854.root has no keys
111
SysError in <TFile::TFile>: file 406_3689.root can not be opened (Too many open files)[/code]

I guess that puts my maximum number of open files at a little over 100, but I don’t understand the other errors. Besides, I thought there would never be more than 10 files open. Am I leaving some file handlers unhandled?

Thanks,
Albert

Hi,

[quote] Am I leaving some file handlers unhandled? [/quote]Yes, my bad :slight_smile:. The code I gave is lacking the essential “files.Add(input);” after opening a new file (hence the TList was always empty):[code]void fillTTree(const char *filename, TTree *original)
{
   static TList files;
   TFile *input = (TFile*)files.FindObject( filename );
   TTree *newtree;
   if (input == 0) {
      // Check if we have space.
      int alreadyOpened = files.GetEntries();
      if (alreadyOpened > 10) {
         // Close one of the files
         TFile *toclose = (TFile*) files.First();
         files.RemoveFirst();
         toclose->Write("",TObject::kOverwrite);
         delete toclose;
      }
      input = TFile::Open(filename,"UPDATE");
      files.Add( input );
      input->GetObject("Pi0-Tuple", newtree);
      if (!newtree) newtree = original->CloneTree(0);
      else {
         // Reconnect the TTree.
         original->AddClone( newtree );
         original->CopyAddresses( newtree );
      }
   } else {
      // Assumes we already connected the new tree.
      input->GetObject("PiO-Tuple", newtree);
   }
   newtree->Fill();
}[/code]

Cheers,
Philippe

Hi,

it is almost working, but I get a bus error when input != 0 and it tries to do

I have tried

and then it works. It seems like something is not being accessed properly, but frankly I am completely lost in the way ROOT handles this kind of thing… Is it safe if I leave it with my correction? Anyway, I’d like to understand why this fails…

Cheers,
Albert

Hi,

The only normal reason why the first one would fail while the second one seems to succeed would be if the object in the file exists but does not inherit from TTree.

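However, I just noticed that there is a difference (most likely introduced by my typo) between the names used in the two cases: the first one uses the letter O while the second one uses the digit 0. GetObject sets the pointer to null when no object of that name exists, so newtree->Fill() then dereferences a null pointer. I suspect that with the correct name the first one would also work.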

Cheers,
Philippe.

That must definitely be it, I’ll check though. Anyway, I don’t know why they put the letter O so close to 0 on keyboards; I mix them up all the time :stuck_out_tongue:

Thanks a lot for all the kind help,
Albert

Hi Philippe,

I have been testing the script, and if a file has not been closed by the time the job finishes, nothing is written to it. I do not understand why; maybe it is because the references are lost?

Cheers,
Albert

Hi,

In order for the metadata (i.e. the TTree object itself) to be written to disk, you need to make sure that myfile->Write(…) is called …

So you need to make sure that all the files that are still open at the end of cutter are finally written out. You can simply do:[code]// gROOT is declared in TROOT.h
TIter fileiter( gROOT->GetListOfFiles() );
TFile *file;
while ( (file = (TFile*) fileiter() ) ) {
   file->Write("",TObject::kOverwrite);
}[/code](do not delete the files inside this loop, as that would invalidate the iterator you are looping over).

Cheers,
Philippe.

Hi Philippe,

sorry for bringing up this old thread again, but something new has come up with this code. I am trying to compile it as standalone code, and now it fails because TTree::AddClone is protected. I have been looking at the TTree documentation and I don’t see any workaround. Any hints?

Hi,

As long as you do not use TChain objects and do not change the branches addresses mid-stream, you ought to be okay without the call to AddClone.

Cheers,
Philippe.

Hi,

I’m afraid I am using TChains… :frowning:

But not for the code above? It does not seem to be able to handle them (since the code opens the files explicitly) …

Philippe.

Hi Philippe,

True, it is this code wrapped in a class that has to cut many input files.

I have attached the source.
CellCutter.cxx (3.4 KB)

Thanks,
Albert
CellCutter.h (1.46 KB)

Hi,

OK, so you do need AddClone. Use the following:[code]#define protected public[/code]placed before TTree.h is included, as a way to work around the privacy. We will need to add a new interface and/or make AddClone public in a future release.
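`#define protected public` works by textually rewriting every header included after it, and defining a macro with a keyword's name is formally not allowed once standard library headers are included ([macro.names] in the C++ standard). A narrower, standard-conforming workaround is to take a pointer to the protected member from inside a small derived helper class; the resulting member pointer can then be called on any object of the base class. The sketch uses a hypothetical stand-in class `Tree` instead of `TTree` so it compiles without ROOT. Applying it to ROOT would mean deriving the helper from `TTree` and matching the pointer type to the `AddClone` declaration in the `TTree.h` of the ROOT version in use, which this sketch does not assume.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a class with a protected member function,
// playing the role of TTree::AddClone.
class Tree {
public:
    std::size_t cloneCount() const { return clones_.size(); }
protected:
    void AddClone(Tree* clone) { clones_.push_back(clone); }
private:
    std::vector<Tree*> clones_;
};

// Helper derived class: inside it, forming &TreeAccess::AddClone passes the
// protected-access check, and the result has type void (Tree::*)(Tree*).
struct TreeAccess : Tree {
    using AddCloneFn = void (Tree::*)(Tree*);
    static AddCloneFn addClone() { return &TreeAccess::AddClone; }
};

// Call the protected member on any Tree through the member pointer.
inline void addClone(Tree& original, Tree* clone) {
    (original.*TreeAccess::addClone())(clone);
}
```

The helper is never instantiated; it only exists so that the access check happens inside a class derived from `Tree`. Unlike the macro, it does not change how any header is compiled.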

Cheers,
Philippe.

That worked! Thanks a lot!

Cheers,
Albert