TEntryLists for Chain and each File in the Chain

Hi,

For our analysis of data and Monte Carlo files we are trying to implement particle identification (the particles correspond to different entries in the trees of the files) and to save the result into entry lists, e.g. one for each particle type.

It is important for us to end up with one list per particle type and per file of the chain. For later parts of the analysis we will probably use only a subset of the files, so the code has to connect those files, load from each file only the requested lists (e.g. only the electron lists if we want electrons), and create a global entry list for the chain from them.

The problem we are encountering with the following piece of code is that the entry lists contain the wrong number of particles. There should be 10000 entries each for electrons and pions, but somehow there are 20000 each. We first tried to build a global entry list for each particle, which should then be split into per-file lists with TEntryList::GetLists(). For the Monte Carlo files below we already know that all entries in a tree are of one particle type, so we just (try to…) add all of them to the entry list:

[code]void CTB_AnalysisV3::CreateEntryLists() {
  cout << "INFO: CreateEntryLists is creating your entry lists..." << endl;

  TFile *EntryLists_file=new TFile("EntryLists.root","recreate");
  if (!EntryLists_file || EntryLists_file->IsZombie()) {
    cerr << "ERROR: EntryLists.root not found!" << endl;
    return;
  }

  TEntryList *EntryList_u=new TEntryList("EntryList_u","EntryList_u");
  TEntryList *EntryList_p=new TEntryList("EntryList_p","EntryList_p");
  TEntryList *EntryList_e=new TEntryList("EntryList_e","EntryList_e");
  TEntryList *EntryList_m=new TEntryList("EntryList_m","EntryList_m");
  char filename_old[200]="", filename_new[200];
  int pid=0; // set initial PID to unknown

  Long64_t N_events=fChain->GetEntries();
  for (Long64_t Event=0; Event<N_events;Event++) {
    fChain->SetBranchStatus("*",0);
    fChain->GetEntry(Event);

    TFile *file=fChain->GetFile();
    sprintf(filename_new,"%s",file->GetName());
    string filename_str=filename_new;

    if (strcmp(filename_old,filename_new)!=0) { // check whether new file has been connected

      cout << "New file connected: " << filename_new << endl;
      sprintf(filename_old,"%s",filename_new);
      string filename_str=filename_new;
      string run_str;

      if (filename_str.find(".mc.root")!=string::npos) { // do MC PID
        string pid_str=filename_str.substr(filename_str.find("_10k")+4,1);
        TFile *currFile=new TFile(filename_str.c_str());
        TTree *tmpTree=(TTree*)currFile->Get("tree");
        Long64_t currEntries=tmpTree->GetEntries();

        cout << "...it's a MC file with " << currEntries << " entries of type " << pid_str << endl;
        for(Long64_t entries=0; entries<currEntries; ++entries) {
          if (pid_str=="p") EntryList_p->Enter(entries,fChain);
          else if (pid_str=="e") EntryList_e->Enter(entries,fChain);
          else if (pid_str=="m") EntryList_m->Enter(entries,fChain);
          else EntryList_u->Enter(entries,fChain);
          /* if (pid_str=="p") EntryList_p->Enter(Event+entries);
          else if (pid_str=="e") EntryList_e->Enter(Event+entries);
          else if (pid_str=="m") EntryList_m->Enter(Event+entries);
          else EntryList_u->Enter(Event+entries);*/
        }
        Event+=currEntries-1;
        continue;
      }
      run_str=filename_str.substr(filename_str.find("2102")+3,4);
      cout << "...it's run number " << run_str << endl;
      cout << "...it's a data file... I'll go on and do tedious NonTrtPid work ;-)" << endl;
    }

    // put pid cuts

    switch (pid) {
      case 1: EntryList_p->Enter(Event,fChain); break;
      case 2: EntryList_e->Enter(Event,fChain); break;
      case 3: EntryList_m->Enter(Event,fChain); break;
      default: EntryList_u->Enter(Event,fChain);
    }

  }

  EntryLists_file->cd();
  fChain->SetBranchStatus("*",1);
  TString particle_labels[4]={"u","p","e","m"};
  for (int p=0;p<4;p++) {
    TString EventListName;
    EventListName="EntryList_"+particle_labels[p];
    cout << EventListName << endl;
    TEntryList *EntryList=(TEntryList*)EntryLists_file->Get(EventListName);
    // if (!EntryList) cout << "EntryList" << endl;
    TList *LEntryList=EntryList->GetLists();
    // if (!LEntryList) cout << "LEntryList" << endl;
    TIter next(LEntryList);
    Int_t count=0;
    while (TObject *obj=next()) {
      TEntryList *SubEntryList=(TEntryList*)obj;
      TString title=(TString)SubEntryList->GetFileName();
      string filename_str=title.Data();
      // string filename_str=(string)title;
      cout << filename_str << endl;
      if (filename_str.find(".mc.root")!=string::npos) { // do MC PID
        title="EntryList-"+title(title.Length()-11,6)+"-"+particle_labels[p];
      } else {
        title="EntryListMC-"+title(title.Length()-11,6)+"-"+particle_labels[p];
      }
      SubEntryList->SetName(title.Data());
      SubEntryList->Write();
      cout << SubEntryList->GetN() << endl;
      ++count;
    }
  }
  EntryLists_file->cd();
  EntryList_u->Write();
  EntryList_p->Write();
  EntryList_e->Write();
  EntryList_m->Write();
  // EntryLists_file->Write();
  EntryLists_file->Close();
  cout << "INFO: CreateEntryLists is done!" << endl;
}
[/code]

And this code is supposed to take each file in a chain (at the moment the same chain that was used to create the lists), load the lists for each file, and build the global one.

[code]void CTB_AnalysisV3::GetEntryList(int pid_loop) {
  TString particle_labels[4]={"u","p","e","m"};
  cout << "INFO: GetEntryList is reading your entry list for " << particle_labels[pid_loop] << " ..." << endl;
  TFile *EntryLists_file=new TFile("EntryLists.root","read");
  if (!EntryLists_file || EntryLists_file->IsZombie()) {
    cout << "ERROR: EntryLists.root not found!" << endl;
    return;
  }
  TEntryList *EntryList=new TEntryList();
  TObjArray *ListOfFiles=fChain->GetListOfFiles();
  TIter next(ListOfFiles);
  Int_t count=0;
  while (TObject *obj=next()) {
    TString title;
    title=(TString)obj->GetTitle();
    title="EntryList-"+title(title.Length()-11,6)+"-"+particle_labels[pid_loop];
    TEntryList *SubEntryList=(TEntryList*)EntryLists_file->Get(title.Data());
    if (SubEntryList) {
      EntryList->Add(SubEntryList);
    }
    ++count;
  }
  if (EntryList->GetN()==0) { // a freshly allocated list is never null, so check its content
    cout << "ERROR: EntryList for " << particle_labels[pid_loop] << " not found!" << endl;
    return;
  }
  fChain->SetEntryList(EntryList);

  EntryLists_file->Close();
  cout << "INFO: GetEntryList is done!" << endl;
}
[/code]
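The per-file lookup in GetEntryList() can be modelled without ROOT. The sketch below is a toy in plain C++, not ROOT code: a std::map stands in for EntryLists.root, keyed by list name, and the hypothetical helper ListName() reproduces the posted naming scheme ("EntryList-" + the 6 characters starting 11 characters before the end of the file name + "-" + particle label). Files without a stored list are skipped. Note that a freshly allocated list is never a null pointer, so testing the pointer cannot detect that nothing was found; counting the entries can.

```cpp
// Toy model (not ROOT code): assemble a chain-level list from per-file,
// per-particle lists stored under names like "EntryList-<tag>-<pid>".
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

using Entry = std::int64_t;
using LocalList = std::set<Entry>;  // local entry numbers of one file

// Hypothetical helper mirroring the posted naming scheme:
// "EntryList-" + 6 characters starting 11 characters before the end + "-" + pid.
std::string ListName(const std::string& fileName, const std::string& pid) {
  assert(fileName.size() >= 11);
  return "EntryList-" + fileName.substr(fileName.size() - 11, 6) + "-" + pid;
}

// Chain-level list: one sublist per file of the chain, keyed by file name.
std::map<std::string, LocalList> Assemble(
    const std::vector<std::string>& chainFiles,
    const std::map<std::string, LocalList>& stored,  // stands in for EntryLists.root
    const std::string& pid) {
  std::map<std::string, LocalList> chainList;
  for (const std::string& f : chainFiles) {
    auto it = stored.find(ListName(f, pid));
    if (it != stored.end()) chainList[f] = it->second;  // skip files without a list
  }
  return chainList;
}

// Total number of entries: the sum over the per-file sublists.
Entry TotalN(const std::map<std::string, LocalList>& chainList) {
  Entry n = 0;
  for (const auto& kv : chainList) n += static_cast<Entry>(kv.second.size());
  return n;
}
```

An empty result (TotalN() == 0) is the case that should trigger the "not found" message.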

Currently the latter code is not called yet, because the creation of the lists already has an error (as described above).

Is there any obvious mistake in CreateEntryLists(), or are we using the lists incorrectly? We are not sure how to convert global lists to local ones and vice versa, and the Reference Guide does not seem very detailed on that part. Thank you for having a look at it; I hope I pointed out everything important.

Cheers,

Daniel

Hi Daniel,

Do I understand correctly that you:

  1. Load the entry in the chain
  2. If this entry is from a new file, you open the file and its tree and, depending on the events in the tree, put them into the appropriate entry list. Then you update the current event number in the chain, because you want to jump directly to the next tree.
  3. But when is the switch(pid) statement later on valid?

In the loop over the tmpTree entries you should call EntryList->Enter(entries, tmpTree), because entries is the local index inside a tree, not a global index inside the chain (that would be Event, right?). Or, even better, change it to something like:

[code]if (pid_str=="p") EntryList_p->SetTree(tmpTree);
//… same for other pid_str
for (entries=…)
   if (pid_str=="p") EntryList_p->Enter(entries);
   //… same for other pid_str
[/code]
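The difference between local and chain entry numbers can be checked with a toy model in plain C++ (not ROOT code). Offsets() and GlobalToLocal() are hypothetical helpers: the first builds the chain's offset table (the chain entry number of each tree's first entry), the second maps a chain entry number to a tree index and a local entry number. With two trees of 10000 entries, local entry 42 of the second tree is chain entry 10042; passing the bare local index 42 where a chain entry is expected selects entry 42 of the first tree instead.

```cpp
// Toy model (not ROOT code) of a chain's offset table.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Entry = std::int64_t;

// Chain entry number of the first entry of each tree.
std::vector<Entry> Offsets(const std::vector<Entry>& treeSizes) {
  std::vector<Entry> off;
  Entry sum = 0;
  for (Entry n : treeSizes) {
    off.push_back(sum);
    sum += n;
  }
  return off;
}

// Map a chain entry number to (tree index, local entry inside that tree).
std::pair<std::size_t, Entry> GlobalToLocal(const std::vector<Entry>& treeSizes,
                                            Entry global) {
  assert(global >= 0);
  const std::vector<Entry> off = Offsets(treeSizes);
  for (std::size_t t = 0; t < treeSizes.size(); ++t) {
    if (global < off[t] + treeSizes[t]) return {t, global - off[t]};
  }
  assert(false && "entry is beyond the end of the chain");
  return {treeSizes.size(), -1};  // sentinel if asserts are disabled
}
```

So a loop over the entries of one tree has to use either the tree itself or the tree's offset plus the local index, never the bare local index together with the chain.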

Is it true that in the end you get entry lists for all the right files, but with twice the number of entries? It looks like you are somehow adding the entries twice: if the global->local part of TEntryList had a bug, it would add the entries in the wrong place, but not more than once.

How big are your files? Could you make a small subsample, print the entry lists you get, and see exactly what was added (TEntryList::Print("all"))? And could you put those files somewhere where I could take them and give it a try?

Cheers,
Anna

Hello Anna and all…

I had another look into it. To make sure we don't produce this error through some other part of our code, I wrote a macro that does only what we want, as a kind of dummy version.

But first, to answer your questions, Anna:

  • I think you are right concerning tmpTree and the Enter statement. As written, the code looks as if we wanted to fill an entry list for the tree of the current file. If so, we should probably do:
[code]if (pid_str=="p") EntryList_p->Enter(entries,tmpTree);
else if (pid_str=="e") EntryList_e->Enter(entries,tmpTree);
else if (pid_str=="m") EntryList_m->Enter(entries,tmpTree);
else EntryList_u->Enter(entries,tmpTree);
[/code]

…but that is not what we want. While trying to debug the code, we had commented out what we thought was the right part (directly below the lines we are talking about). There you can see that we were doing:

[code]if (pid_str=="p") EntryList_p->Enter(Event+entries);
else if (pid_str=="e") EntryList_e->Enter(Event+entries);
else if (pid_str=="m") EntryList_m->Enter(Event+entries);
else EntryList_u->Enter(Event+entries);
[/code]

This corresponds to our aim of filling a global list for each particle, which is why we used Event+entries as the entry number. I am now quite convinced that we should then also pass fChain as the tree… I am sorry, this was really misleading!

  • Concerning the switch(pid) statement: at the moment it is not really valid, because later this is where quite a lot of particle-ID code will be called for each event. So far we have only tried Monte Carlo, and since we know what the particle is (all particles in a file are the same), we just fill all events at once, as you have seen.

As for the macro I wrote to investigate the problem further (the complete code is below the text): essentially it does the same as the code above, with four files, two data and two Monte Carlo. I used two methods to fill the entry lists, for just two particle types.

  1. Using the TChain::Draw command with a dummy cut on the number of tracks in the event (currently commented out in the code):
[code]// ch.Draw(">>globalList_e","trk_nTracks==3","entrylist");
// ch.Draw(">>globalList_pi","trk_nTracks==2","entrylist");

// TEntryList *globalList_e =  (TEntryList*)gDirectory->Get("globalList_e");
// TEntryList *globalList_pi = (TEntryList*)gDirectory->Get("globalList_pi");
[/code]
  2. Using a loop over the chain and TEntryList::Enter to fill each event into the lists created at the beginning, with the same cuts for the particles. The procedure described above, filling all entries of an MC file at once, is commented out so that the output can be compared directly with method 1:
[code]  // first create the global entry lists for two particle types

  TEntryList *globalList_e = new TEntryList("globalList_e","globalList_e");
  TEntryList *globalList_pi = new TEntryList("globalList_pi","globalList_pi");

  // then loop over the chain and check dummy condition for the two particle types
  // if TRUE fill them into appropriate entry list with the mysterious command TEntryList::Enter...

  TFile *newFile;
  TString newName;
  TString oldName;

  for(Int_t event=0;event<ch.GetEntries();++event){

    ch.GetEntry(event);
    newFile = ch.GetFile();
    newName = newFile->GetName();

    // Check if file has changed...
    if(newName!=oldName){

      cout << "Processing file: " << newName << "....." << endl;

      // Check if file is MC...
      //if( newName.Contains("MC") ){
      //
      //  TTree* tmpTree = (TTree*)newFile->Get("TB/tree");
      //  Int_t currEntries = tmpTree->GetEntries();
      //
      //  // We know the particle type so we are filling all entries at once
      //  for(Int_t i=event;i<event+currEntries;++i){
      //    if( newName.Contains("e--") ) globalList_e->Enter(i,&ch);
      //    else if( newName.Contains("pi--") ) globalList_pi->Enter(i,&ch);
      //  }
      //
      //  event += currEntries-1; // correct event number
      //  oldName = newName;
      //  continue; // for MC leave here...
      //}

    }

    // Ok, if it's not MC it's data...check for dummy variable (later PID here)...
    else if(trk_nTracks == 3) globalList_e->Enter(event,&ch);
    else if(trk_nTracks == 2) globalList_pi->Enter(event,&ch);
    oldName = newName;
  }
[/code]

Afterwards some printing is done and the numbers of entries are compared. The funny thing is: if I use TEntryList::Enter, the number of entries in the first file only is twice as high as with method 1. All other files give the same results with both methods.

When I use TEntryList::Subtract to remove the method-1 list from the method-2 list (the one with twice the number of entries), 0 entries remain (none show up with TEntryList::Print("all")). So the list apparently contains the right entries, exactly those of the other list, BUT the number returned by TEntryList::GetN() seems to be wrong. Since all files except the first one give the right numbers, I assume my code is not wrong, and since the TChain::Draw method is quite popular I don't think the bug is there, but perhaps in the Enter method. Perhaps someone can reproduce this and look into it in ROOT…? To make that possible, I will try to put some reduced data files into an AFS scratch space with public access. When it is ready, I'll post the location here.

Sorry that this got a bit lengthy…

Cheers, Daniel

Here is the full code of the macro:

[code]{
  gROOT->Reset();

  TChain ch("TB/tree");
  ch.AddFile("localntuples/ntuple-2102395-000-0-12.0.5.root");
  ch.AddFile("localntuples/ntuple-2102396-000-0-12.0.5.root");
  ch.AddFile("localntuples/ntuple-2102103-000-0-12.0.5-e--MC.root");
  ch.AddFile("localntuples/ntuple-2102103-000-0-12.0.5-pi--MC.root");

  Int_t trk_nTracks = -1;
  ch.SetBranchStatus("*",0);
  ch.SetBranchStatus("trk_nTracks",1);
  ch.SetBranchAddress("trk_nTracks",&trk_nTracks);

  TFile file("EntryLists.root","recreate");

  // The "old way" using the TChain::Draw command...

  // ch.Draw(">>globalList_e","trk_nTracks==3","entrylist");
  // ch.Draw(">>globalList_pi","trk_nTracks==2","entrylist");

  // TEntryList *globalList_e =  (TEntryList*)gDirectory->Get("globalList_e");
  // TEntryList *globalList_pi = (TEntryList*)gDirectory->Get("globalList_pi");

  // And now the "new way" using TEntryList::Enter...

  // first create the global entry lists for two particle types

  TEntryList *globalList_e = new TEntryList("globalList_e","globalList_e");
  TEntryList *globalList_pi = new TEntryList("globalList_pi","globalList_pi");

  // then loop over the chain and check dummy condition for the two particle types
  // if TRUE fill them into appropriate entry list with the mysterious command TEntryList::Enter...

  TFile *newFile;
  TString newName;
  TString oldName;

  for(Int_t event=0;event<ch.GetEntries();++event){

    ch.GetEntry(event);
    newFile = ch.GetFile();
    newName = newFile->GetName();

    // Check if file has changed...
    if(newName!=oldName){

      cout << "Processing file: " << newName << "....." << endl;

      // Check if file is MC...
      //if( newName.Contains("MC") ){
      //
      //  TTree* tmpTree = (TTree*)newFile->Get("TB/tree");
      //  Int_t currEntries = tmpTree->GetEntries();
      //
      //  // We know the particle type so we are filling all entries at once
      //  for(Int_t i=event;i<event+currEntries;++i){
      //    if( newName.Contains("e--") ) globalList_e->Enter(i,&ch);
      //    else if( newName.Contains("pi--") ) globalList_pi->Enter(i,&ch);
      //  }
      //
      //  event += currEntries-1; // correct event number
      //  oldName = newName;
      //  continue; // for MC leave here...
      //}

    }

    // Ok, if it's not MC it's data...check for dummy variable (later PID here)...
    else if(trk_nTracks == 3) globalList_e->Enter(event,&ch);
    else if(trk_nTracks == 2) globalList_pi->Enter(event,&ch);
    oldName = newName;
  }

  globalList_e->Write("ForWholeChain_e");
  globalList_pi->Write("ForWholeChain_pi");

  TList *sublists_e = (TList*)globalList_e->GetLists();
  TList *sublists_pi = (TList*)globalList_pi->GetLists();
  if(!sublists_e) cerr << "WARNING: sublists_e could not be created from global list..." << endl;
  if(!sublists_pi) cerr << "WARNING: sublists_pi could not be created from global list..." << endl;

  TIter listIt_e(sublists_e);
  TIter listIt_pi(sublists_pi);
  TEntryList *next_e = 0;
  TEntryList *next_pi = 0;

  Int_t total_e = 0;
  Int_t total_pi = 0;

  while ( (next_e = (TEntryList*)listIt_e.Next()) ){
    TString longFileName = next_e->GetFileName();
    TString shortFileName = longFileName.Remove(0,69);
    TString eventListName = shortFileName(7,7);
    TString type;
    if(shortFileName.Contains("e--MC")) type = "MCe";
    else if (shortFileName.Contains("pi--MC")) type = "MCpi";
    else type = "";
    eventListName += type+"_e";
    next_e->Write(eventListName);
    printf("Wrote sublists_e for %s file %s as %s to EntryLists.root\n",type.Data(),shortFileName.Data(),eventListName.Data() );
    printf("sublists_e %s has %lld entries...\n",eventListName.Data(),(long long)next_e->GetN() );
    total_e += next_e->GetN();
  }
  printf("Checking electron lists...\n");
  printf("Total sum of all entries in sublists_e is: %d....global List has %lld entries\n",total_e,(long long)globalList_e->GetN() );
  if(globalList_e->GetN()!=total_e) cerr << "WARNING: entry numbers in global list and sublists_e do not match!" << endl;
  else printf("Writing of sublists_e and check of entries successful!\n\n");

  while ( (next_pi = (TEntryList*)listIt_pi.Next()) ){
    TString longFileName = next_pi->GetFileName();
    TString shortFileName = longFileName.Remove(0,69);
    TString type;
    if(shortFileName.Contains("e--MC")) type = "MCe";
    else if (shortFileName.Contains("pi--MC")) type = "MCpi";
    else type = "";
    TString eventListName = shortFileName(7,7);
    eventListName += type+"_pi";
    next_pi->Write(eventListName);
    printf("Wrote sublists_pi for %s file %s as %s to EntryLists.root\n",type.Data(),shortFileName.Data(),eventListName.Data() );
    printf("sublists_pi %s has %lld entries...\n",eventListName.Data(),(long long)next_pi->GetN() );
    total_pi += next_pi->GetN();
  }
  printf("Checking pion lists...\n");
  printf("Total sum of all entries in sublists_pi is: %d....global List has %lld entries\n",total_pi,(long long)globalList_pi->GetN() );
  if(globalList_pi->GetN()!=total_pi) cerr << "WARNING: entry numbers in global list and sublists_pi do not match!" << endl;
  else printf("Writing of sublists_pi and check of entries successful!\n");

  //file.Close();
}
[/code]

Hello,

if someone is willing to test the macro with the test ntuples it needs, you can use the files testntuple*.root at http://www.ifh.de/~danri/
or in my public directory on AFS: /afs/ifh.de/users/d/danri/public/www/

Cheers,

       Daniel

Hi Daniel,

There was indeed a bug in computing the total number of entries when TEntryList::Enter() was used with a TChain argument. :blush: Thanks a lot for pointing it out! Could you please try the CVS head and see if it does what you expect now?

Now, concerning global vs. local entry numbers and the TChain/TTree argument of Enter(): TEntryList always stores local indices, i.e. indices inside a TTree. If you pass it a global entry number and a TChain, it internally computes the local entry number from the global one and the TChain offsets table, and stores only this local number. A TEntryList for a TChain is just a TList of TEntryLists for the TTrees of this chain. So it doesn't really matter whether you call Enter(entries, tmpTree) or Enter(Event+entries, fChain): either way the entries are stored in the sublist corresponding to tmpTree. The fastest way is to call TEntryList::SetTree() before the loop over the tree entries and then just call Enter(entries). This way you avoid calling SetTree for each entry number.
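The storage scheme described above can be illustrated with a toy class. This is plain C++ written for this thread, not ROOT's TEntryList, and the class ToyChainList and its members are hypothetical: it keeps one set of local entry numbers per tree, converts chain entry numbers through an offset table (like Enter(Event+entries, fChain)), offers a SetTree()/Enter(local) pair (like the recommended SetTree() plus Enter(entries)), and computes the total as the sum over the sublists. Both ways of entering the same tree end up with identical sublists.

```cpp
// Toy model (not ROOT's TEntryList): a chain-level list stored as one
// set of local entry numbers per tree, plus the chain's offset table.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

using Entry = std::int64_t;

class ToyChainList {
 public:
  explicit ToyChainList(std::vector<Entry> treeSizes) : sizes_(std::move(treeSizes)) {
    Entry sum = 0;
    for (Entry n : sizes_) {
      offsets_.push_back(sum);
      sum += n;
    }
  }

  // Like Enter(global, chain): find the tree, store only the local number.
  void EnterGlobal(Entry global) {
    for (std::size_t t = 0; t < sizes_.size(); ++t) {
      if (global >= offsets_[t] && global < offsets_[t] + sizes_[t]) {
        sub_[t].insert(global - offsets_[t]);
        return;
      }
    }
    assert(false && "entry is beyond the end of the chain");
  }

  // Like calling SetTree() once, then Enter(local) for each entry of that tree.
  void SetTree(std::size_t tree) {
    assert(tree < sizes_.size());
    current_ = tree;
  }
  void EnterLocal(Entry local) {
    assert(local >= 0 && local < sizes_[current_]);
    sub_[current_].insert(local);
  }

  // Total = sum over the per-tree sublists, so it cannot drift from the content.
  Entry GetN() const {
    Entry n = 0;
    for (const auto& kv : sub_) n += static_cast<Entry>(kv.second.size());
    return n;
  }

  const std::map<std::size_t, std::set<Entry>>& Sublists() const { return sub_; }

 private:
  std::vector<Entry> sizes_, offsets_;
  std::map<std::size_t, std::set<Entry>> sub_;
  std::size_t current_ = 0;
};
```

Because each sublist is a set, entering the same entry twice leaves the count unchanged, which matches the point made earlier in the thread that a wrong global->local conversion would misplace entries but not duplicate them.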

Hope this helps.
Anna

Hi Anna,

thank you for fixing it. Indeed, SetTree works better for us :-)

Cheers, Daniel