Help with a general routine for tree filtration

Hi,

I’m trying to upgrade a routine I wrote to do general tree filtration and dst->microdst. Essentially the code takes a chain, applies a selection (e.g. as in Draw()) and only writes out specified branches. I can make this work if I do it in two passes:

(1) Apply the selection to the input chain via TChain::Draw(), filling an event list.
(2) Set the event list on the original chain, turn all branches off on the input chain, then go back and turn on only the ones we want to save.
(3) Clone the original chain via CloneTree(0).
(4) Loop over the event list, calling GetEntry() on the original chain and Fill() on the output tree.

This works like a charm but I’d like to do it in one pass. Here is my attempt:

TTree* filter_tree(TTree* t, const char* selection, const char* vars, TFile* fout)
{

  // ttf implements the event selection
  TTreeFormula ttf("filter_tree",selection,t);

  // turn off all branches in input
  t->SetBranchStatus("*",0);

  // tokenize array of variable names
  TString s(vars);
  TObjArray* vlist = s.Tokenize(", ");
  // loop over variable names, turning on associated branch in input
  std::cout<<"filter_tree: writing out the following variables:"<<std::endl;
  for(int i=0; i<vlist->GetEntries(); i++){
    TObjString* obj = dynamic_cast<TObjString*>(vlist->At(i));
    TString v=obj->GetString();
    std::cout<<v<<std::endl;
    t->SetBranchStatus(v.Data(),1);
  }
  std::cout<<"end of variables"<<std::endl;

  fout->cd();
  TTree* tc = t->CloneTree(0);

  // go through and turn on branches needed by selection  
  std::cout<<"Turning on branches"<<std::endl;
  for(Int_t j=0; j< ttf.GetNcodes(); j++){
    std::cout<<"Enabling branch : "<<ttf.GetLeaf(j)->GetBranch()->GetName()<<std::endl;
    t->SetBranchStatus(ttf.GetLeaf(j)->GetBranch()->GetName(),1);
  }
  ttf.UpdateFormulaLeaves();

  Long64_t nbr=0;
  std::cout.precision(3);
  Long64_t i=0;
  Long64_t npass=0;
  bool keep_going=true;
  Int_t current_tree=-1;

  while(true) {
    if(t->GetTreeNumber()!=current_tree) {
      current_tree=t->GetTreeNumber(); 
      ttf.UpdateFormulaLeaves();
    }
    // load branches needed by selection
    for(Int_t j=0; j< ttf.GetNcodes(); j++){
      Long64_t nb = ttf.GetLeaf(j)->GetBranch()->GetEntry(i);
      if(nb<=0){
	// done reading
	std::cout<<"Should be done (1)? : [i,j] = "<<i<<" , "<<j<<std::endl;
	keep_going=false;
	break;
      }
      nbr+=nb;
    }
    if(!keep_going) break;
    // evaluate the selection, if true, fill output
    Double_t val = ttf.EvalInstance();
    if(val > 0.0){
      npass++;
      Long64_t nb=t->GetEntry(i,0);
      if(nb<=0){
	// done reading
	std::cout<<"Should be done (2)?"<<std::endl;
	keep_going=false;
	continue;
      }
      nbr+=nb;      
      tc->Fill();
    }
    // how's it going
    if(i%100000==0){
      std::cout<<i<<" entries  read : "<<npass<<" filled : "
      <<nbr/(1024.0*1024.0)<<" Mb read"<<std::endl;
    }
    
    i++; // don't forget
  }
  std::cout<<"Done                                              "<<std::endl;
  std::cout<<"filter_tree: output has "
	   <<npass<<" entries."<<std::endl;  
  return tc;
}

Now, the output is:

root [0]filter_tree("pan","pan_n13011001_0000*.root","is_pitt_fid==1","reco_enu,pass","blah.root");

filter_tree: building chain of 'pan' ntuples from pan_n13011001_0000*.root
filter_tree: using 1 trees.
Warning in <TClass::TClass>: no dictionary for class NuParent is available
filter_tree: writing out the following variables:
reco_enu
pass
end of variables
Turning on branches
Enabling branch : is_pitt_fid
0 entries  read : 0 filled : 3.81e-06 Mb read
Should be done (1)? : [i,j] = 3349 , 0
Done
filter_tree: output has 0 entries.

The TTreeFormula never returns a true value but doesn’t issue any errors and I know that there are events that pass the cut. The number 3349 is the number of events in my input file. Can anyone advise me on what I’m doing wrong? I’ve attached the code.

mike kordosky
filter_tree.C (3.1 KB)

Hi,

You do not need to manage the loading of the branches used by TTreeFormula (except for ensuring that they are enabled). TTreeFormula will load only what it needs. However, you HAVE TO (and you currently do not seem to) set the entry to be read in the TTree (use LoadTree).

So the following (untested) version of your code might work:

[code]TTree* filter_tree(TTree* t, const char* selection, const char* vars, TFile* fout)
{

  // ttf implements the event selection
  TTreeFormula ttf("filter_tree",selection,t);

  // turn off all branches in input
  t->SetBranchStatus("*",0);

  // tokenize array of variable names
  TString s(vars);
  TObjArray* vlist = s.Tokenize(", ");
  // loop over variable names, turning on associated branch in input
  std::cout<<"filter_tree: writing out the following variables:"<<std::endl;
  for(int i=0; i<vlist->GetEntries(); i++){
    TObjString* obj = dynamic_cast<TObjString*>(vlist->At(i));
    TString v=obj->GetString();
    std::cout<<v<<std::endl;
    t->SetBranchStatus(v.Data(),1);
  }
  std::cout<<"end of variables"<<std::endl;

  fout->cd();
  TTree* tc = t->CloneTree(0);

  // go through and turn on branches needed by selection
  std::cout<<"Turning on branches"<<std::endl;
  t->SetBranchStatus("*",1);
  ttf.UpdateFormulaLeaves();

  Long64_t nbr=0;
  std::cout.precision(3);
  Long64_t i=0;
  Long64_t npass=0;
  bool keep_going=true;
  Int_t current_tree=-1;

  while(true) {
    t->LoadTree(i);
    if(t->GetTreeNumber()!=current_tree) {
      current_tree=t->GetTreeNumber();
      ttf.UpdateFormulaLeaves();
    }
    // load branches needed by selection

    // evaluate the selection, if true, fill output
    Double_t val = ttf.EvalInstance();
    if(val > 0.0){

[/code]

Cheers,
Philippe

Hi Philippe,

[quote="pcanal"]Hi,

You do not need to manage the loading of the branches used by TTreeFormula (except for ensuring that they are enabled). TTreeFormula will load only what it needs. However, you HAVE TO (and you currently do not seem to) set the entry to be read in the TTree (use LoadTree).

So the following (untested) version of your code might work:
.
.
.
[/quote]

You’ve managed to get me on the right track and now the program works with some modifications! I found that:

(1) I couldn’t omit the explicit loop over formula leaves. If I do not have the loop, the selection fails to work. This is true even if I have all branches on. I would like to understand this a little bit better because I have another bit of code (a TSelector derived class) that also tries to use TTreeFormula in a similar way and I found I had to explicitly loop over leaves in that case as well.

(2) I had to save the number returned by LoadTree. That number (“local_entry” below) is used in the call to GetEntry() on the formula branches. If I didn’t do that I would get a premature exit following from the condition “if(nb<=0)”, probably when reading an entry number greater than that held by the current tree.

Anyway, the working code looks like this:

  while(true) {
    Long64_t local_entry=t->LoadTree(i);
    if(t->GetTreeNumber()!=current_tree) {
      current_tree=t->GetTreeNumber(); 
      ttf.UpdateFormulaLeaves();
    }
    // load branches needed by selection
    
    for(Int_t j=0; j< ttf.GetNcodes(); j++){
      Long64_t nb = ttf.GetLeaf(j)->GetBranch()->GetEntry(local_entry);
      if(nb<=0){
	// done reading
	std::cout<<"Should be done (1)? : [i,j] = "<<i<<" , "<<j<<std::endl;
	keep_going=false;
	break;
      }
      nbr+=nb;
    }
    if(!keep_going) break;

Many thanks!

mike

[quote]I couldn’t omit the explicit loop over formula leaves. If I do not have the loop, the selection fails to work. This is true even if I have all branches on. I would like to understand this a little bit better because I have another bit of code (a TSelector derived class) that also tries to use TTreeFormula in a similar way and I found I had to explicitly loop over leaves in that case as well.
[/quote]
This sounds like a problem in TTreeFormula :frowning:. It would depend on your expression and TTree.

The local_entry is indeed required when calling GetEntry on a branch, since branch objects are local to a single TTree (i.e. you understood the problem correctly).
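
To make the global versus local entry distinction concrete, here is a minimal standalone sketch. It does not use ROOT: `ToyChain` and `locate` are hypothetical names made up for illustration, and the only thing modelled is the bookkeeping that turns a chain-wide entry number into a tree index plus an entry number inside that tree. It is an assumption of this sketch that running past the end is reported with negative values, which is how the thread describes LoadTree behaving.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for a chain: only the number of entries in each tree is kept.
// Hypothetical type for illustration; it is not part of ROOT.
struct ToyChain {
    std::vector<long long> entries_per_tree;

    // Map a global (chain-wide) entry number to {tree index, local entry}.
    // Returns {-1, -1} when the global entry is negative or past the end.
    std::pair<int, long long> locate(long long global_entry) const {
        if (global_entry < 0) return {-1, -1};
        long long first = 0;  // global number of the first entry of tree k
        for (std::size_t k = 0; k < entries_per_tree.size(); ++k) {
            const long long n = entries_per_tree[k];
            assert(n >= 0);  // a tree cannot hold a negative number of entries
            if (global_entry < first + n) {
                return {static_cast<int>(k), global_entry - first};
            }
            first += n;
        }
        return {-1, -1};
    }
};
```

With two trees of 3 and 2 entries, global entry 3 maps to tree 1, local entry 0. Passing the global number 3 to a branch of tree 1 would ask for an entry that tree does not have, which matches the premature exit described above.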

Cheers,
Philippe

Hi Philippe,

[quote="pcanal"]This sounds like a problem in TTreeFormula :frowning:. It would depend on your expression and TTree.
[/quote]

I don’t have much to add other than:

(1) I’ve only been doing selections on branches with simple types (floats, ints, fixed length arrays of either). My ntuple is almost entirely simple types except for one object which is split to one level (i.e. it holds no arrays or other objects).

(2) I use ROOT v5.06.00.

(3) My macro, as fixed, ought to work on the tree made by the “event” tutorial. So I think the TTreeFormula issue should be testable without lots of pain.

mike

Can you send me your current version of the file once you have confirmed that, using Event.root, it does indeed NOT work without the explicit call to GetEntry (and also send me the selections that you picked to use :slight_smile:)?

Thanks,
Philippe.

[quote="pcanal"]Can you send me your current version of the file once you have confirmed that, using Event.root, it does indeed NOT work without the explicit call to GetEntry (and also send me the selections that you picked to use :slight_smile:)?

Thanks,
Philippe.[/quote]

Hi Philippe,

Thanks for your help. I’ve partially figured this problem out. I think it’s really a case in which my event loop doesn’t terminate when the last event doesn’t pass my selection:

    Double_t val = ttf.EvalInstance();
    if(val > 0.0){

      Long64_t nb=t->GetEntry(i,0);
      if(nb<=0){
	// done reading
	std::cout<<"Should be done (2)?"<<std::endl;
	keep_going=false;
	break;
      }
      else {
	npass++;
	nbr+=nb;      
	tc->Fill();
      }
    }
    if(!keep_going) break;

The “fix” I presented above worked around that problem by trying to read the formula branches at an entry one past the end of the tree. That read would fail and the loop would end. This code is embedded in a while(keep_going) loop because I do not want to read the headers of (what could be) several thousand trees just to write a for loop. My original code did that, but I found TChain::GetEntries() was quite slow. So, I am searching for a way to ask the TChain if I’ve processed all its entries.

Any ideas?

P.S. When this bit of code fully works, I wonder if it might serve as a nice “advanced” ROOT tutorial. I wish I had realized how to do this sort of filtering years ago.

[quote]So, I am searching for a way to ask the TChain if I’ve processed all its entries. [/quote]
Simply:

[code]if (mychain->LoadTree(i) < 0) {
  // We have exhausted all entries in the chain.
  keep_going = false;
  break;
}[/code]

Cheers,
Philippe
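
For anyone reading later, the shape of the resulting loop can be sketched without ROOT. In the snippet below, `filter_loop`, `FilterCounts`, `load` and `select` are hypothetical names introduced only for illustration: `load(i)` plays the role of the LoadTree(i) call and is assumed to return a negative value once `i` is past the last entry, while `select(i)` stands in for evaluating the selection. The point is that the end-of-chain test comes first, so the loop stops whether or not the last entry passes the cut.

```cpp
#include <cassert>
#include <functional>

// Result of one filtering pass.
struct FilterCounts {
    long long read = 0;    // entries examined
    long long passed = 0;  // entries that satisfied the selection
};

// Hypothetical helper, not a ROOT API. load(i) is assumed to return a
// negative value when entry i does not exist; select(i) returns true when
// entry i passes the selection.
inline FilterCounts filter_loop(const std::function<long long(long long)>& load,
                                const std::function<bool(long long)>& select) {
    assert(load && select);  // both callables must be set
    FilterCounts counts;
    for (long long i = 0; ; ++i) {
        if (load(i) < 0) break;  // end of chain: stop regardless of the selection
        ++counts.read;
        if (select(i)) {
            ++counts.passed;     // the real code would GetEntry() and Fill() here
        }
    }
    return counts;
}
```

Because no total entry count is requested up front, a loop of this form never needs the slow GetEntries() call mentioned earlier in the thread.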