Is there a way to add a TTree to an already existing TChain (without storing the TTree in a ROOT file)?

Hello,

My goal is to skim a TChain and work with the result afterwards, without having to store the skimmed result in a ROOT file.

Apparently, the only way to skim the TChain is to clone it into a TTree.

The problem is that I really need a TChain for my program, so I need to transform the TTree into a TChain.

And apparently, adding the TTree to the TChain does not work, so the information is lost.

Here is a minimal program illustrating that.
Would anyone have an idea?

Thank you

int ProblemTChain()
{
  TChain *chain=new TChain("tree_sel_HH_Resonant_mX260");
  
  chain->Add("/sps/atlas/e/escalier/ATLAS_HGam/Outputs_trees_selection_HH_baseline/tree_h025_mc16d_PowhegH7_HHbbyy_cHHH01d0_nominal.root");
  
  cout << "phase 1, chain->GetEntries()=" << chain->GetEntries() << endl;
  //-----------------------------------------------------------------
  TTree *tree_restriction=chain->GetTree()->CloneTree(0);
  
  unsigned long long eventNumber;
  TBranch *b_eventNumber;
  
  chain->SetBranchAddress("eventNumber", &eventNumber, &b_eventNumber);
  
  chain->SetBranchStatus("*", 1);
  //  https://root-forum.cern.ch/t/skim-events-from-a-tchain-into-a-new-file/9353/2
  //next line
  chain->LoadTree(0);

  int nb_entries=chain->GetEntries();
  cout << "nb_entries=" << nb_entries << endl;
  
  for (int index_entry=0;index_entry<nb_entries;index_entry++) {

    if (index_entry%2==1) //example of a restriction
      continue;

    //    cout << "index_entry=" << index_entry << " / " << nb_entries << endl;
    Long64_t centry=chain->LoadTree(index_entry); //mandatory to move from index_entry to centry with LoadTree, due to presence of several chains, else problems
    if (centry<0)
      continue;
    
    chain->GetEntry(index_entry); //index_entry=absolute : mandatory : != branch
    tree_restriction->Fill();
  } //end loop on entries
  
  cout << "phase 2, tree_restriction->GetEntries()=" << tree_restriction->GetEntries() << endl;
  //-----------------------------------------------------------------
  chain->AddClone(tree_restriction);

  //remark : the trick to cast the TTree to a TChain, and then to do : chain->Add(TheCastedTreeIntoATChain) is crashing, because the TChain and TTree are not same format
  
  cout << "phase 3, chain->GetEntries() after having added the tree by chain->AddClone(tree_restriction)=" << chain->GetEntries() << endl;
  return 0;
}

--> this gives :
phase 1, chain->GetEntries()=358
nb_entries=358
phase 2, tree_restriction->GetEntries()=179
phase 3, chain->GetEntries() after having added the tree by chain->AddClone(tree_restriction)=358

-->the last number is 358, and not 358+179

ROOT Version: Not Provided
Platform: Not Provided
Compiler: Not Provided


Hi @escalier,
please check this thread about posting code snippets – I edited your post to make things more readable.

If I understand correctly, you successfully create a TTree by selecting certain events from the original TChain. Then you want to construct a TChain that contains both the original TTree and the skimmed TTree, for further processing, but you don’t want to write the skimmed TTree to a file.

@pcanal can correct me if I’m wrong, but I don’t think TChain is able to deal with in-memory TTrees. However, you might get away with writing the skimmed TTree to a TMemFile, i.e. a memory-resident TFile. Would that help?

Cheers,
Enrico

Hi @eguiraud
*Thank you for the information on the code style.
*Your summary corresponds exactly to what I want. Let’s see if @pcanal confirms that TChain is not able to manage in-memory TTrees.

If this is not feasible, then I may do the treatment in a different way, by doing the skimming at the RooDataSet level with a “cut”, using the full unskimmed TChain.

*I have tried the TMemFile, but it seems to fail :

  cout << "create mem file" << endl;
  TMemFile memfile("test_memfile.root");
  cout << "created mem file" << endl;
  tree_restriction->Write();
  memfile.Close();

  cout << "try to add from the mem file" << endl;
  chain->Add("test_memfile.root");
  cout << "phase 2, after adding test memfile: chain->GetEntries()=" << chain->GetEntries() << endl;

This gives :

 create mem file
Error in <TMemFile::Init>: test_memfile.root not a ROOT file
created mem file
Error in <TROOT::WriteTObject>: The current directory (Rint) is not associated with a file. The object (tree_sel_HH_Resonant_mX260) has not been written.
try to add from the mem file
Error in <TFile::TFile>: file test_memfile.root does not exist

Just like with a normal TFile, you need TMemFile("name", "recreate") to open it in write mode. This will get rid of the error in TMemFile::Init.

Thank you @eguiraud

Following your suggestion helps, but I still get an error message :

try to add from the mem file
Error in <TFile::TFile>: file test_memfile.root does not exist
phase 2, after adding test memfile: chain->GetEntries()=358

Would you have an idea ?

  cout << "create mem file" << endl;
  TMemFile memfile("test_memfile.root","recreate");
  cout << "created mem file" << endl;
  tree_restriction->Write();

  cout << "try to add from the mem file" << endl;
  chain->Add("test_memfile.root","read");
  cout << "phase 2, after adding test memfile: chain->GetEntries()=" << chain->GetEntries() << endl;

Uhm indeed TChain doesn’t seem to understand that there is a TMemFile with that name.
I am out of ideas I’m afraid.

All right. Thanks for the help and advice.

Why does it really need a TChain? (Apart from the list of files, there is no interface in TChain that is not in TTree, so most often code can be written in terms of the TTree interface, which works for both TChain and TTree.)

Hi. The reason is that there are typically 100 ROOT files to consider, so the natural way is to use a TChain. Then, from this chain, a skimming may be needed for some particular option. But to skim a TChain, one needs to move to a TTree, as in the example that I gave. I then need to add the skimmed tree back to a TChain, but apparently this is not feasible.

Do I understand correctly that you need to have a TChain with a mix of (original) files and (derived/skimmed) TTree?

No, I don’t need to have a mix of the original file and a derived tree : it was just for my minimal example in order to illustrate the problem.

I need to create a subset of the TChain, but there are several TChains : one TChain per process. Let’s imagine that we have two processes : process_1, made of 100 ROOT files, and process_2, made of 100 ROOT files.

My program is written in a modular way.
One function returns the TChain :
TChain *ReturnTChain(string TheStringOfTheProcess)

So I proceed by allocating a global TChain that is empty :
TChain *total_chain=new TChain("…");

total_chain->Add(ReturnTChain("sample_of_type1"));
total_chain->Add(ReturnTChain("sample_of_type2"));

So I try to do the skimming of the TChain inside the function ReturnTChain.

In this function ReturnTChain, I do
TChain *mychain=new TChain("…");
if (TheStringOfTheProcess=="something")
mychain->Add("plenty_of_files");
else if (TheStringOfTheProcess=="something else")
mychain->Add("plenty_of_other_files");

Then I clone mychain into a TTree in order to do the skimming :
I fill the tree with the subset of events that pass a selection criterion.
And then… I can no longer transform this TTree into a TChain, so I can no longer return a TChain to the caller of the function…

Given

No, I don’t need to have a mix of the original file and a derived tree :
and

total_chain->Add(ReturnTChain("sample_of_type1"));
total_chain->Add(ReturnTChain("sample_of_type2"));

Maybe I should reformulate my question: is the following situation possible/likely/intended?

total_chain->Add(ReturnTChain("sample_of_type1")); // Return a TChain pointing to physical files
total_chain->Add(ReturnTChain("sample_of_type2")); // Would like to return a TChain pointing to a skimmed TTree in memory

where the intent is for total_chain to contain the results of both ReturnTChain calls at the same time?

Hello.
The objective is to obtain a restriction (a skimming) of both sample_of_type1 and sample_of_type2.

So ReturnTChain()
should return a restriction to a subset of the events of the ROOT files.
That is : in ReturnTChain, the first step is to get the full list of ROOT files.
And from this TChain, I try to restrict to a subset of the events in the ROOT files. To do that, it seems that I have no other choice than to clone the TChain into a TTree, then to fill the TTree with the events that I’m interested in. But then I cannot return this TTree, because it is not a TChain.

I think that I understand your suggestion : keep the TChain everywhere in the function, and do the skimming at the end :
total_chain->Add(ReturnTChain("sample_of_type1"));
total_chain->Add(ReturnTChain("sample_of_type2"));
Then transform the TChain into a tree, skim it, and work with the TTree for the rest of the program. But then, I would need to transform the full rest of the program to be able to deal with TTree instead of TChain. Also, during the intermediate steps I carry the full chain while I only need a subset of the events, so more memory is used than I need.

One more question … Are you guaranteed that the skims fits in memory?

To do that, it seems that I have no other choice than to clone the TChain into a TTree

Another alternative is to create a TEntryList …

But then, I would need to transform the full rest of the program to be able to deal with TTree instead of TChain,

For any part of the code that does not create a TChain (new TChain) or deal explicitly with the files (Add), you should be able to just replace the TChain symbol with TTree and it should work as before.

There is indeed no way (with the current code) for a TChain to have a TMemFile as one of its components. The obvious alternative is to store the TTree in an (uncompressed) file, where the file could be on a physical disk or on a RAM disk.

I’m not guaranteed that the skim fits in memory, but I would think that if the TChain fits in memory, then by construction the skim (which is a subset) should fit in memory too.

I’m currently investigating your extremely useful comment about TEntryList (which I was not aware of).

I’m not sure about TEntryList : I made a test

*Creating the entry list by hand with TEntryList::Enter(index_entry) does not seem to work, because afterwards, when it is applied with chain->SetEntryList(...), it complains :
Error in TChain::SetEntryList: No list found for the trees in this chain

*So I tried the Draw trick to create a TEntryList (even if this approach is not suited to my case : I don’t wish to call Draw just to create a list, and Draw is not as flexible as a selection written in C++)

With Draw, I could create a list of events and apply it to the TChain, but the TChain still has the same number of entries as at the beginning, and so does the RooDataSet (which is the object that I will use in the final step, in order to do a fit with RooFit).

So I was not able to restrict the TChain->RooDataSet to a subset, since the RooDataSet still has the same number of entries as the original TTree.

example :

int ProblemTChain()
{
  TChain *chain=new TChain("tree_sel_HH_Resonant_mX260");
  
  chain->Add("/sps/atlas/e/escalier/ATLAS_HGam/Outputs_trees_selection_HH_baseline/tree_h025_mc16a_PowhegH7_HHbbyy_cHHH01d0_nominal.root");
  chain->Add("/sps/atlas/e/escalier/ATLAS_HGam/Outputs_trees_selection_HH_baseline/tree_h025_mc16d_PowhegH7_HHbbyy_cHHH01d0_nominal.root");
  
  cout << "phase 1, chain->GetEntries()=" << chain->GetEntries() << endl;
  //-----------------------------------------------------------------
  chain->SetBranchStatus("*", 1);
  chain->LoadTree(0);

  int nb_entries=chain->GetEntries();
  cout << "nb_entries=" << nb_entries << endl;

  TEntryList *restricted_list=new TEntryList();
  
  for (int index_entry=0;index_entry<nb_entries;index_entry++) {

    if (index_entry%2==1) //example of a restriction
      continue;

    //    cout << "index_entry=" << index_entry << " / " << nb_entries << endl;
    Long64_t centry=chain->LoadTree(index_entry); //mandatory to move from index_entry to centry with LoadTree, due to presence of several chains, else problems
    if (centry<0)
      continue;
    
    chain->GetEntry(index_entry); //index_entry=absolute : mandatory : != branch
    
    restricted_list->Enter(index_entry);
  } //end loop on entries
  
  //  chain->SetEntryList(restricted_list); //This complains : Error in <TChain::SetEntryList>: No list found for the trees in this chain
  //but this is really the approach that I would need. The Draw approach is not very useful.
  //next line reduces the number of events to 22 instead of 666
  chain->Draw("m_yy>>elist","m_yy>128","entrylist"); //this is not very useful to make the restriction with a Draw...
  
  TEntryList *elist = (TEntryList*)gDirectory->Get("elist");

  chain->SetEntryList(elist); //this works, but afterwards the TChain still has all entries, and same for the RooDataSet, so for a fit with RooFit, not sure that it would work

  elist->Print();
  elist->Print("v");

  cout << "chain->GetEntries()=" << chain->GetEntries() << endl;
  
  RooRealVar roorealvar_myy("m_yy","m_yy",105,160);
  RooArgSet argset(roorealvar_myy);
  RooDataSet *dataset=new RooDataSet("dataset","dataset",chain,argset);
  cout << "dataset->numEntries()=" << dataset->numEntries() << endl;

  return 0;
}

log :
phase 1, chain->GetEntries()=666
nb_entries=666
Info in TCanvas::MakeDefCanvas: created default TCanvas with name c1
TH1.Print Name = elist, Entries= 22, Total sum= 22
TH1.Print Name = elist, Entries= 22, Total sum= 22
chain->GetEntries()=666
dataset->numEntries()=666

@StephanH Can RooDataSet use/follow a TEntryList?

Note that the TEntryList does not change the number of entries in the chain :).

To see the EntryList in action see the behavior of TChain::Draw. For example the return value of:

ch.Draw("1","");

After

TEntryList *restricted_list=new TEntryList();

Instead of

    restricted_list->Enter(index_entry);

you meant:

    restricted_list->Enter(index_entry, chain);

Cheers,
Philippe.

Thank you Philippe.
ok, I see.
But if this restricted list only changes the behaviour of Draw and Project, it is unfortunately of no use for the goal of my program, which consists of using the TChain to build a RooDataSet, in order to make fits with RooFit.

If the TChain is not restricted by the entry list, we come back to the beginning.
The only solution that I find is to apply a cut at the RooDataSet level, which means that all the intermediate steps (the TChain) carry many events that I don’t need.

Thank you
