ROOT Version: 6.12/04
Platform: Ubuntu 18.04 (in a virtual machine hosted on Windows 10)
Compiler: Not Provided
Hi everybody!
I'm working in a virtual machine (VirtualBox) running Ubuntu with ROOT 6.12/04. To summarize what I want to do, here are the steps:
- Open a ROOT file and extract the big TChain inside (same branches for all TTrees);
- Disable all branches, then enable the interesting ones: “Energy”, “Timestamp”, “Channel”;
- Select the interesting entries with a TCut on Energy;
- Sort the entries by Timestamp (with BuildIndex()?);
- Process the data.
My script, with some comments:
#include <ctime>
#include <iostream>
#include "TFile.h"
#include "TTree.h"
#include "TTreeIndex.h"

using namespace std; // cout and endl are used unqualified below

void Energy_Thresholds()
{
    cout<<endl<<"-----------------------------------------------------------------------------" << endl;
    cout<<endl<<"----------------------- Energy Thresholds ------------------------" << endl;
    cout<<"-----------------------------------------------------------------------------"<<endl;
    clock_t time_1 = clock();

    // Open the ROOT file "run_6.root", which contains the TChain
    TFile* original_File = TFile::Open("run_6.root");
    TTree* original_Tree = (TTree*)original_File->Get("Filtered");
    cout << "number of entries : " << original_Tree->GetEntries() << endl; // ~170M entries

    original_Tree->SetBranchStatus("*" , 0);         // disable all branches
    original_Tree->SetBranchStatus("Channel" , 1);   // I have to keep this information for later
    original_Tree->SetBranchStatus("Energy" , 1);    // enabled for the energy thresholds
    original_Tree->SetBranchStatus("Timestamp" , 1); // enabled for the sort, and for drawing a histogram

    // New file "run_6_1.root", to hold a copy of the tree restricted by a cut on Energy
    TFile* output_File = TFile::Open("run_6_1.root" , "RECREATE");
    TTree* output_Tree = original_Tree->CopyTree("Energy>80 && Energy<140"); // between 80 and 140 -> ~55M entries
    // CopyTree just creates an entry in the TFile listing, as follows:
    // OBJ: TTree Filtered TTree with filtered sorted events : 0 at: 0x5654b33e3360 -> in RAM?
    output_File->Write(); // The new tree is written and stays in output_File once it is closed -> on disk

    clock_t time_2 = clock();
    cout << "process time for thresholds: " << ((float)(time_2-time_1) / CLOCKS_PER_SEC) << "seconds" << endl;
    // This seems a little faster than the TEventList method, or a Fill() guarded by an "if" inside a "for" loop

    delete original_File; // or original_File->Close() and output_File->Close()?
    delete output_File;
}
TTreeIndex* Sort_TS() // It could also be a void function: the TTreeIndex is written to the file and can be read back later
{
    cout<<endl<<"-----------------------------------------------------------------------------" << endl;
    cout<<endl<<"------------------------ Sortage by increasing Timestamps ------------------------" << endl;
    cout<<"-----------------------------------------------------------------------------"<<endl;
    clock_t time_1 = clock();

    TFile* original_File = TFile::Open("run_6_1.root" , "UPDATE");
    TTree* original_Tree = (TTree*)original_File->Get("Filtered");

    // HERE is my main problem: I try to build an index on Timestamp
    original_Tree->BuildIndex("Timestamp" , "Channel"); // Sort on Timestamp (major), keeping Channel as the minor value in the TTreeIndex
    original_Tree->Write(); // Save the TTree in the current file, with its new index

    TTreeIndex* ind = (TTreeIndex*)original_Tree->GetTreeIndex(); // Check that the new index was built and that we can get it
    cout << ind->GetN() << " entries in the TTreeIndex" << endl;
    ind->Print("10"); // a little test to check that we have what we want

    clock_t time_2 = clock();
    cout << "process time : " << ((float)(time_2 - time_1 ) /CLOCKS_PER_SEC) << " seconds" <<endl;

    // Here I'm lost: I can't "return ind" after "delete original_File", because the object the TTreeIndex* points to no longer exists by then.
    // That's pretty bad, right? I'm looking for another way to do this (a void function, for example).
    return ind; // I want to return it so that I can pass it to the processing function below. Does it stay in RAM the whole time?
    delete original_File; // never reached, since it comes after the return
}
void Process(TTreeIndex* in_ts)
{
    // my processing, using the TTreeIndex
}

int main()
{
    Energy_Thresholds();
    TTreeIndex* ind = Sort_TS();
    Process(ind);
    return 0;
}
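The lifetime problem described in the comments of Sort_TS() can be illustrated without ROOT. The following is a minimal sketch in plain C++, assuming hypothetical stand-in types (`Index`, `Owner`, `OpenedIndex`) that are not ROOT classes: the object that owns the index is returned together with the pointer, so the pointer stays valid for as long as the caller keeps the bundle. Whether the same pattern fits TFile and TTreeIndex is an assumption to check against the ROOT documentation.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an index object (not a ROOT class).
struct Index {
    std::vector<long long> values;
    std::size_t size() const { return values.size(); }
};

// Hypothetical stand-in for a file that owns the objects read from it.
struct Owner {
    std::unique_ptr<Index> index; // destroyed together with the Owner
};

// Bundle that keeps the owner alive for as long as the index is needed.
struct OpenedIndex {
    std::unique_ptr<Owner> owner;
    Index* index; // non-owning view, valid only while `owner` is alive
};

// Build the owner and its index, then hand both back to the caller.
OpenedIndex open_index(std::size_t n_entries)
{
    auto owner = std::make_unique<Owner>();
    owner->index = std::make_unique<Index>();
    owner->index->values.assign(n_entries, 0);
    Index* view = owner->index.get();
    // Moving the unique_ptr does not move the Owner itself, so `view` stays valid.
    return OpenedIndex{std::move(owner), view};
}

// Stand-in for the processing step: it only reads through the pointer.
std::size_t process(const Index* index) { return index->size(); }
```

A caller would write `OpenedIndex opened = open_index(10);` and then `process(opened.index);`. Everything is released when `opened` goes out of scope, so no statement has to be placed after the `return`.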
Sometimes the TChain has 170 million entries, and the Energy TCut leaves more than 55 million of them. In these cases (unfortunately I haven't found a way to get its size …), the call to BuildIndex throws an exception: std::bad_alloc.
More precisely: "Error in TRint::HandleTermInput(): std::bad_alloc caught: std::bad_alloc"
Am I managing my pointers that badly? Could the RAM allocated to the virtual machine (~3000 MB) be too small? Is there a way to process the data in chunks?
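To put rough numbers on the RAM question, here is a back-of-the-envelope sketch. It assumes that the index keeps three 8-byte values per entry (major value, minor value, entry number) and that building it temporarily needs about two more arrays of the same length. Both figures are my assumption, not something taken from the ROOT documentation, and the function names are hypothetical.

```cpp
#include <cstdint>

// Rough memory estimate for an index over n_entries entries.
// ASSUMPTION: each of the n_arrays arrays holds one 8-byte integer per entry.
std::uint64_t estimate_index_bytes(std::uint64_t n_entries, std::uint64_t n_arrays)
{
    const std::uint64_t bytes_per_value = 8; // size of a 64-bit integer
    return n_entries * n_arrays * bytes_per_value;
}

// Convert a byte count to megabytes (1 MB = 10^6 bytes).
double to_megabytes(std::uint64_t bytes)
{
    return static_cast<double>(bytes) / 1.0e6;
}
```

Under these assumptions, `estimate_index_bytes(55000000, 5)` gives 2,200,000,000 bytes (about 2200 MB) at the peak, against roughly 82 MB for the 2,053,011 entries of the run that works. That would be tight in a 3000 MB virtual machine that also has to hold the operating system and the tree buffers.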
Two examples from the prompt, one that works (TCut E>135 and E<140) and one that does not (TCut E>80 and E<140):
Energy Thresholds
number of entries : 170253997
process time for thresholds: 68.9602seconds
Sortage by increasing Timestamps
2053011 entries in the TTreeIndex
serial : Timestamp : Channel
0 : 578174060 : 6
1 : 659930369 : 3
2 : 1023391193 : 0
3 : 1602030416 : 7
4 : 1677231365 : 7
5 : 1768918345 : 2
6 : 2461758888 : 7
7 : 2552985037 : 5
8 : 2971680275 : 7
9 : 3318579685 : 0
process time : 3.46273 seconds
Processing
Info in TCanvas::MakeDefCanvas: created default TCanvas with name c1
(int) 0
Energy Thresholds
number of entries : 170253997
process time for thresholds: 146.94seconds
Sortage by increasing Timestamps
Error in TRint::HandleTermInput(): std::bad_alloc caught: std::bad_alloc
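On the chunking question, one generic approach (not a ROOT feature, as far as I know) is to sort the timestamps chunk by chunk and then merge the sorted chunks. The sketch below is plain C++ with hypothetical function names. It keeps everything in memory for brevity, whereas a real job would presumably write each sorted chunk to its own file and read the chunks back a buffer at a time during the merge.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Timestamp = long long;

// Split `input` into chunks of at most `chunk_size` values and sort each chunk on its own.
std::vector<std::vector<Timestamp>> sort_in_chunks(const std::vector<Timestamp>& input,
                                                   std::size_t chunk_size)
{
    std::vector<std::vector<Timestamp>> chunks;
    if (chunk_size == 0) return chunks; // a zero-sized chunk makes no sense
    for (std::size_t begin = 0; begin < input.size(); begin += chunk_size) {
        const std::size_t end = std::min(begin + chunk_size, input.size());
        std::vector<Timestamp> chunk(input.begin() + static_cast<std::ptrdiff_t>(begin),
                                     input.begin() + static_cast<std::ptrdiff_t>(end));
        std::sort(chunk.begin(), chunk.end());
        chunks.push_back(std::move(chunk));
    }
    return chunks;
}

// Merge already-sorted chunks using a min-heap that holds one candidate per chunk.
std::vector<Timestamp> merge_chunks(const std::vector<std::vector<Timestamp>>& chunks)
{
    using Item = std::pair<Timestamp, std::size_t>; // (value, chunk number)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    std::vector<std::size_t> next(chunks.size(), 0); // read position in each chunk
    for (std::size_t c = 0; c < chunks.size(); ++c) {
        if (!chunks[c].empty()) heap.push({chunks[c][0], c});
    }

    std::vector<Timestamp> merged;
    while (!heap.empty()) {
        const Item top = heap.top();
        heap.pop();
        merged.push_back(top.first);
        const std::size_t c = top.second;
        // Refill the heap from the chunk the smallest value came from.
        if (++next[c] < chunks[c].size()) heap.push({chunks[c][next[c]], c});
    }
    return merged;
}
```

Calling `merge_chunks(sort_in_chunks(timestamps, chunk_size))` returns the timestamps in increasing order. Only one chunk has to be sorted at a time, and the merge needs just one value per chunk in the heap, which is what keeps the peak memory bounded in an on-disk version.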
I've already found threads about similar problems on this forum:
“Create a new index for a TTree”
“BuildIndex aborts for a large TTree”
but they didn't help me much …
Does anybody have an idea that could help me go further?
Thanks in advance for your help!
Erik