I have a TTree object with 15mln+ entries. I have a script that loops through consecutive entries in the tree and does stuff. To do this, I do something like this:
Trouble is, after about the 1mln-th entry things REALLY slow down. I am guessing that every call to GetEntry starts at position zero and seeks to position i in the file, which, sure enough, becomes slower with increasing i.
So my question is – is there a faster equivalent to this? I would like something along these lines (in pseudocode):
int i = 0;
tree->GetEntry(0);
while (i < tree->GetEntries()) {
   tree->MoveToNextEntryFAST(); // supposedly a very quick operation
   DoStuff();
   ++i;
}
The time to access a TTree entry is independent of the entry number.
You are probably hitting a memory leak problem. We need more info from your side.
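For what it's worth, TTree::GetEntry(i) locates an entry via the branches' basket index rather than scanning forward from entry zero, so a plain indexed loop is already the idiomatic way to read sequentially. One thing that can help with large files is enabling the tree cache. Here is a hedged sketch (the tree name "mytree" is a placeholder; the file and branch names are taken from this thread, so adjust everything to your actual tree):

```cpp
// tree_loop.C -- sequential read with the TTree cache enabled (sketch).
// Requires ROOT; run with: root -l -b -q tree_loop.C
void tree_loop()
{
   TFile f("run646647.root");           // your file
   TTree *t = (TTree*)f.Get("mytree");  // "mytree" is a placeholder name
   if (!t) return;

   Int_t arr[1000];
   t->SetBranchAddress("arr", arr);     // always set addresses before reading

   t->SetCacheSize(30000000);           // ~30 MB read-ahead cache
   t->AddBranchToCache("*");            // cache all branches

   Long64_t n = t->GetEntries();
   for (Long64_t i = 0; i < n; ++i) {
      t->GetEntry(i);                   // entry lookup uses the basket index,
      // DoStuff();                     // not a scan from entry zero
   }
}
```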
In my case I have a simple tree with integers (arrays, in fact); the file size is 1.7 GB, with 15000000 entries. It's two times faster to read the first 10^6 entries than the last 10^6. I'm not sure how this is implemented in TTree; probably it is due to the large file size.
[quote=“aregjan”]My .root file is ~2.2GB…however in my case the difference between the first and last 1mln events
is dramatic – see above.[/quote]
I’ve added more branches, and now the file size is >= 3GB. It’s a bit slower now, but the difference between the first 10^6 entries and the last even decreased; it’s ~30% now.
Can you show minimal code reproducing your timings? And give your machine specs.
Hmmm, a very nice and useless “dump”; I can produce something like this by hand:
Does it prove anything?
If your code is top secret, try these two macros and check whether you still see the same difference (fix them if needed; I just wrote them here without executing them):
//tree_fill.C  (run with: root -l -b -q tree_fill.C)
void tree_fill()
{
   TFile f("tree.root", "recreate");
   TTree *t = new TTree("aaa", "aaa");
   int arr[1000] = {};                    // all zeros; enough for timing I/O
   t->Branch("arr", arr, "arr[1000]/I");
   for (int i = 0; i < 15000000; ++i)
      t->Fill();
   t->Write();
}
//tree_read.C  (run with: root -l -b -q tree_read.C)
void tree_read()
{
   TFile f("tree.root");
   TTree *t = (TTree*)f.Get("aaa");
   if (!t)
   {
      std::cout << "FUUUUUUUUUUUUUU!\n";
      return;
   }
   int arr[1000] = {};
   t->SetBranchAddress("arr", arr);            // set the address before reading
   TStopwatch timer;
   timer.Start();
   // for (int i = 13999999; i < 15000000; ++i)  // last 10^6 entries
   for (int i = 0; i < 1000000; ++i)             // first 10^6 entries
      t->GetEntry(i);
   timer.Stop();
   std::cout << "Time is: " << int(timer.RealTime()) << std::endl;
}
[quote=“aregjan”]The output shows that the first 1mln entries were read in 57sec, and the last
1mln were read in 10min. How’s this not obvious?[/quote]
And my “output” shows that both take 10 s; is that obvious?
Without any code reproducing the problem, it’s useless to discuss what and how you measure.
Did you try my macros?
What do you mean, no code?? The ROOT commands that I used are up there – replace my run646647.root with your lovely .root file, cut and paste the commands, and see what you get.
When I told you about my file sizes and time difference today, that was with my lovely .root file and the macros I gave you. So the problem you are talking about exists only with your code and is not reproducible with simple macros.
There is no “my code”. There are just two simple CINT commands, which show that it takes CINT 10x longer to process the last 1mln entries than the first 1mln entries.
If you want to contribute something to this – rather than getting hung up on ad hominem exchanges – then I would suggest that you run those very same commands (by cutting and pasting) on your own .root file and post the output here.
[quote=“aregjan”]There is no “my code”. There are simple two CINT commands. Which show that it takes CINT 10x longer to process last 1mln entries than the first 1mln entries.
If you want to contribute something to this – rather than get hang up in ad hominem exchanges – then I would suggest that you run those very same commands (by cutting and pasting) on your own .root file, and post the output here.[/quote]
Listen, I do not have your file, and I do not have your tree structure. And in your “code”, which I am supposed to copy-paste, you did not even set the branch address. Are you familiar with TTree’s internals, and do you know exactly what happens in that case?
That’s my output with your “two CINT commands” (I repeated them several times):
root [0] TFile f("tree.root")
root [1] TTree * t = (TTree *)f.Get("a")
root [2] system("date");for(int i=13999999;i<15000000;i++) { t->GetEntry(i);} system("date")
Wed Jan 5 22:03:04 CET 2011
Wed Jan 5 22:03:18 CET 2011
(const int)0
root [3] system("date");for(int i=13999999;i<15000000;i++) { t->GetEntry(i);} system("date")
Wed Jan 5 22:03:24 CET 2011
Wed Jan 5 22:03:39 CET 2011
(const int)0
root [4] system("date");for(int i=13999999;i<15000000;i++) { t->GetEntry(i);} system("date")
Wed Jan 5 22:03:41 CET 2011
Wed Jan 5 22:03:56 CET 2011
(const int)0
root [5] system("date");for(int i=0;i<1000000;i++) { t->GetEntry(i);} system("date")
Wed Jan 5 22:04:06 CET 2011
Wed Jan 5 22:04:14 CET 2011
(const int)0
root [6] system("date");for(int i=0;i<1000000;i++) { t->GetEntry(i);} system("date")
Wed Jan 5 22:04:16 CET 2011
Wed Jan 5 22:04:24 CET 2011
(const int)0
root [7] system("date");for(int i=0;i<1000000;i++) { t->GetEntry(i);} system("date")
Wed Jan 5 22:04:38 CET 2011
Wed Jan 5 22:04:47 CET 2011
(const int)0
root [8]
Well, this confirms my observation: the last 1mln entries take longer to cycle through than the first 1mln. Sure, in your case the difference is 1.5x versus my 10x, but that can depend on a combination of the tree’s complexity, disk speed, and the particulars of TTree::GetEntry()'s implementation.