Randomized access in a tree

Hi. I am having a problem getting fast randomized access to the entries of a tree.
This is roughly the sequence of steps I take:

  ptree->SetMaxVirtualSize(2e+9);
  std::cout << "Starting to load baskets...";
  ptree->LoadBaskets(); // to speed up readout
  std::cout << "done\n";

  //Before doing all events, first sort the tree along the "time" leaf
  std::cout << "Building index...";
  ptree->BuildIndex("0","time");
  Long64_t* index=((TTreeIndex*)(ptree->GetTreeIndex()))->GetIndex(); //get the index
  std::cout << "done\n";

  for(int i=0;i<nentries;i++){ //cycle over the entries in the tree
    ptree->GetEntry(index[i]);
    // do stuff here
  }

Despite the fact that TTree::LoadBaskets() supposedly loaded the whole tree into memory, the loop over the tree proceeds brutally slowly, e.g. 300 entries/second! What am I doing wrong here?

p.s. as an additional info, the root file where the tree resides was produced by hadd-ing multiple root files together.

Hi,

LoadBaskets reads the compressed data into memory but does not decompress it. When moving randomly from basket to basket, you will still have to spend the CPU time to decompress each time you switch baskets.
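
To illustrate why the access order matters when a basket has to be prepared on every switch, here is a toy model in plain C++. It is not ROOT code: the function name, the fixed number of entries per basket, and the "one prepared basket at a time" rule are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (not ROOT's implementation): entries live in fixed-size baskets
// and only one prepared basket is kept at a time. Returns how many times the
// given access order forces a different basket to be prepared.
std::size_t countBasketSwitches(const std::vector<long long>& order,
                                long long entriesPerBasket) {
    assert(entriesPerBasket > 0);
    std::size_t switches = 0;
    long long current = -1;  // no basket prepared yet
    for (long long entry : order) {
        const long long basket = entry / entriesPerBasket;
        if (basket != current) {
            ++switches;  // this is where the per-basket cost would be paid
            current = basket;
        }
    }
    return switches;
}
```

With 3 entries per basket, reading entries 0,1,2,3,4,5 in order switches baskets twice, while the interleaved order 0,3,1,4,2,5 switches on every single read.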

[quote]p.s. as an additional info, the root file where the tree resides was produced by hadd-ing multiple root files together.[/quote]One thing you can do is to disable the compression on the output file (-f0) when running hadd.

Cheers,
Philippe.

Philippe, I tried that. Now my speed is 500 entries/sec, which is still abysmal.
Here’s another thing…the file I am trying to read in is now 1.6GBytes big (since it’s uncompressed).
When I run the procedure above, however, I can see root’s memory usage increase only to 218MBytes.
I’d think it would need all of 1.6GBytes to keep the tree in memory? Perhaps I am not reading the tree into memory?

Hi,

[quote]When I run the procedure above, however, I can see root’s memory usage increase only to 218MBytes.[/quote]Yes, you are correct this is not enough …

Is ptree actually a TTree object or a TChain?

Philippe.

This is what I get:

psi00:16:35:~/code/root/read/scripts>root
root [0] TFile *_file0 = TFile::Open("/raid/home/pnpfuser/data_vx511/root_all/DU_data/2012_04_03_uncompressed.root")
root [1] ptree
(class TTree*)0x1665c60

so it seems to be a TTree pointer…

Hi,

Which version of ROOT are you using? What does LoadBaskets return?

Philippe.

Philippe, perhaps this will help –


root [6] ptree->LoadBaskets()
(Int_t)13238
root [7] ptree->Print()
******************************************************************************
*Tree    :ptree     : event data                                             *
*Entries : 11355713 : Total =      1640410672 bytes  File  Size = 1635286740 *
*        :          : Tree compression factor =   1.00                       *
******************************************************************************
*Br    0 :event     : samples/i:rate/f:time/l:time_abs/D:cn/i:pattern/i:     *
*         | pattern_c/i:counter/i:trigCounter/i:min_i/i:min/i:E/F:Ec/F:Ecp/f:*
*         | Ped/F:PSD/F:PSDsm/F:Chisquare_v0/F:Chisquare_v1/F:Chisquare_v2/F:*
*         | Chisquare_v3/F:Chisquare_v5/F:Chisquare_v6/F:Chisquare_v7/F:     *
*         | Chisquare_v8/F:id/i                                              *
*Entries : 11355713 : Total  Size= 1275265184 bytes  File Size  = 1271874823 *
*Baskets :      479 : Basket Size=    2740224 bytes  Compression=   1.00     *
*............................................................................*
*Br    1 :tlv_header : beamEnergy/D:beamCurrent/D:gate/i:inputCounts/i:      *
*         | outputCounts/i:prescaleCounts/i                                  *
*Entries : 11355713 : Total  Size=  365145038 bytes  File Size  =  363393346 *
*Baskets :      135 : Basket Size=    2740224 bytes  Compression=   1.00     *
*............................................................................*

p.s. I am running root Version 5.32/00

Hi,

Due to an unfortunate deficiency, you need to set the virtual size to twice as much as you need. So in your case: ptree->SetMaxVirtualSize(4e+9); // Greater than 2 * 1 640 410 672
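
The arithmetic behind that number can be sketched in plain C++. The helper names are hypothetical, and the factor of two is only the workaround described in this post for the ROOT version in question, not a general rule.

```cpp
#include <cassert>
#include <cstdint>

// Workaround size from this thread: twice the tree's "Total" bytes as
// reported by ptree->Print(). The factor 2 is taken from the post above.
std::int64_t requiredMaxVirtualSize(std::int64_t totalTreeBytes) {
    assert(totalTreeBytes >= 0);
    return 2 * totalTreeBytes;
}

// True if the value passed to SetMaxVirtualSize exceeds the workaround size.
bool virtualSizeIsEnough(std::int64_t maxVirtualSize,
                         std::int64_t totalTreeBytes) {
    return maxVirtualSize > requiredMaxVirtualSize(totalTreeBytes);
}
```

For the tree printed above (Total = 1640410672 bytes) the workaround size is 3280821344 bytes, so the original 2e+9 falls short and 4e+9 clears it.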

Cheers,
Philippe.

PS. The good news :) is that the buffers are actually decompressed as soon as they are read (so you do not need to run hadd -f0).

Philippe, no change: I increased it to 4e+9, yet the speed is exactly the same, and the total usage is still only 218MBytes.

p.s. So the problem has lost its urgency: I changed my code to simply read the desired variables into very long arrays, and then I run TMath::Sort on those arrays. This allows me to run through the data in 10 seconds (instead of the 10 hrs I projected :) ). But of course it would be great if we could resolve this problem, because random access to the tree is a highly useful capability…
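
For readers who want the same workaround without ROOT, a minimal sketch using only the standard library follows. The function names are hypothetical, and the columns are assumed to have been copied out of the tree in one sequential pass beforehand. Note that TMath::Sort sorts in decreasing order unless its last argument is set to kFALSE; the sketch below sorts in increasing order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the permutation that visits `time` in increasing order.
// stable_sort keeps the original entry order for equal time stamps.
std::vector<std::size_t> sortedOrder(const std::vector<unsigned long long>& time) {
    std::vector<std::size_t> order(time.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&time](std::size_t a, std::size_t b) {
                         return time[a] < time[b];
                     });
    return order;
}

// Reorders another in-memory column according to that permutation.
std::vector<float> gather(const std::vector<float>& column,
                          const std::vector<std::size_t>& order) {
    std::vector<float> out;
    out.reserve(order.size());
    for (std::size_t i : order) {
        assert(i < column.size());
        out.push_back(column[i]);
    }
    return out;
}
```

Because every lookup is then a plain array access, no basket is touched during the time-ordered loop.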

Hi,

The deficiency is solved in revision 43859 of the trunk.

Cheers,
Philippe.

So, for those who do not have the repository checked out, is this going to be in the next release?

Thank you Philippe!

Hi,

Yes, it will be included in the next release (v5.34).

Cheers,
Philippe.

PS. However it is a bit odd that increasing the max virtual size did not work for you … which version of ROOT are you currently using?

All this work was done under version 5.32/00

Hi,

What is the result in your case of doing:

  TFile *_file0 = TFile::Open("/raid/home/pnpfuser/data_vx511/root_all/DU_data/2012_04_03_uncompressed.root");
  ptree->SetMaxVirtualSize(4e+9);
  b = ptree->GetBranch("event");
  b->LoadBaskets();
  cout << b->GetListOfBaskets()->GetEntries() << '\n';
  gROOT->ProcessLine("ptree->Dump(); > ptree.dump");
  gROOT->ProcessLine(".! grep TotalBuffers ptree.dump");
  b = ptree->GetBranch("tlv_header");
  b->LoadBaskets();
  cout << b->GetListOfBaskets()->GetEntries() << '\n';
  gROOT->ProcessLine("ptree->Dump(); > ptree.dump");
  gROOT->ProcessLine(".! grep TotalBuffers ptree.dump");

Thanks,
Philippe.

Ahaaaaa!! Now I see my memory usage skyrocket to 1.6GBytes (the actual file size)! I bet my code will run faster now – I’ll check and let you know in 10min.

So what was the difference? Rather than do LoadBaskets() on ptree, I did it on the actual branch – is that it?

And, to answer your questions, here’s the output:


root [4] cout << b->GetListOfBaskets()->GetEntries() << '\n';
479
root [5] gROOT->ProcessLine("ptree->Dump(); > ptree.dump");
root [6] gROOT->ProcessLine(".! grep TotalBuffers ptree.dump");
fTotalBuffers                 2625134592          ! Total number of bytes in branch buffers

root [8] b = ptree->GetBranch("tlv_header");
root [9] b->LoadBaskets();
root [10] cout << b->GetListOfBaskets()->GetEntries() << '\n';
135
root [11] gROOT->ProcessLine("ptree->Dump(); > ptree.dump");
root [12] gROOT->ProcessLine(".! grep TotalBuffers ptree.dump");
fTotalBuffers                 3364995072          ! Total number of bytes in branch buffers

It also appears that the total number of bytes in branch buffers is 2x of what it should be…
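
The factor can be checked against the ptree->Print() output earlier in the thread. The sketch below assumes (this is an inference from the numbers, not something ROOT documents here) that each loaded basket is accounted at the nominal basket size of 2740224 bytes and then counted twice; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes that would be reported if every loaded basket were accounted at the
// nominal basket size and counted twice (assumption inferred from the output).
std::int64_t doubleCountedBufferBytes(std::int64_t baskets,
                                      std::int64_t basketSize) {
    assert(baskets >= 0 && basketSize >= 0);
    return 2 * baskets * basketSize;
}
```

For the "event" branch, 2 * 479 * 2740224 = 2625134592, which is the first fTotalBuffers value above; adding 2 * 135 * 2740224 = 739860480 for "tlv_header" gives 3364995072, the second value.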

Yep, it’s running super fast now! So here’s a summary for getting fast random tree access to work:

a) use an uncompressed .root file
b) use TTree::SetMaxVirtualSize(X) where X is twice your file size in bytes.
c) get the particular branch you are interested in from the tree, and do LoadBaskets() on that. Check root’s memory usage, and confirm that it’s close to your file size.
d) do your randomized branch access, it should be very fast at this point.

Philippe, thanks again!

[quote]It also appears that the total number of bytes in branch buffers is 2x of what it should be…[/quote]Yes, this is exactly the problem that I fixed :).

[quote]So here’s a summary for getting fast random tree access to work:[/quote]humm …

[quote]a) use an uncompressed .root file[/quote]A priori this is not needed. I thought it was the original cause of your slowdown, but it isn’t: buffers loaded via LoadBaskets are decompressed only once (during LoadBaskets).

[quote]c) get the particular branch you are interested in from the tree, and do LoadBaskets() on that. Check root’s memory usage, and confirm that it’s close to your file size. [/quote]What I don’t yet understand is why, in your case, it still fails when you do the TTree::GetEntry after setting the virtual size large enough!

[quote]So what was the difference? Rather than do LoadBaskets() on ptree, I did it on the actual branch – is that it?[/quote]No, it should not make a difference. However, when you call LoadBaskets on each branch individually, the memory usage (i.e. fTotalBuffers) grows ‘slower’.

The next thing to try is:

  TFile *_file0 = TFile::Open("/raid/home/pnpfuser/data_vx511/root_all/DU_data/2012_04_03_uncompressed.root");
  ptree->SetMaxVirtualSize(4e+9);
  ptree->LoadBaskets();
  b = ptree->GetBranch("event");
  cout << b->GetListOfBaskets()->GetEntries() << '\n';
  b = ptree->GetBranch("tlv_header");
  cout << b->GetListOfBaskets()->GetEntries() << '\n';
  gROOT->ProcessLine("ptree->Dump(); > ptree.dump");
  gROOT->ProcessLine(".! grep TotalBuffers ptree.dump");

Thanks,
Philippe.

Philippe, I have checked and can confirm that

a) if I do ptree->LoadBaskets() OR
b) if I use a compressed root file

the memory usage drops to 218MBytes and the tree access becomes slow again. Not sure why.

And here’s the output for your test:

root [1] ptree->SetMaxVirtualSize(4e+9);
root [2] ptree->LoadBaskets();
root [3] b = ptree->GetBranch("event");
root [4] cout << b->GetListOfBaskets()->GetEntries() << '\n';
2
root [5] b = ptree->GetBranch("tlv_header");
root [6] cout << b->GetListOfBaskets()->GetEntries() << '\n';
1
root [7] gROOT->ProcessLine("ptree->Dump(); > ptree.dump");
root [8] gROOT->ProcessLine(".! grep TotalBuffers ptree.dump");
fTotalBuffers                 36286046208         ! Total number of bytes in branch buffers

fTotalBuffers is clearly wrong. You can see root’s mem usage below:

psi00:11:59:~>ps -eo pid,pcpu,rss,user,args | grep pnpfuser | grep root.exe
16936 11.4 38376 pnpfuser /raid/sw/root/root_v5.32.00/bin/root.exe -splash -l /raid/home/pnpfuser/data_vx511/root_all/DU_data/2012_04_03_uncompressed.root

Here are two more tests:

a) one where I do LoadBaskets() on the branch (rather than the tree) from an UNcompressed file and
b) one where I do LoadBaskets() on the branch (rather than the tree) from a compressed file

test A:

Attaching file /raid/home/pnpfuser/data_vx511/root_all/DU_data/2012_04_03_uncompressed.root as _file0...
root [1] ptree->SetMaxVirtualSize(4e+9);
root [2] TBranch* b = ptree->GetBranch("event");
root [3] b->LoadBaskets()
(Int_t)479
root [4] cout << b->GetListOfBaskets()->GetEntries() << '\n';
479
root [5] b = ptree->GetBranch("tlv_header");
root [6] cout << b->GetListOfBaskets()->GetEntries() << '\n';
0
root [7] gROOT->ProcessLine("ptree->Dump(); > ptree.dump");
root [8] gROOT->ProcessLine(".! grep TotalBuffers ptree.dump");
fTotalBuffers                 2625134592          ! Total number of bytes in branch buffers

In fact ROOT’s memory usage is 1272856 kBytes.

test B:

Attaching file /raid/home/pnpfuser/data_vx511/root_all/DU_data/compressed/2012_04_03_compressed.root as _file0...
root [1] ptree->SetMaxVirtualSize(4e+9);
root [2] TBranch* b = ptree->GetBranch("event");
root [3] b->LoadBaskets()
(Int_t)29344
root [4] cout << b->GetListOfBaskets()->GetEntries() << '\n';
3
root [5] b = ptree->GetBranch("tlv_header");
root [6] cout << b->GetListOfBaskets()->GetEntries() << '\n';
1
root [7] gROOT->ProcessLine("ptree->Dump(); > ptree.dump");
root [8] gROOT->ProcessLine(".! grep TotalBuffers ptree.dump");
fTotalBuffers                 80417385728         ! Total number of bytes in branch buffers

Again, crazy numbers: ROOT’s memusage is only 37996 kBytes.

Hi,

Okay, this might be another manifestation of the problem. Is there a way for you to send me your compressed file so I can try to reproduce (and fix) the behavior you see (and/or verify that the fix I have fixes it :) )?

Thanks,
Philippe