Randomized access in a tree

Philippe, no change: I increased it to 4e+9, yet the speed is exactly the same, and the total usage is still only 218MBytes.

P.S. The problem has lost its urgency: I changed my code to simply read the desired variables into very long arrays, and then I run TMath::Sort on those arrays. This lets me run through the data in 10 seconds (instead of the 10 hours I projected :) ). But of course it would be great if we could resolve this problem, because random access to a tree is a highly useful capability…
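
The workaround above (copy the variables into arrays, sort once, then walk the data in sorted order) can be sketched in plain C++. This is a minimal sketch and not the poster's actual code: `sorted_indices` is a hypothetical helper standing in for TMath::Sort, which likewise fills an index array instead of moving the data.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the permutation that sorts `values`,
// leaving `values` itself untouched (the same idea as an index array).
std::vector<std::size_t> sorted_indices(const std::vector<double>& values,
                                        bool descending = false) {
    std::vector<std::size_t> index(values.size());
    std::iota(index.begin(), index.end(), 0);  // 0, 1, 2, ...
    // stable_sort keeps equal values in their original relative order.
    std::stable_sort(index.begin(), index.end(),
                     [&](std::size_t a, std::size_t b) {
                         return descending ? values[a] > values[b]
                                           : values[a] < values[b];
                     });
    return index;
}
```

Once the index array exists, every other array read from the tree can be visited in the same order through `index[i]`, so the tree itself is only read sequentially, once.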

Hi,

The deficiency is solved in revision 43859 of the trunk.

Cheers,
Philippe.

So, for those who do not have the repository checked out, is this going to be in the next release?

Thank you Philippe!

Hi,

Yes, it will be included in the next release (v5.34).

Cheers,
Philippe.

PS. However it is a bit odd that increasing the max virtual size did not work for you … which version of ROOT are you currently using?

All this work was done under version 5.32/00

Hi,

What is the result in your case of doing:

TFile *_file0 = TFile::Open("/raid/home/pnpfuser/data_vx511/root_all/DU_data/2012_04_03_uncompressed.root");
ptree->SetMaxVirtualSize(4e+9);
b = ptree->GetBranch("event");
b->LoadBaskets();
cout << b->GetListOfBaskets()->GetEntries() << '\n';
gROOT->ProcessLine("ptree->Dump(); > ptree.dump");
gROOT->ProcessLine(".! grep TotalBuffers ptree.dump");
b = ptree->GetBranch("tlv_header");
b->LoadBaskets();
cout << b->GetListOfBaskets()->GetEntries() << '\n';
gROOT->ProcessLine("ptree->Dump(); > ptree.dump");
gROOT->ProcessLine(".! grep TotalBuffers ptree.dump");

Thanks,
Philippe.

Ahaaaaa!! Now I see my memory usage skyrocket to 1.6GBytes (the actual file size)! I bet my code will run faster now – I’ll check and let you know in 10min.

So what was the difference? Rather than do LoadBaskets() on ptree, I did it on the actual branch – is that it?

And, to answer your questions, here’s the output:


root [4] cout << b->GetListOfBaskets()->GetEntries() << '\n';
479
root [5] gROOT->ProcessLine("ptree->Dump(); > ptree.dump");
root [6] gROOT->ProcessLine(".! grep TotalBuffers ptree.dump");
fTotalBuffers                 2625134592          ! Total number of bytes in branch buffers

root [8] b = ptree->GetBranch("tlv_header");
root [9] b->LoadBaskets();
root [10] cout << b->GetListOfBaskets()->GetEntries() << '\n';
135
root [11] gROOT->ProcessLine("ptree->Dump(); > ptree.dump");
root [12] gROOT->ProcessLine(".! grep TotalBuffers ptree.dump");
fTotalBuffers                 3364995072          ! Total number of bytes in branch buffers

It also appears that the total number of bytes in branch buffers is 2x of what it should be…

Yep, it’s running super fast now! So here’s a summary for getting fast random tree access to work:

a) use an uncompressed .root file
b) use TTree::SetMaxVirtualSize(X) where X is twice your file size in bytes.
c) get the particular branch you are interested in from the tree, and do LoadBaskets() on that. Check root’s memory usage, and confirm that it’s close to your file size.
d) do your randomized branch access, it should be very fast at this point.

Philippe, thanks again!

[quote]It also appears that the total number of bytes in branch buffers is 2x of what it should be…[/quote] Yes, this is exactly the problem that I fixed :).

[quote]So here’s a summary for getting fast random tree access to work:[/quote] Hmm …

[quote]a) use an uncompressed .root file[/quote] A priori this is not needed. I thought compression was the original cause of your slowdown, but it isn’t: buffers loaded via LoadBaskets are decompressed only once (during LoadBaskets).

[quote]c) get the particular branch you are interested in from the tree, and do LoadBaskets() on that. Check root’s memory usage, and confirm that it’s close to your file size. [/quote] What I don’t yet understand is why, in your case, it still fails when you call TTree::GetEntry after setting the max virtual size large enough!

[quote]So what was the difference? Rather than do LoadBaskets() on ptree, I did it on the actual branch – is that it?[/quote] No, it should not make a difference. However, when you call LoadBaskets on each branch individually, the memory usage (i.e. fTotalBuffers) grows ‘slower’.

The next thing to try is:

TFile *_file0 = TFile::Open("/raid/home/pnpfuser/data_vx511/root_all/DU_data/2012_04_03_uncompressed.root");
ptree->SetMaxVirtualSize(4e+9);
ptree->LoadBaskets();
b = ptree->GetBranch("event");
cout << b->GetListOfBaskets()->GetEntries() << '\n';
b = ptree->GetBranch("tlv_header");
cout << b->GetListOfBaskets()->GetEntries() << '\n';
gROOT->ProcessLine("ptree->Dump(); > ptree.dump");
gROOT->ProcessLine(".! grep TotalBuffers ptree.dump");

Thanks,
Philippe.

Philippe, I have checked and can confirm that

a) if I do ptree->LoadBaskets() OR
b) if I use a compressed root file

the memory usage drops to 218MBytes and the tree access becomes slow again. Not sure why.

And here’s the output for your test:

root [1] ptree->SetMaxVirtualSize(4e+9);
root [2] ptree->LoadBaskets();
root [3] b = ptree->GetBranch("event");
root [4] cout << b->GetListOfBaskets()->GetEntries() << '\n';
2
root [5] b = ptree->GetBranch("tlv_header");
root [6] cout << b->GetListOfBaskets()->GetEntries() << '\n';
1
root [7] gROOT->ProcessLine("ptree->Dump(); > ptree.dump");
root [8] gROOT->ProcessLine(".! grep TotalBuffers ptree.dump");
fTotalBuffers                 36286046208         ! Total number of bytes in branch buffers

fTotalBuffers is clearly wrong. You can see ROOT’s memory usage below (the third column is the resident size reported by ps):

psi00:11:59:~>ps -eo pid,pcpu,rss,user,args | grep pnpfuser | grep root.exe
16936 11.4 38376 pnpfuser /raid/sw/root/root_v5.32.00/bin/root.exe -splash -l /raid/home/pnpfuser/data_vx511/root_all/DU_data/2012_04_03_uncompressed.root

Here are two more tests:

a) one where I do LoadBaskets() on the branch (rather than the tree) from an uncompressed file and
b) one where I do LoadBaskets() on the branch (rather than the tree) from a compressed file

test A:

Attaching file /raid/home/pnpfuser/data_vx511/root_all/DU_data/2012_04_03_uncompressed.root as _file0...
root [1] ptree->SetMaxVirtualSize(4e+9);
root [2] TBranch* b = ptree->GetBranch("event");
root [3] b->LoadBaskets()
(Int_t)479
root [4] cout << b->GetListOfBaskets()->GetEntries() << '\n';
479
root [5] b = ptree->GetBranch("tlv_header");
root [6] cout << b->GetListOfBaskets()->GetEntries() << '\n';
0
root [7] gROOT->ProcessLine("ptree->Dump(); > ptree.dump");
root [8] gROOT->ProcessLine(".! grep TotalBuffers ptree.dump");
fTotalBuffers                 2625134592          ! Total number of bytes in branch buffers

In fact ROOT’s memory usage is 1272856 kBytes.
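
To put the two numbers from test A side by side, the following small sketch divides the reported fTotalBuffers by the process memory. `buffers_to_rss_ratio` is a hypothetical helper, and it assumes that "kBytes" means 1024 bytes; the process memory also covers more than just baskets, so the result is only a rough indication.

```cpp
#include <cstdint>

// Ratio of the byte count reported in fTotalBuffers to the process
// memory reported by the OS (given in kBytes, assumed to be 1024 bytes each).
double buffers_to_rss_ratio(std::int64_t total_buffers_bytes,
                            std::int64_t rss_kbytes) {
    return static_cast<double>(total_buffers_bytes) /
           (static_cast<double>(rss_kbytes) * 1024.0);
}
```

For test A, `buffers_to_rss_ratio(2625134592LL, 1272856)` comes out at roughly 2.01, which is consistent with the factor of two noted earlier in the thread.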

test B:

Attaching file /raid/home/pnpfuser/data_vx511/root_all/DU_data/compressed/2012_04_03_compressed.root as _file0...
root [1] ptree->SetMaxVirtualSize(4e+9);
root [2] TBranch* b = ptree->GetBranch("event");
root [3] b->LoadBaskets()
(Int_t)29344
root [4] cout << b->GetListOfBaskets()->GetEntries() << '\n';
3
root [5] b = ptree->GetBranch("tlv_header");
root [6] cout << b->GetListOfBaskets()->GetEntries() << '\n';
1
root [7] gROOT->ProcessLine("ptree->Dump(); > ptree.dump");
root [8] gROOT->ProcessLine(".! grep TotalBuffers ptree.dump");
fTotalBuffers                 80417385728         ! Total number of bytes in branch buffers

Again, crazy numbers: ROOT’s memusage is only 37996 kBytes.

Hi,

Okay, this might be another manifestation of the problem. Is there a way for you to send me your compressed file so I can try to reproduce (and fix) the behavior you see, and/or verify that the fix I already have takes care of it :) ?

Thanks,
Philippe

Philippe, I’d love to, however the file is large: 570 MBytes. Is there an FTP server at CERN where I could upload it?

p.s. I could also verify any potential fixes from you, if that’s what you are asking.

Ok, you can grab it from here, it should be ready in 5-10minutes (it’s getting uploaded to the server):

dl.dropbox.com/u/6095182/2012_04 … essed.root

Please let me know when you are done – I’ll need to delete it.

UPDATE: it’s there now, it took longer than I thought.

Hi,

Thanks. I was able to download the file and understood the last issue.

It turns out that TTree::LoadBaskets uses the value given in a previous call to SetMaxVirtualSize only if it is passed a value of 0 or less. The default value used when none is explicitly provided is 2e+9. So to make it work in your case you just need to do:

ptree->LoadBaskets(4e+9);
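
The rule described above can be written down as a toy model. This is a sketch of the described behavior only, not ROOT's actual implementation; `effective_limit` and `kDefaultLoadLimit` are hypothetical names introduced for illustration.

```cpp
#include <cstdint>

// Toy model of the behavior described above, NOT ROOT's actual code.
// `stored_limit` plays the role of the value set via SetMaxVirtualSize;
// `requested` plays the role of the argument passed to LoadBaskets.
constexpr std::int64_t kDefaultLoadLimit = 2000000000;  // the 2e+9 default

std::int64_t effective_limit(std::int64_t stored_limit,
                             std::int64_t requested = kDefaultLoadLimit) {
    // A positive argument overrides whatever was stored earlier;
    // only a value of 0 or less falls back to the stored limit.
    return requested > 0 ? requested : stored_limit;
}
```

In this model, setting a 4e+9 limit and then calling LoadBaskets with no argument still ends up with the 2e+9 default, which would explain why the earlier SetMaxVirtualSize(4e+9) call appeared to have no effect.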

Cheers,
Philippe.

Ok, that fixed it – for uncompressed data.

For compressed data it still won’t read it in though. Here are the two tests:

test A:

psi00:17:11:~/code/root/read/scripts/multiplicity/archive>root $ROOTALL/DU_data/2012_04_03_uncompressed.root
root [0] 
Attaching file /raid/home/pnpfuser/data_vx511/root_all/DU_data/2012_04_03_uncompressed.root as _file0...
root [1] ptree->LoadBaskets(4e+9);
root [2] gROOT->ProcessLine("ptree->Dump(); > ptree.dump");
root [3] gROOT->ProcessLine(".! grep TotalBuffers ptree.dump");
fTotalBuffers                 3364995072          ! Total number of bytes in branch buffers

So far so good…now, test B:

psi00:17:14:~/code/root/read/scripts/multiplicity/archive>root $ROOTALL/DU_data/compressed/2012_04_03_compressed.root 
root [0] 
Attaching file /raid/home/pnpfuser/data_vx511/root_all/DU_data/compressed/2012_04_03_compressed.root as _file0...
root [1] ptree->LoadBaskets(4e+9);
root [2] gROOT->ProcessLine("ptree->Dump(); > ptree.dump");
root [3] gROOT->ProcessLine(".! grep TotalBuffers ptree.dump");
fTotalBuffers                 2228673567232       ! Total number of bytes in branch buffers

(and, yes, the memory usage is tiny)

Hi,

Due to the data distribution in your files (some or many of the baskets are not full), the deficiency leads to more than double counting of the actual content. To get the right behavior with the deficient version of ROOT, for your files, you need to use a very large number (i.e. greater than what fTotalBuffers reports). For example:

ptree->LoadBaskets(2328673567232LL);
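
As a rough illustration of how such a limit could be chosen, the sketch below adds some headroom to the reported fTotalBuffers value and checks whether the result still fits in 32 bits. Both helpers are hypothetical, and the default headroom is an arbitrary illustrative choice rather than a value recommended anywhere in this thread.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: pick a limit strictly above the (over-counted)
// fTotalBuffers value by adding some headroom.
std::int64_t limit_above(std::int64_t reported_total_buffers,
                         std::int64_t headroom = 100000000000LL) {
    return reported_total_buffers + headroom;
}

// True when the value cannot be held in a 32-bit signed integer,
// i.e. when a 64-bit type (and a literal such as ...LL) is required.
bool needs_64_bits(std::int64_t value) {
    return value > std::numeric_limits<std::int32_t>::max();
}
```

With the value reported for the compressed file, `limit_above(2228673567232LL)` gives 2328673567232, the number used in the example above, and `needs_64_bits` returns true for it.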

Cheers,
Philippe.

Ok, that worked.

Hi,

Great, so the problem is really localized to what I found and is fixed :)

Cheers,
Philippe.

Yes, thank you. Looking forward to the next release!