I have a question about partial I/O in trees.
If I understand correctly, the idea of partial I/O is to reduce the amount of data read from a tree and, as a result, speed up program execution. In the ROOT tutorials I found no performance benchmarks for partial I/O (in particular, I am interested in partial reads of a tree), so I wrote a simple script to test this feature of ROOT trees.
What does it do? It measures the time needed to read a tree from a file when one or several branches are activated.
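A minimal sketch of the kind of timed partial read I mean is below; the file name "test.root", the tree name "tree", and the branch name "branch0" are placeholders, not the actual names from my script:

```cpp
// Sketch of a timed partial read: deactivate everything, then
// activate only the branches under test before the read loop.
#include "TFile.h"
#include "TTree.h"
#include "TStopwatch.h"
#include <cstdio>

void TimedPartialRead()
{
   TFile *f = TFile::Open("test.root");
   TTree *t = (TTree*)f->Get("tree");

   t->SetBranchStatus("*", 0);        // deactivate all branches
   t->SetBranchStatus("branch0", 1);  // activate only the one we read

   TStopwatch sw;
   sw.Start();
   Int_t n = (Int_t)t->GetEntries();
   for (Int_t i = 0; i < n; ++i)
      t->GetEntry(i);                 // reads only the active branches
   sw.Stop();

   printf("real = %.3f s, cpu = %.3f s\n", sw.RealTime(), sw.CpuTime());
   f->Close();
}
```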
The tree is generated by the script itself. It has 10 branches, and each branch contains a fixed-size array of doubles.
To exclude file caching I wrote a function PurgeMemory(), which tries to allocate all physical memory and fill it. This appears to force the system to release all file-cache buffers.
After the first tests I found that the reading speed depends drastically on the branch buffer size (the so-called basket size). Therefore the script runs the test with different buffer sizes. The first time it generates a tree whose buffer size equals the branch data size (not a very good idea, but it works); the next time the buffer size is doubled, and so on, up to the limit where the buffers are 512 times bigger than the data in a branch.
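The basket size is set through the bufsize argument of TTree::Branch(). A sketch of how one such test tree is generated, assuming placeholder names; "factor" is the basket-size multiplier (1, 2, 4, ... 512), and all branches share one data array since the content itself does not matter for the benchmark:

```cpp
// Sketch of generating one test tree; each branch holds 10 doubles
// (80 bytes) and the basket size is factor times the branch data size.
#include "TFile.h"
#include "TTree.h"
#include <cstdio>

void GenerateTree(Int_t factor)
{
   const Int_t kSize = 10;
   Double_t data[kSize];

   TFile f("test.root", "RECREATE");
   TTree *t = new TTree("tree", "partial I/O test");

   Int_t bufsize = factor * kSize * sizeof(Double_t); // basket size under test
   for (Int_t b = 0; b < 10; ++b) {
      char name[16], leaflist[32];
      sprintf(name, "branch%d", b);
      sprintf(leaflist, "data%d[10]/D", b);
      t->Branch(name, data, leaflist, bufsize);
   }

   for (Int_t ev = 0; ev < 1000000; ++ev) {
      for (Int_t j = 0; j < kSize; ++j) data[j] = ev + j;
      t->Fill();
   }
   t->Write();
   f.Close();
}
```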
The script measures the real time and CPU time of reading such trees and writes the results to another small tree. The ShowTestResults() function then creates from this tree a 2D histogram showing how the time depends on the number of activated branches and on the relative size of the branch buffer (on a logarithmic scale).
The script performs two tests.
The first time it generates trees with 1,000,000 events, each with 10 branches and 10 doubles (80 bytes) per branch. The results can be seen in the Tree_small_real.gif file.
The second time it generates trees with 10,000 events, each with 10 branches and 1000 doubles (8000 bytes) per branch. The results can be seen in the Tree_large_real.gif file.
I used ROOT version 3.10/01, compiled under Debian with gcc 2.95.4. The script runs in compiled mode via ACLiC. My computer is an Athlon 1800+ MX with 512 MB of RAM. On this machine the full run takes about 2 hours, mostly because of the long tree-generation time and the delays between the individual tests.
The results I see confuse me. When the branch data size is small (only 80 bytes), I can use a buffer (basket) size 100 times bigger than my data. But in this situation there is no difference whether I read only 1 branch or all 10 branches.
On the other hand, when the branch data size is fairly big (8000 bytes), I am not always able to use even a 10 times bigger buffer (basket) size. And again, in this situation I gain practically nothing by reading only 1 branch instead of all 10.
I also looked at the CPU time. There the results are nice: in all cases you can gain a factor of 10. But what interests me much more is the real time, since that is what I wait for while sitting in front of the display until the program finishes its job.
Can somebody explain what I am doing wrong, or am I perhaps missing something?
P.S. If somebody wants to run this script, they should set the RAMSIZE constant to the actual amount of physical memory in MB; otherwise file caching will play a significant role in all the tests. The computer should also have about 1 GB of free disk space.
PerfomanceTest.C (6.38 KB)