I am using ROOT 5.06/00 on Windows XP. One type of TTree I use often has many entries (>10**8) and is made up of < 20 branches, each with a primitive data type (UShort_t, Float_t, Int_t, etc.). Sometimes it is desirable to sample a few branches from these large TTrees - e.g. read every 200th entry for 3-4 branches and nothing else. What I've observed for a 1 GB file is that the first time I sample, it takes about 60 seconds of actual time, but only 13 sec of CPU time. If I run the sampling program a second time, it takes about 13 seconds (down from 60), the same as the CPU time. Obviously, parts of the file are getting cached during the first read and the second read benefits from that. This is fine if I plan to spin through the file multiple times (assuming the file cache is large enough), but sometimes I only want to read it once and move on to the next file.
What I found is that if I first read the ROOT file (fread the file's bytes with a 10 kB buffer and don't do anything with them) and then run the sampling code described above, I get a significant improvement: the pre-read (to populate the cache - I don't look at the data) takes about 22 seconds, while the sampling pass that follows still takes only 13 seconds, for a total of 35 seconds of actual time, as compared with 60 seconds.
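For reference, the pre-read step is roughly the following. This is a minimal sketch using only the C standard I/O calls, not my exact code; the function name `prereadFile` and its return convention are just for illustration, and the 10 kB default matches the buffer size mentioned above.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Reads the whole file sequentially and discards the bytes, so that the
// operating system's file cache is populated before ROOT opens the file.
// Returns the number of bytes read, or -1 if the file could not be opened.
// (prereadFile is an illustrative name, not part of ROOT.)
long long prereadFile(const char* path, std::size_t bufferSize = 10 * 1024) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return -1;

    std::vector<char> buffer(bufferSize);
    long long total = 0;
    std::size_t n;
    // fread returns 0 at end of file (or on error), which ends the loop.
    while ((n = std::fread(buffer.data(), 1, buffer.size(), f)) > 0) {
        total += static_cast<long long>(n);
    }

    std::fclose(f);
    return total;
}
```

I call something like this on the file path first, then open the same file in ROOT and run the sampling loop as usual.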
The file I have been testing with is 965 MB. (I have 2 copies that I switch between to avoid cache effects between tests.) I doubt the entire ROOT file fits in the cache (plus I have ROOT files several times larger than this), so there may be an opportunity to optimize, e.g. periodically freshen up the cache with buffered "pre-reads" during normal ROOT access of the TTree.
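To make the "periodic pre-read" idea concrete, here is a rough sketch of what I have in mind, again in plain C++ with no ROOT calls. The class name `ChunkedPrereader` and the `advance` method are hypothetical, and I have not measured whether interleaving like this actually helps; it only shows the pattern of warming the cache one chunk at a time instead of all at once.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: warms the OS file cache one chunk at a time, so that
// pre-reading can be interleaved with normal TTree access to the same file.
class ChunkedPrereader {
public:
    explicit ChunkedPrereader(const char* path, std::size_t bufferSize = 10 * 1024)
        : file_(std::fopen(path, "rb")), buffer_(bufferSize > 0 ? bufferSize : 1) {}
    ~ChunkedPrereader() { if (file_) std::fclose(file_); }
    ChunkedPrereader(const ChunkedPrereader&) = delete;
    ChunkedPrereader& operator=(const ChunkedPrereader&) = delete;

    bool isOpen() const { return file_ != nullptr; }

    // Reads and discards up to maxBytes starting at the current position.
    // Returns the number of bytes actually read (0 once the end is reached).
    std::size_t advance(std::size_t maxBytes) {
        if (!file_) return 0;
        std::size_t done = 0;
        while (done < maxBytes) {
            std::size_t want = maxBytes - done;
            if (want > buffer_.size()) want = buffer_.size();
            std::size_t got = std::fread(buffer_.data(), 1, want, file_);
            done += got;
            if (got < want) break;  // end of file or read error
        }
        return done;
    }

private:
    std::FILE* file_;
    std::vector<char> buffer_;
};
```

The intended use would be to call `advance` with some chunk size (say a few tens of MB, which is a guess) every so many entries in the sampling loop, so the pre-read stays a little ahead of where ROOT is reading.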
My desktop machine is a 3 GHz Xeon with 2 GB RAM and a SATA hard drive. The files are local.
Comments? I know I'm a bit out of date with my ROOT version… should I expect similar issues with later versions of ROOT? What about 64-bit - presumably Vista can have a larger file cache?