TFile Speed

Hi Everybody,

it’s a bit of an involved question, I’m afraid. I have an “analysis” that involves filling a reasonably large number of histograms (2000 per phase, 32 phases, mostly TH1I) and extracting summary plots from them (mostly TProfile). I want to write these out to a TFile, and I have a class that creates a directory hierarchy and puts each histogram in the desired location so that the file can be easily browsed and parsed. For memory reasons, I fill the histograms for each phase sequentially, writing out the filled histograms at the end of each loop and freeing the memory so it can be reused; this keeps the memory footprint of the process below the physical memory of the PC.

As I didn’t want the histograms to be created in the current working directory, I initially opened the TFile for "UPDATE" only at the point at which the histograms were ready to be written out, and then closed it again (I later discovered TH1::AddDirectory(kFALSE), but that’s another story). I found that this was slow, taking some tens of minutes per phase. Once I discovered the AddDirectory switch, I stopped opening and closing the TFile each phase and found that the Write() calls completed quickly, but that there was then a MASSIVE wait at the end of the process when closing the TFile (note that the processor is relatively inactive during this period), presumably the cumulative wait from all the phases. The memory footprint of the process was also significantly bigger. I tried calling TFile::Flush() after Write(), but this doesn’t actually change the behaviour: there is still the long wait at the end. So I’ve gone back to opening and closing the TFile at the end of each phase loop.
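In rough pseudocode, the per-phase pattern I’ve gone back to looks something like this (just a sketch: the output file name and FillHistogramsForPhase() are placeholders, not my actual code):

[code]// Rough sketch of the per-phase pattern described above (placeholder names).
TH1::AddDirectory(kFALSE);                      // new histograms are not attached to any directory

for (int phase = 0; phase < 32; ++phase) {
   std::vector<TH1*> histos = FillHistogramsForPhase(phase);   // ~2000 TH1I, hypothetical helper

   TFile *f = TFile::Open("output.root", "UPDATE");            // open only when ready to write
   TDirectory *dir = f->mkdir(Form("phase%d", phase));
   dir->cd();
   for (TH1 *h : histos) h->Write();
   f->Close();                                                  // this is where the long wait appears
   delete f;

   for (TH1 *h : histos) delete h;                              // free the memory for the next phase
}[/code]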

There are several things here:

  1. Why do I see this behaviour? Presumably I’m doing something that ROOT or the TFile wasn’t designed for, or at least isn’t well optimised for.

  2. Can I improve things? I’m afraid that the number of histograms is only going to go up… I currently “speed” things up (well, actually I’m just reducing the total time required to run the process) by spawning a new thread and letting it execute the calls to Write() whilst the original thread fills the next set of histograms. The second thread opens and closes the TFile and takes responsibility for freeing up the memory (a rough sketch of this hand-off is below). I see a slightly bigger memory footprint, but the time taken to complete is cut almost in half - an important gain when the initial time to completion was a little over 6 hours.
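Roughly, the hand-off looks like this (a simplified sketch, not my actual code; WritePhase(), the file name and the container are made-up names):

[code]// Simplified sketch of the thread hand-off: the writer thread owns the TFile for
// the previous phase and frees the memory, while the original thread carries on filling.
#include <thread>
#include <vector>
#include "TFile.h"
#include "TH1.h"

void WritePhase(std::vector<TH1*> histos, int phase) {
   TFile *f = TFile::Open("output.root", "UPDATE");
   TDirectory *dir = f->mkdir(Form("phase%d", phase));
   dir->cd();
   for (TH1 *h : histos) { h->Write(); delete h; }   // write, then free the memory
   f->Close();
   delete f;
}

// In the main loop, roughly:
//   std::thread writer(WritePhase, std::move(filledHistos), phase);
//   ... fill the histograms for the next phase in this thread ...
//   writer.join();   // make sure the file is closed before the next hand-off[/code]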

There was another thread where a similar question was asked ([url=https://root-forum.cern.ch/t/very-slow-output-writing-multiple-directories-and-histograms/640/1]very slow output writing multiple directories and histograms[/url]), but this was almost 10 years ago and there isn’t really a resolution to the question… I can try to produce a minimal working example if necessary.

Any input would be greatly appreciated.
Best wishes,
Matt

Could you post (or store somewhere on a public site) the resulting histogram file with your 64000 (2000x32) histograms ?
and possibly prepare the smallest possible subset of your program reproducing the problem ?

Rene

I have written a small test program simulating your example, with 32 directories, each containing 2000 profile histograms (100 bins each) filled randomly. This program executes in 8 seconds on my laptop and produces a 123 MByte file (see the test program below), which you can execute with

root -b -q "filespeed.C+"

So either you have memory-consuming objects (TH2, TH3 or the like, or profiles with a large number of bins),
or there is something badly wrong in the way you store/manage your histograms.

Rene

[code]#include "TFile.h"
#include "TH2.h"
#include "TProfile.h"
#include "TRandom3.h"

void filespeed() {
   TH1::AddDirectory(0);
   TRandom3 rand(0);
   const Int_t nhist = 2000;
   TH1 *h[nhist];
   Int_t ih;
   for (ih = 0; ih < nhist; ih++) h[ih] = new TProfile(Form("h%d", ih), "test", 100, 0, 100);
   TFile f("filespeed.root", "recreate");
   for (Int_t phase = 0; phase < 32; phase++) {
      printf("processing phase%d\n", phase);
      TDirectory *dir = f.mkdir(Form("phase%d", phase));
      dir->cd();
      for (Int_t i = 0; i < 100000; i++) {
         Int_t ihn = i % nhist;
         double r1 = rand.Uniform(0, 100);
         double r2 = rand.Rndm();
         h[ihn]->Fill(r1, r2);
      }
      for (ih = 0; ih < nhist; ih++) h[ih]->Write();
      f.cd();
   }
   f.Close();
}[/code]

Hi René,

thanks very much indeed for the prompt replies. I’ve written some dummy code that replicates, more or less, what I’m doing in the “analysis” and reproduces what I see in the real programme; it doesn’t fill any summary plots, though. Please find it attached as many_histograms.cc. I build it with:

g++ -c -I`root-config --incdir` many_histograms.cc -o many_histograms.o
g++  `root-config --libs` -o many_histograms many_histograms.o

At home (I think my PC at home is more performant than the one in the lab, but anyway), I see the following function-call times for all of the TH1::Write()s, TFile::Flush() and TFile::Close():

so this programme is taking ~5 minutes to close the TFile for a single simulated phase.

Am I being unreasonable in my expectations? (Perfectly possible…) Is there any way to speed this up?

Many thanks,
Matt

PS: I see the same performance as you report for the filespeed.C example you kindly wrote and posted.
many_histograms.cc (8.62 KB)

Running your program on a MacBook Pro, I obtain:

[code]time to write = 4.77812s
time to flush = 0.0172079s
time to close = 1.42804s
time to delete = 6.60419e-05s
int test::analysis::Analyse(): Main loop done
Analyse() completed[/code]

What is your machine/OS/ROOT version?
How much DRAM do you have?

Rene

Hi René,

thanks again for such a prompt reply - there’s a staggering difference in performance between our machines. I’m running CentOS 6 on a quad-core i5-2300 CPU (2.80 GHz) with 8 GB of RAM. The hard drive is a 2 TB SATA disk. The g++ version is 4.4.7 and I compiled ROOT version 5.34/00.

My PC is not particularly slow; do you understand where this difference comes from?

Best wishes,
Matt

How long does it take to copy the output file histos.root (about 38 MBytes), e.g. with
time cp histos.root histos2.root

It looks like a performance problem with your disk.

Rene

I confirm that the problem exists.

I tried "many_histograms" on Ubuntu 10.04.4 LTS / i686 with gcc 4.4.3 and ROOT 5.28 and 5.34/18. When writing to a local hard disk (I cannot test writing to an NFS disk right now), I also get hundreds of seconds of "time to close". Several seconds after I see "Phase : 0, idx = 90000", "top" shows negligible CPU usage for "many_histograms" (below 1%), the process status "S" is 'D' = uninterruptible sleep, and I can see the hard-disk LED constantly on.
Of course, a simple copy of the "histos.root" file from and to a local hard disk runs at about 40 MB/s.

Then I tried it on CentOS 6.5 (Final) / x86_64 with gcc 4.4.7 and ROOT 5.34/17. When "histos.root" is written to a local disk, the "time to close" is short (below 2 s). However, when "histos.root" is written to an NFS disk, it also takes hundreds of seconds to close. Again, several seconds after I see "Phase : 0, idx = 90000", "top" shows negligible CPU usage for "many_histograms" (below 1%) and the process status "S" is 'D' = uninterruptible sleep.
A simple copy of the "histos.root" file over NFS runs at about 10 MB/s (100 Mb/s ethernet).

Hi René, Wile E. Coyote,

it’s fantastic to have such fast feedback!

copying the file is quite fast for me:

[matt many_histos $] time cp histos.root histos2.root

real    0m0.030s
user    0m0.000s
sys     0m0.029s
[matt many_histos $]

so this must be something else, as Wile E. says. I’m never writing to an NFS disk, so perhaps a temporary solution for me would be to migrate to version 5.34/17?

Many thanks,
Matt

EDIT1: moving to 5.34/17 via

[code]git clone http://root.cern.ch/git/root.git root-v5-34
cd root-v5-34
git checkout -b v5-34-17 v5-34-17[/code]

followed by

[code]./configure
make -j5[/code]

and then rebuilding many_histograms.cc against the new ROOT libraries does not change the behaviour for me.

Opening/closing files on NFS (or AFS) is a slow operation: the system has to flush the caches.
In your example, you have one open/close for each histogram.
Try to organize your program in a much more efficient way where you have only one open/close for each of your 32 sub-directories (in the same way that I illustrated in my example).
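Schematically, that means something like this (a sketch only; the fill/write step is indicated by a comment):

[code]// One open for the whole job and one sub-directory per phase, as in filespeed.C above.
TFile *f = TFile::Open("output.root", "RECREATE");
for (Int_t phase = 0; phase < 32; phase++) {
   TDirectory *dir = f->mkdir(Form("phase%d", phase));
   dir->cd();
   // ... fill the ~2000 histograms of this phase, Write() them, delete them ...
   f->cd();
}
f->Close();
delete f;[/code]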

Rene

Hi René,

I’d like to be clear on a couple of things:

  1. my code only ever writes out to a file on the local SATA disk. I never write out to a file on AFS or NFS or across any kind of network connection;

  2. the code as it is written at present writes out all of the histograms for a given phase between a single TFile::Open() call and a single TFile::Close() call, i.e. in pseudocode:

[code]TFile::Open();
// loop writing out thousands of histograms for a single phase
TFile::Close();[/code]

  3. the reasons I open and close the file each loop are:

a) I observe a long delay when calling TFile::Close() that appears additive; i.e. 5 minutes (say) after one phase, 10 minutes after 2 phases, … 2.5 hours after 32 phases. By handing off the Write() loop and Close() operation to a second thread, and closing the file after each phase loop, I can have this second thread do most of the waiting whilst I fill the histograms for subsequent phases. Whilst this level of multi-threaded programming is within my capabilities, I wouldn’t have gone to the trouble of adding it in if the TFile::Close() operation were as quick to complete as on your computer, for example.

b) the memory footprint appears additive, too. I am conscious of the memory usage of my process and know that I don’t have enough RAM to hold the histograms for all the phases in memory simultaneously. If I close the TFile after each phase, the memory footprint doesn’t grow; if I don’t do this close operation, it does (see the sketch below). The reason I included the TFile::Flush() call was that I thought it would flush the buffer of data waiting to go to disk, but it doesn’t appear to do this. Perhaps I’ve misunderstood the purpose of TFile::Flush(); in any case, this is a side issue that doesn’t concern the main behaviour I observe.
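The per-phase write/clean-up is essentially the following (a sketch with placeholder names, not the actual code):

[code]// Sketch of the per-phase write and clean-up. Because TH1::AddDirectory(kFALSE)
// was called, the file does not own the histograms, so they are deleted explicitly
// once they are safely on disk; the footprint stays at one phase's worth of histograms.
for (TH1 *h : phaseHistos) h->Write();
f->Close();                               // closing per phase keeps the footprint flat
for (TH1 *h : phaseHistos) delete h;      // without this (and the Close) it grows phase by phase
phaseHistos.clear();[/code]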

Many thanks again,
Matt

Philippe and/or Danilo will follow up on this problem next week.

Rene

Hi,

I am not able to follow this up today unfortunately, but I put it on the top of my list for Monday.

Cheers,
Danilo

Hi Matt,

I am a bit confused. Is the slowdown you observe when writing to a local disk or to an NFS-mounted disk? In the example you uploaded, when setting “PHASE_COUNT=10;”, for every phase I see ‘pixel histograms not found, creating new ones’ and I do not really see a slowdown. What am I missing?

Cheers,
Philippe.

Hi Philippe,

what OS are you running on? I see the issue on CentOS 6 (and Wile E. Coyote reported similar behaviour on Ubuntu 10.04.4 LTS). I’ve never run the code writing out to an NFS disk (or AFS or any other network disk), only ever to a local SATA disk. At the end of each phase, the programme reports the times for certain function calls; what are these for you?

Thanks,
Matt

Hi Matt,

I get:

[code]time to write = 4.78327s
time to flush = 0.017668s
time to close = 0.990368s
time to delete = 0.0131619s[/code]

[quote]I’ve never run the code writing out to an NFS disk[/quote]My bad, I misread Wile E. Coyote’s post and thought that he was seeing the problem only on NFS…

I am running on MacOS … I will try again on a different platform.

Cheers,
Philippe.

Matt did not specify whether his CentOS 6 is an i686 or x86_64 machine, nor whether he uses an Intel or AMD CPU (both my test machines use Intel CPUs: Ubuntu runs on a Core2 Duo T9600, while CentOS runs on a Xeon E5506).

One machine is reported as:

  • Intel® Core™2 Duo CPU E8400 @ 3.00GHz

the other is:

  • Intel® Core™ i5-2300 CPU @ 2.80GHz

Thanks,
Matt

An i686 (32-bit) or x86_64 (64-bit) system? … try “uname -m” and/or “uname -a”

My CentOS 6.5 (Final) says:
Linux My.Machine.My.Domain 2.6.32-431.11.2.el6.x86_64 #1 SMP Tue Mar 25 19:59:55 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Both are x86_64. I don’t think I have access to an i686 machine to try this on…

Thanks,
Matt