The problem is that TDirectory::GetObject is taking a very long time to read the object when there are a reasonably large number of bins.
Below is a simple macro which generates 1,000,000 events with a four-dimensional gaussian profile, writes them to a file and then reads them back out. Timing is done for the different stages, and here are the results I get with ROOT version ROOT 5.21/01 (trunk@24531, Jul 27 2008)
I wrote this script just to give an idea of the problem. When I first had it, I thought that it was truly stuck. My use case is different from this example. Life would be a lot easier if I could make a big THnSparse once, write it to disk, then use it repeatedly.
The problem is more noticeable with more dimensions and fewer bins. I’m guessing that this is fairly integral to whatever API does the reading, but I was wondering if there was any quick workaround? Judging from a couple of quick sessions in a debugger, most of the time seems to be spent in TExMap::FindElement().
#0 0xb7a27f53 in TExMap::FindElement () from /usr/lib/root/libCore.so.5.21
#1 0xb7a282f6 in TExMap::Add () from /usr/lib/root/libCore.so.5.21
#2 0xb7a28728 in TExMap::Streamer () from /usr/lib/root/libCore.so.5.21
#3 0xb70dba79 in TStreamerInfo::ReadBuffer<char**> () from /usr/lib/root/libRIO.so.5.21
#4 0xb704b192 in TBufferFile::ReadClassBuffer () from /usr/lib/root/libRIO.so.5.21
#5 0xb6d7e0a0 in THnSparse::Streamer () from /usr/lib/root/libHist.so.5.21
125 bins on each axis:
100%
-- Generation of events took:
Real time 0:00:02, CP time 2.420
Filesize is: 12450417
-- Writing file took
Real time 0:00:03, CP time 3.630
Loading object... number of bins = 125
-- Successful file load, took:
Real time 0:00:06, CP time 6.180
150 bins on each axis: (File is ~300kbyte larger than 125 bin case, but takes ~40 seconds longer to load!)
100%
-- Generation of events took:
Real time 0:00:02, CP time 2.410
Filesize is: 12783964
-- Writing file took
Real time 0:00:04, CP time 3.700
Loading object... number of bins = 150
-- Successful file load, took:
Real time 0:00:51, CP time 49.540
The macro:
[code]#include <THnSparse.h>
#include <TH1D.h>
#include <TFile.h>
#include <TRandom.h>
#include <TStopwatch.h>
#include
using std::cout;
using std::endl;
int slowLoad()
{
gRandom->SetSeed( 123456 ); // and he said ‘Let there be determinism’
const UInt_t b = 150;
Int_t bins[] = { b, b, b, b };
Double_t mins[] = { -5, -5, -5, -5 };
Double_t maxs[] = { 5, 5, 5, 5 };
Double_t Values[] = { 0, 0, 0, 0 };
THnSparseF *hist = new THnSparseF( "hist", "hist", 4, bins, mins, maxs );
TStopwatch s; s.Start();
for ( UInt_t i = 0; i < 1000000; i++ )
{
for ( UInt_t j = 0; j < 4; j++ )
Values[j] = gRandom->Gaus();
hist->Fill( Values );
if ( i % 10000 == 0 )
{
cout << (i / 10000) << "% \r";
cout.flush();
}
}
s.Stop();
cout << "100%" << endl << " -- Generation of events took:" << endl;
s.Print();
TFile *outputFile = new TFile( "temp.root", "recreate" );
s.Start();
cout << " Filesize is: " << hist->Write() << endl;
outputFile->Close();
s.Stop();
delete outputFile;
delete hist; hist = NULL;
cout << "-- Writing file took" << endl;
s.Print();
TFile *inputFile = new TFile( "temp.root" );
cout << "Loading object... number of bins = " << b << endl;
s.Start();
inputFile->GetObject( "hist", hist );
s.Stop();
cout << "-- Successful file load, took: " << endl;
s.Print();
hist->Projection(0)->Draw();
return 0;
}[/code]