TQueryResult

Wolf · April 30, 2008, 3:06pm

Hello,

simple question:
what exactly does TQueryResult->GetBytes() return? (it { returns fBytes }, but what does that mean?)

reason for the question:
I am processing the same data set (about 37G in size) with ROOT/PROOF using different numbers of workers. Now, GetBytes does not always return the same value; it seems to depend on the number of workers. Additionally, GetBytes returns something on the order of 58*10^9, which is about 1.5 times more than 37G… Does ROOT count bytes of uncompressed ROOT files while the 37G on disk are compressed? Or do different workers need to read the same part of a file (which is then counted twice)? Shouldn't the rate which is displayed in QueryResult->Print("F") be the "real" rate, i.e. bytes read from disk per second?

Thanks,
Wolf

ganis · April 30, 2008, 10:32pm

Hi,

TQueryResult::GetBytes is supposed to return the number of bytes read during processing.
Technically the information comes from TFile::GetBytesRead() on the workers, summed up by the master.
Some overlap may happen if the packets from the same files are assigned to different workers (the bytes needed by TFile::Init will be counted twice, but this is what you actually read, so it is not wrong).
Also, the byte count done by TFile is after decompression, so those are the bytes that you analyse, not that you read from disk.

This said, you should get the same number from different runs (modulo small differences due to additional TFile::Init calls).
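To make the accounting above concrete, here is a minimal sketch in plain C++ (no ROOT needed). It is a toy model, not the actual PROOF implementation: the names TotalBytes, ModelBytes and initBytesPerOpen are hypothetical, and the assumption that every file open costs a fixed number of extra bytes is a simplification of what TFile::Init does.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Sum of the per-worker byte counters, as a master would add them up.
// (Hypothetical helper; in PROOF the inputs would come from the workers.)
std::int64_t TotalBytes(const std::vector<std::int64_t>& perWorker) {
    return std::accumulate(perWorker.begin(), perWorker.end(), std::int64_t{0});
}

// Toy model: the reported total is the uncompressed payload plus a fixed
// (assumed) initialisation cost for every time some worker opens a file.
std::int64_t ModelBytes(std::int64_t payload,
                        std::int64_t initBytesPerOpen,
                        std::int64_t nOpens) {
    assert(payload >= 0 && initBytesPerOpen >= 0 && nOpens >= 0);
    return payload + initBytesPerOpen * nOpens;
}
```

In this model the payload term is the same for every run, so only the number of file opens can change the total. That is why small run-to-run differences would be expected when packets of the same file end up on different workers.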

How large are the differences that you observe?

Which ROOT version are you using?

G. Ganis

Wolf · May 2, 2008, 9:49am

Ok, thanks, that confirms my guess

I'm using ROOT 5.18.00a - the release that comes with the latest CMSSW.

The number of bytes varies from 57.1 GB (5 workers) to 58.8 GB (45 workers). What I find interesting is that 1, 2, 3 and 4 workers read about 58.3 GB, while 5 workers read a lot less?! My expectation was to see a minimum at 1 worker.
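For scale, the figures quoted in this thread can be put through a small calculation. This is only a sketch in plain C++; the helper names RelDiffPercent and ExpansionFactor are made up for illustration, and the inputs are the rounded numbers from the posts above.

```cpp
#include <cassert>

// Relative difference of b with respect to a, in percent.
double RelDiffPercent(double a, double b) {
    assert(a != 0.0);
    return 100.0 * (b - a) / a;
}

// Ratio of the bytes reported by the query to the bytes stored on disk.
double ExpansionFactor(double reportedBytes, double onDiskBytes) {
    assert(onDiskBytes > 0.0);
    return reportedBytes / onDiskBytes;
}
```

RelDiffPercent(57.1, 58.8) gives a spread of about 3% between the 5-worker and 45-worker runs. ExpansionFactor(58e9, 37e9) gives roughly 1.57 if "37G" means 37*10^9 bytes, or roughly 1.46 if it means 37*2^30 bytes, either way consistent with the "1.5 times" estimate and with byte counting after decompression.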

See the rates plot!

ganis · May 2, 2008, 7:26pm

Hi,
Ok, this needs to be understood.
Btw, are the results the same? Number of events, entries in histograms?
Could you please explain the cluster setup in more detail (how many machines, cores per machine, …) and how the data are distributed?
Could you create the stat tree for the extreme cases, i.e. 5 and ~40 workers?
To do that you should set

root [] proof->SetParameter("PROOF_StatsTrace", "")

before running the query and then save the result in a file using

$ROOTSYS/test/ProofBench/SavePerfInfo.C("StatTree.root")

Please, post the files in a public place, e.g. a web location.

Thanks,

G. Ganis

Wolf · May 5, 2008, 10:13am

The results are the same, at least all the gif files (histograms) that I save do not differ.

The test setup is: 5 machines, 8 cores per machine (Xeon E5345), 16GB RAM. The data is stored on a network filesystem (Lustre). The files can be accessed like local files - for example chain.Add("/path/to/lustre-filesystem/file.root")

To get my results, I’m doing the following:

Long_t slaves;
slaves = (Long_t)40;
gProof->SetParameter("PROOF_MaxSlavesPerNode", slaves);
gProof->SetParameter("PROOF_StatsTrace", "");
chain->Process("w2e");
.x $ROOTSYS/test/ProofBench/SavePerfInfo.C("StatTree40.root")
TQueryResult *qr = gProof->GetQueryResult(); qr->Print("F");
slaves = (Long_t)5;
gProof->SetParameter("PROOF_MaxSlavesPerNode", slaves);
chain->Process("w2e");
.x $ROOTSYS/test/ProofBench/SavePerfInfo.C("StatTree5.root")
qr = gProof->GetQueryResult(); qr->Print("F");

The parameter MaxSlavesPerNode does not change the number of slaves per Node but the total number of slaves (can be seen either with ‘top’ or by using the TDrawFeedback)

Before processing, I have set
gEnv->SetValue("Proof.StatsTrace",1);
gEnv->SetValue("Proof.SlaveStatsTrace",1);
gProof->SetParameter("PROOF_StatsTrace", "");

Files:
40 workers
5 workers

Something completely different: The memory consumption of xrootd on the master is growing. Memory is never freed, not even if you close your ROOT/PROOF session. After reaching the system limit (4 GB?) it crashes and one has to restart xrootd. I don't know the cause; sometimes it's just a few kB more after running a selector, but I have also seen the memory consumption grow quite fast, about 0.5 MB per second. It seems to depend on whether or not you use TDrawFeedback (if PROOF_ProcTimeHist and PROOF_EventsHist are enabled, the memory consumption grows FAST).