PROOF and speed

joa · January 24, 2007, 10:52pm

Hi all,

I’m trying to find out whether PROOF is worth investing some time in, so I made a simple test. I have a dual-core 64-bit Intel machine running Scientific Linux 4.3 and ROOT 5.13/04. I set up a local PROOF system (it seems to work) and made a simple tree that I filled and analysed. The analysis is faster on one processor without PROOF than on two with PROOF. I didn’t expect a factor of two in speed, but I did expect something better than one.

With PROOF:

[jljungva@mycomputer]$xrootd -c xpd.cf -b -l /tmp/example1.log
[jljungva@mycomputer]$root -l
root [0] proof = TProof::Open("localhost")
Starting master: opening connection …
Starting master: OK
Opening connections to workers: OK (2 workers)
Setting up worker servers: OK (2 workers)
PROOF set to parallel mode (2 workers)
(class TVirtualProof*)0x1702410
root [1] TChain achain("PROOFtestTREE")
root [2] achain.Add("/home/jljungva/localhome/prooftest/testtree.root")
(Int_t)1
root [3] achain.SetProof()
root [4] achain.Process("PROOFtestSelector.C")
Looking up for exact location of files: OK (1 files)
Validating files: OK (1 files)
Master-0: grand total: sent 4 objects, size: 51914908 bytes
Time processing: 107.953
Time writing result to disk: 6.72456
(Long64_t)0
root [5] .q

Without PROOF:
[jljungva@mycomputer prooftest]$root -l
root [0] TChain achain("PROOFtestTREE")
root [1] achain.Add("/home/jljungva/localhome/prooftest/testtree.root")
(Int_t)1
root [2] achain.Process("PROOFtestSelector.C")
This slave filled with 6.41991e+06
Time processing: 71.5485
Time writing result to disk: 6.44897
(Long64_t)0
root [3] .q

Faster without PROOF? That can’t be right. What am I doing wrong? I’ve attached a tarball with the scripts that produce my test file, the selector, and the xpd.cf configuration file for xrootd.

cheers

Joa
prooftest.tar.gz (3.17 KB)

ganis · January 31, 2007, 8:47am

Dear Joa,

Unfortunately, PROOF is not yet optimized for local processing: it was designed for clusters of independent machines. The lack of optimization shows up most clearly in extremely I/O-bound analyses, which happens to be your case.
For this kind of analysis an additional CPU typically does not help, unless the input data are spread over more than one disk.
For your setup I would expect similar processing times for the local chain and for PROOF, with PROOF slightly slower because of the framework overhead.
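A toy cost model makes the I/O-bound argument concrete. It is a sketch under stated assumptions, not a description of how PROOF actually schedules work: reading from a single disk is treated as fully serial, only the CPU part is shared among workers, and PROOF adds a fixed overhead. The names `JobCost` and `wallTime` are hypothetical.

```cpp
#include <cassert>

// Toy cost model (an assumption for illustration, not PROOF internals):
// all reads come from one disk, so read time does not shrink as workers
// are added; only the CPU part is shared among them.
struct JobCost {
    double readSeconds;      // time to read the whole tree from the single disk
    double cpuSeconds;       // time spent in the selector on one CPU
    double overheadSeconds;  // fixed framework cost (startup, transfers, merging)
};

// Estimated wall-clock time when the CPU work is split over `workers`.
double wallTime(const JobCost& job, int workers) {
    assert(workers > 0);
    return job.readSeconds + job.cpuSeconds / workers + job.overheadSeconds;
}
```

With hypothetical numbers of 63 s of reading and 7 s of CPU work, `wallTime(JobCost{63.0, 7.0, 0.0}, 1)` gives 70 s for the local chain, while `wallTime(JobCost{63.0, 7.0, 5.0}, 2)` gives 71.5 s with two workers and 5 s of overhead: the second CPU saves only 3.5 s, less than the overhead. Swapping the read and CPU times (7 s reading, 63 s CPU) gives 43.5 s with two workers, a clear gain.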

However, that’s not what you observe: with PROOF your processing takes about 50% longer (108 s versus 72 s). Why is that? We have found two main reasons.

1. The framework overhead is amplified for extremely CPU-light events. Looking at your case, we identified some job-control operations that could be optimized and implemented the optimization. It is already available in CVS; please try the CVS head if you can.
2. Your example produces a relatively large output, which in the PROOF framework requires additional transfers (worker->master->client) that are not needed, and not done, when processing local chains. If you leave fOutput empty (or fill it only with the histograms) you should already see a difference. Of course, an empty output list is not really a solution. We are working on a version of PROOF optimized for local usage, without daemons and with optimized transfer of information; we hope to have a first version ready for the next development release in mid-February.
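The cost of the second point can be sketched in plain C++. This is an illustration under assumptions, not ROOT code: a histogram is modelled as a `std::vector<double>` of bin contents, and the hypothetical helpers `mergeHistograms` and `bytesTransferred` mimic what a master does (merge the workers' outputs, then forward one result to the client).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a histogram in the output list: one value per
// bin. This mimics, but is not, ROOT's histogram merging.
using Histogram = std::vector<double>;

// Master-side merge: add the workers' histograms bin by bin so that a single
// object is sent on to the client. All inputs must share the same binning.
Histogram mergeHistograms(const std::vector<Histogram>& perWorker) {
    assert(!perWorker.empty());
    Histogram merged(perWorker.front().size(), 0.0);
    for (const Histogram& h : perWorker) {
        assert(h.size() == merged.size());
        for (std::size_t bin = 0; bin < h.size(); ++bin) {
            merged[bin] += h[bin];
        }
    }
    return merged;
}

// Bytes moved over the two extra hops (worker->master, then master->client);
// a local chain needs neither.
std::size_t bytesTransferred(const std::vector<std::size_t>& workerOutputBytes,
                             std::size_t mergedOutputBytes) {
    std::size_t total = mergedOutputBytes;  // master -> client
    for (std::size_t bytes : workerOutputBytes) {
        total += bytes;                     // worker -> master
    }
    return total;
}
```

With two workers each returning a 100-bin histogram of doubles (800 bytes), `bytesTransferred({800, 800}, 800)` is 2400 bytes no matter how many events were processed. Output that grows with the number of events is instead paid for on both hops, which is why keeping fOutput down to the histograms helps.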

Of course, as the analysis becomes more CPU-bound, these overheads matter less and less, and eventually you should start to see the benefit of the second CPU.
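The crossover can be written down under a simple toy model (an assumption, not a description of PROOF internals): $N$ events, a read time $r$ per event that does not parallelize on a single disk, a CPU time $c$ per event, $W$ workers and a fixed PROOF overhead $O$.

```latex
\begin{aligned}
T_{\text{local}} &= N\,(r + c), \\
T_{\text{PROOF}}(W) &= N\,r + \frac{N\,c}{W} + O, \\
T_{\text{PROOF}}(W) < T_{\text{local}}
  &\iff N\,c\left(1 - \frac{1}{W}\right) > O
  \iff c > \frac{O}{N\,(1 - 1/W)} .
\end{aligned}
```

For $W = 2$ this gives $c > 2O/N$: the CPU time per event must exceed twice the overhead per event. Reducing $O$ (for example, by shrinking the output that has to be transferred) lowers the threshold.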

Hope this helps.

G. Ganis

joa · February 1, 2007, 9:42am

Hi,

thanks for the answer. I made a small change to my test script and added some pointless loops and some random-number generation. This increases the “computation” load while the I/O stays the same, and now PROOF behaves pretty much as I expected, i.e., it is faster with two CPUs than with one. In short, I feel I understand how to get PROOF running, and I can now try it in more realistic situations.
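One way to raise the CPU cost per event while leaving the I/O unchanged, in the spirit of the change described above, is sketched here. The helper `burnCpu` and its `iterations` parameter are hypothetical, `std::mt19937` stands in for ROOT's random generators, and the actual modified script was not posted.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical helper: adds CPU work to an event without touching the I/O
// by drawing `iterations` Gaussian numbers and doing arithmetic on them.
double burnCpu(std::mt19937& rng, int iterations) {
    assert(iterations >= 0);
    std::normal_distribution<double> gauss(0.0, 1.0);
    double sum = 0.0;
    for (int i = 0; i < iterations; ++i) {
        sum += std::sqrt(std::fabs(gauss(rng)));
    }
    // The caller should use this value (e.g. fill a histogram with it)
    // so the compiler cannot drop the loop.
    return sum;
}
```

Inside a selector's Process() one would call it once per event and use the returned value; increasing `iterations` moves the job from I/O-bound towards CPU-bound.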

I have not had the time to test the CVS version yet.

cheers

Joa