PROOF user guide

luigipertoldi · December 16, 2016, 4:56pm

Hi everyone,

I’m starting to approach the PROOF world by looking for a good guide. I see there’s the Drupal book, and I guess that’s the only available format, so I have two questions:

Is there a printable version of the guide?
Is it just me, or are all the links to the images dead? (e.g. https://root.cern.ch/event-level-parallelism)

Thanks

luigipertoldi · December 17, 2016, 11:12am

Also a lot of links are dead…

rperez · December 20, 2016, 11:38am

Good luck with it… the documentation and examples provided for PROOF are really scarce and of poor quality. I have been struggling to make it work for a while now, and I have only managed to make it work when the number of outputs is small.

My recommendation is not to waste time learning PROOF unless the amount of data you need to process is really huge, in which case you could potentially save lots of time. Otherwise you are just going to lose more time trying to make it work than you will actually save. It is really not a user-friendly feature, and I think a better solution is to run the same program in different “terminals” and then manually merge the outputs.

It took about an hour to analyse my data without using PROOF. I have 8 cores, so you would think that it should take around 8 minutes when using it. The truth is that it takes around 20 minutes when it does work, plus you cannot use the computer during that time, because it gets saturated when the number of histograms is large.
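
Put as arithmetic, those numbers look roughly like this. This is a minimal Python sketch: taking "about an hour" as exactly 60 minutes is an assumption, and the function names are only illustrative.

```python
def speedup(serial_minutes: float, parallel_minutes: float) -> float:
    """How many times faster the parallel run is than the serial one."""
    return serial_minutes / parallel_minutes


def efficiency(serial_minutes: float, parallel_minutes: float, n_cores: int) -> float:
    """Fraction of the ideal (linear) speed-up that was actually achieved."""
    return speedup(serial_minutes, parallel_minutes) / n_cores


# Assumption: "about an hour" is taken as 60 minutes; 20 minutes and 8 cores
# are the figures quoted in the post.
serial, parallel, cores = 60.0, 20.0, 8

print(f"ideal time: {serial / cores:.1f} min")                             # 7.5 min
print(f"observed speed-up: {speedup(serial, parallel):.1f}x")              # 3.0x
print(f"parallel efficiency: {efficiency(serial, parallel, cores):.1%}")   # 37.5%
```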

Cheers,

Ricardo

ganis · December 20, 2016, 4:57pm

Dear Luigi Pertoldi,

I am sorry for the broken links; I believe I have fixed them now.
Unfortunately there is no PDF of the PROOF Drupal book; I have asked for the installation of the Drupal module to produce it, but most likely nothing will happen before the new year.

This said, and also in relation to the other comment in this thread, we are aware of the limitations of the PROOF interface, which is why we started developing a new interface for multiprocessing on multicore machines that allows much more flexibility in defining the tasks and the functions to be applied. Proper documentation is scheduled for next year; however, you can have a look at the existing tutorials at root.cern.ch/doc/master/group__ … icore.html, in particular the ones starting with mp_.

Coming back to PROOF-Lite, you should be aware that when processing a reasonable amount of data (larger than the RAM), the bottleneck is never the CPU but the disk I/O. Standard HDDs can efficiently serve 2-4 workers, SSDs a few more, but to get a speed-up of 8 with 8 cores you need a very fast storage system, or to put the data in memory.
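
As a rough illustration of that point, here is a deliberately simplified Python model in which the speed-up is capped by whichever is smaller: the number of workers, or the number of workers the storage can keep fed. The function name and the specific figures (3 for an HDD, 8 for fast storage) are assumptions chosen for illustration, not measurements.

```python
def estimated_speedup(n_workers: int, workers_storage_can_feed: float) -> float:
    """Idealised speed-up: limited by the CPU count or by the storage, whichever is lower."""
    if n_workers < 1:
        raise ValueError("need at least one worker")
    return min(float(n_workers), workers_storage_can_feed)


# Hypothetical figures: an HDD that keeps about 3 workers busy, and a storage
# system (or in-memory data) fast enough to feed all 8 workers.
for label, feed in [("HDD", 3.0), ("fast storage / in memory", 8.0)]:
    print(f"{label}: ~{estimated_speedup(8, feed):.0f}x with 8 workers")
```

In this toy model, adding workers beyond what the disk can serve does not change the result, which is the behaviour described above.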

The other thing to remember is that the merging phase may be the bottleneck if the output is large. There is no magic in this. PROOF has some ways to handle large outputs via files (see root.cern.ch/handling-outputs), which ends up being similar to running separate jobs on the machine and merging manually.
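
The "separate jobs plus manual merge" pattern can be sketched in a few lines of Python. This is not PROOF or ROOT code: the histogram is a plain `Counter`, the partial outputs are JSON files rather than ROOT files, and all the names (`run_job`, `merge_outputs`, `process_in_chunks`) are hypothetical. The jobs run one after another here for simplicity; in practice each would be a separate process.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def run_job(values, out_path, bin_width=10):
    """One independent job: histogram its share of the data and save it to a file."""
    hist = Counter((v // bin_width) * bin_width for v in values)
    # JSON object keys must be strings, so convert the bin edges.
    out_path.write_text(json.dumps({str(k): n for k, n in hist.items()}))


def merge_outputs(paths):
    """Merging step: read every partial file and sum the histograms bin by bin."""
    total = Counter()
    for path in paths:
        partial = json.loads(path.read_text())
        total.update({int(k): n for k, n in partial.items()})
    return total


def process_in_chunks(data, n_jobs, workdir):
    """Split the data, run one job per chunk, then merge the per-job files."""
    chunks = [data[i::n_jobs] for i in range(n_jobs)]
    paths = []
    for i, chunk in enumerate(chunks):
        path = Path(workdir) / f"partial_{i}.json"
        run_job(chunk, path)
        paths.append(path)
    return merge_outputs(paths)


with tempfile.TemporaryDirectory() as tmp:
    merged = process_in_chunks(list(range(100)), n_jobs=4, workdir=tmp)
    print(sorted(merged.items()))
```

The cost of the merge grows with the size and number of the partial outputs, which is why it can dominate when there are many histograms.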

Hope it helps,
G Ganis

luigipertoldi · December 20, 2016, 6:09pm

Thanks so much!