VIVE L’AMOUR!
As far as I am aware, the newest Nvidia Fermi / CUDA cards are able to work fully in double precision on something like 500 cores in parallel, and they are not that expensive.
Dashing, no?
I’ve been thinking that maybe that would be interesting as a PROOF “virtual PROOF-Worker cluster” target (the “virtual PROOF_Master” could be the main CPU of the machine that has the Fermi card built in, or maybe even one of the available Fermi cores).
I have been thinking a bit about how to utilise GPUs in ntuple analysis; the fundamental problem is the I/O limitation. For CUDA/OpenCL to pay off, the input data has to be copied into GPU memory first, so unless you do fairly heavy computations on a small data set, it doesn’t make sense. In my work the I/O time vastly outweighs the CPU time. So until someone figures out how to keep a continuous stream of data flowing through the GPU, rather than moving one set of data in and then out all the time, I can’t see it as a HEP analysis tool :-/
But if you can prove me wrong, I’ll be glad to spend some time giving it a shot.
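
For what it’s worth, here is a minimal, hypothetical CUDA sketch of the “continuous stream” idea: the input is cut into chunks, and each chunk’s host-to-device copy, kernel launch and device-to-host copy go into their own stream, so the transfer of one chunk can overlap the computation of another. The chunk sizes and the placeholder kernel are illustrative assumptions only, nothing ROOT- or PROOF-specific.

// Sketch: overlap transfers and compute with CUDA streams (assumptions noted above).
#include <cuda_runtime.h>
#include <cstdio>

// Toy kernel standing in for some per-event computation (hypothetical).
__global__ void process(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];   // placeholder "analysis"
}

int main() {
    const int kChunks = 4;
    const int kChunkSize = 1 << 20;                 // assumed chunk size
    const size_t kTotal = size_t(kChunks) * kChunkSize;

    // Pinned host buffers are required for cudaMemcpyAsync to really
    // overlap with kernel execution.
    float *h_in, *h_out, *d_in, *d_out;
    cudaMallocHost(&h_in,  kTotal * sizeof(float));
    cudaMallocHost(&h_out, kTotal * sizeof(float));
    cudaMalloc(&d_in,  kTotal * sizeof(float));
    cudaMalloc(&d_out, kTotal * sizeof(float));
    for (size_t i = 0; i < kTotal; ++i) h_in[i] = 1.0f;

    cudaStream_t streams[kChunks];
    for (int c = 0; c < kChunks; ++c) cudaStreamCreate(&streams[c]);

    for (int c = 0; c < kChunks; ++c) {
        size_t off = size_t(c) * kChunkSize;
        // Each chunk lives on its own stream: while chunk c is computing,
        // chunk c+1 can already be copied in.
        cudaMemcpyAsync(d_in + off, h_in + off, kChunkSize * sizeof(float),
                        cudaMemcpyHostToDevice, streams[c]);
        process<<<(kChunkSize + 255) / 256, 256, 0, streams[c]>>>(
            d_in + off, d_out + off, kChunkSize);
        cudaMemcpyAsync(h_out + off, d_out + off, kChunkSize * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[c]);
    }
    cudaDeviceSynchronize();
    std::printf("first result: %f\n", h_out[0]);

    for (int c = 0; c < kChunks; ++c) cudaStreamDestroy(streams[c]);
    cudaFreeHost(h_in); cudaFreeHost(h_out);
    cudaFree(d_in); cudaFree(d_out);
    return 0;
}

Of course this only hides the PCIe transfer behind the compute; it does not make disk I/O any faster, so the original objection still stands for analyses that are disk-bound rather than transfer-bound.
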
It should be possible to painlessly gain some performance by using Thrust instead of the STL. I have some intensive data manipulations and histogram fillings where I/O is not the limiting factor, but CPU is. However, this would require modifying a lot of ROOT code anyway, with questionable benefit: the GPU <-> system memory exchange may become the new bottleneck.
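
As a rough illustration of the Thrust idea, a fixed-bin histogram fill can be expressed as sort + vectorised upper_bound + adjacent_difference on the device. The toy input column and bin edges below are made up for the example and have nothing to do with actual ROOT histograms.

// Sketch: device-side histogram with Thrust (toy data, 10 uniform bins).
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>
#include <thrust/binary_search.h>
#include <thrust/adjacent_difference.h>
#include <iostream>

int main() {
    // Toy "ntuple column": values in [0, 10), generated on the host here
    // only so the example is self-contained.
    thrust::host_vector<float> h_values(1 << 20);
    for (size_t i = 0; i < h_values.size(); ++i)
        h_values[i] = static_cast<float>(i % 1000) / 100.0f;
    thrust::device_vector<float> values = h_values;

    // Upper edges of 10 uniform bins: 1, 2, ..., 10.
    thrust::device_vector<float> edges(10);
    thrust::sequence(edges.begin(), edges.end(), 1.0f);

    // Sort once; the cumulative count below each edge is then a vectorised
    // binary search, and the per-bin contents are adjacent differences.
    thrust::sort(values.begin(), values.end());
    thrust::device_vector<int> cumulative(edges.size());
    thrust::upper_bound(values.begin(), values.end(),
                        edges.begin(), edges.end(), cumulative.begin());
    thrust::device_vector<int> counts(edges.size());
    thrust::adjacent_difference(cumulative.begin(), cumulative.end(),
                                counts.begin());

    thrust::host_vector<int> h_counts = counts;
    for (size_t b = 0; b < h_counts.size(); ++b)
        std::cout << "bin " << b << ": " << h_counts[b] << "\n";
    return 0;
}

The sort-based formulation avoids atomics at the cost of extra work, and whether that wins over a plain CPU loop depends entirely on how much data already sits on the device, which brings us back to the memory-transfer question above.
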