opened 05:38PM - 20 Feb 17 UTC
enhancement
Hi ROOT team,
I am opening this issue to document and discuss our plan to add CUDA support to Cling. Maybe someone can open and link an issue on https://root.cern/bugs; I have neither a CERN nor an external account, and the tracker is not public for people without registration :)
The discussion started after a post of mine on the cling-dev mailing list, where @Axel-Naumann picked it up and we spun off a longer private discussion.
> On 07.11.2016 10:34, Huebl, Axel wrote:
> [...] I am trying to compile a simple CUDA program, which is supported by clang
> already:
>
> [...] internal include issues when starting cling via
> ./cling -x cuda --cuda-path=$CUDA_ROOT --cuda-gpu-arch=sm_35
> -L$CUDA_ROOT/lib -lstdc++ -lcudart [...]
@Axel-Naumann has since fixed all issues during startup, and we think we are now at the point where one could work on accessing clang's PTX emitter to generate PTX code and then pass it to the driver API.
```
$ cling -x cuda --cuda-gpu-arch=sm_35 -nocudainc
atexit not in Module!
at_quick_exit not in Module!
****************** CLING ******************
* Type C++ code and press enter to run it *
* Type .q to exit *
*******************************************
[cling]$
```
Clang currently translates CUDA code by embedding the PTX code in a fat binary and passing it to the CUDA driver at runtime, which compiles it to SASS code (shader assembly) for execution. This is similar to what `nvcc` provides, except that `nvcc` can additionally generate and link SASS code for a specific compute architecture ahead of time, but that's not important here ([see this page for further details](http://llvm.org/docs/CompileCudaWithLLVM.html)).
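To make the runtime side of that pipeline concrete, here is a minimal sketch of what handing PTX to the driver API looks like. This is an illustration of the general driver-API flow, not cling's actual code; the kernel name `vec_add` and the empty `ptx_source` string are placeholders (a real PTX string would come from clang's NVPTX backend, e.g. via `clang -S --cuda-device-only`).

```cpp
#include <cuda.h>   // CUDA driver API (link with -lcuda)

int main() {
    // Placeholder: in a PTX-emitting cling this string would hold the
    // PTX text produced by clang's NVPTX backend for the user's kernel.
    const char* ptx_source = "";

    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    // The driver JIT-compiles the PTX to SASS for the present GPU.
    CUmodule mod;
    cuModuleLoadData(&mod, ptx_source);

    // Look up a kernel by name ("vec_add" is hypothetical) and launch it.
    CUfunction kernel;
    cuModuleGetFunction(&kernel, mod, "vec_add");

    void* args[] = { /* kernel parameters would go here */ };
    cuLaunchKernel(kernel,
                   1, 1, 1,    // grid dimensions
                   32, 1, 1,   // block dimensions
                   0,          // dynamic shared memory
                   nullptr,    // default stream
                   args, nullptr);
    cuCtxSynchronize();

    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}
```

An incremental interpreter would essentially repeat the `cuModuleLoadData`/`cuModuleGetFunction` steps for each newly emitted chunk of PTX, which is why reaching clang's PTX emitter from cling seems like the natural first step.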
From our discussions I understood that cling already has similar functionality in place, e.g. for [PowerPC](https://github.com/root-mirror/cling/blob/097a4e3fbd38b211709a8f5a2c5fb0bc5fe715e3/lib/Interpreter/IncrementalExecutor.cpp#L68), to target specific backends and execute their assembly artifacts. Can you guide us on how one could add the same functionality for PTX?
At [GCoE Dresden](http://www.gcoe-dresden.de), a collaboration of research groups in and around Dresden (Technical University, Max-Planck, Helmholtz-Zentrum Dresden-Rossendorf), we are currently discussing what a CUDA-capable interpreter would make possible: interactive simulations, runtime profiling and tuning, teaching, rapid prototyping, and much more. Long story short: exciting possibilities!
From what I know about the routines in ROOT, there is no widespread manycore or GPU acceleration available so far. Adding CUDA support to cling would provide native CUDA support in your framework, which could well be of interest on your side, too. Maybe you also want to build on that and add general manycore support [in a more performance-portable and abstract way](https://github.com/ComputationalRadiationPhysics/alpaka), a topic on which we have experience as well.
We currently have one interested student who could work on the topic, and any support and docs would be greatly appreciated. Two other groups from TU Dresden and Max-Planck also seem interested, and we might be able to contribute further resources (although that is not up to me). Through our GCoE we also have a fruitful collaboration with Nvidia, which might be necessary, too.
CCing @harrism: you might be [interested](https://twitter.com/harrism/status/795400414309023744) in this thread.