I work with users at the Purdue CMS Analysis Facility (Purdue AF) who would like to accelerate ROOT with GPUs (specifically, RooFit’s EvalBackend functionality). RooFit operations take dozens of hours in some cases, and users are excited about the possibility of speeding them up with the available GPU resources.
At Purdue AF, we manage software stacks via Conda, so an ideal solution would keep it that way and somehow enable CUDA for the ROOT installation in a given Conda environment.
What is the best way to achieve this? Could you please provide specific instructions for installing ROOT in such a way, if it is possible?
OS: AlmaLinux 8
ROOT version: the latest will do (6.30.06)
CUDA version: 12.2
Thanks for reaching out. This sounds like a great use case. We would also be very interested in learning more about the great work you do at Purdue, so that we understand how best to support you. That is perhaps for a private message at a later stage; could we reach out to you then?
To come back to your question, let me ask two things to better understand the context. I see in the JL session that the CMS CVMFS mount point is available.
Have you checked with the curator of the content of that repository whether CMS distributes a CUDA enabled root as part of the CMSSW releases?
The SFT group distributes software stacks, e.g. for individual analysers, ATLAS, LHCb and SWAN. Among the release types, CUDA-flavoured stacks are available (e.g. /cvmfs/sft.cern.ch/lcg/views/LCG_105_cuda/). These look very much like CMSSW externals, i.e. a few hundred packages coherently compiled and distributed. Have you considered mounting sft.cern.ch on your AF?
In general, we try to avoid using CVMFS distributions, as the setup scripts there tend to break existing environments by overwriting environment variables such as LD_LIBRARY_PATH. For this reason, most of our researchers are already used to not mixing CMSSW with Pythonic analysis workflows.
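To make the concern about setup scripts concrete, here is a minimal sketch that reports which environment variables a setup script changes, by sourcing it in a throwaway bash subprocess so the current environment stays untouched. The helper names `env_after_sourcing` and `diff_env` are hypothetical, and the sketch assumes bash and GNU `env -0` are available.

```python
import os
import subprocess


def env_after_sourcing(script_path, base_env=None):
    """Return the environment of a bash subprocess after sourcing script_path."""
    base_env = dict(os.environ if base_env is None else base_env)
    # `env -0` prints NUL-separated KEY=VALUE pairs, so multi-line values are safe.
    completed = subprocess.run(
        ["bash", "-c", 'source "$1" >/dev/null 2>&1; env -0', "bash", script_path],
        env=base_env,
        check=True,
        stdout=subprocess.PIPE,
    )
    env = {}
    for entry in completed.stdout.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode()] = value.decode(errors="replace")
    return env


def diff_env(before, after):
    """Map each variable that differs to a (before, after) tuple; None means unset."""
    changed = {}
    for key in sorted(set(before) | set(after)):
        if before.get(key) != after.get(key):
            changed[key] = (before.get(key), after.get(key))
    return changed
```

Sourcing `/dev/null` gives a baseline that already contains the variables bash itself sets, so something like `diff_env(env_after_sourcing("/dev/null"), env_after_sourcing(path_to_setup_script))` lists only what the script touched, for example `LD_LIBRARY_PATH` or `PYTHONPATH`.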
That said, I did try the SFT builds. The only AlmaLinux 8 build of the latest ROOT that I found is this: /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.30.06/x86_64-almalinux8.9-gcc85-opt/bin/thisroot.sh
However, I cannot use PyROOT with it (I’m not sure whether this is a problem with the build or whether it interferes with the ROOT that is already installed in our image):
[dkondra@purdue-af-1]$ source /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.30.06/x86_64-almalinux8.9-gcc85-opt/bin/thisroot.sh
[dkondra@purdue-af-1]$ python3
Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ROOT as rt
cling (LLVM option parsing): for the --optimize-regalloc option: may only occur zero or one times!
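One quick way to see whether two ROOT installations are being mixed in such a session is to check where Python would load `ROOT` from, and which ROOT-looking directories sit on the search paths, without actually importing the module. The sketch below uses only the standard library; the helper names are hypothetical, and matching on the substring "root" is a crude assumption that may also catch unrelated paths.

```python
import importlib.util
import os


def describe_module_origin(name="ROOT"):
    """Return the file that `import name` would load, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        return None
    # Regular packages report their __init__.py; namespace packages have no origin.
    return spec.origin or ", ".join(spec.submodule_search_locations or [])


def root_related_paths(var):
    """List entries of a path-like environment variable that mention 'root'."""
    value = os.environ.get(var, "")
    return [p for p in value.split(os.pathsep) if p and "root" in p.lower()]
```

Calling `describe_module_origin("ROOT")` together with `root_related_paths("LD_LIBRARY_PATH")` and `root_related_paths("PYTHONPATH")` after sourcing thisroot.sh shows whether the Python module and the shared libraries come from the same installation.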
What we are trying to achieve at Purdue AF is being able to use ROOT with CUDA within a given Conda environment (Jupyter kernel), such that the environment variables and dependencies in that kernel are respected. This has already been working well with ROOT installed from conda-forge, but there is no CUDA support there.
Please feel free to message me privately for further discussion on Mattermost or email.
Hi @kondratyevd! Very interesting project. I was thinking a little bit with @Danilo about how one could use RooFit CUDA without rebuilding and repackaging ROOT.
Actually, building ROOT with cuda=ON should change nothing in the existing libraries. Only one additional library gets built, RooBatchCompute_CUDA.
I said nothing should be different instead of nothing is different because I realized I made a mistake in the RooFit likelihood evaluation code: in a premature optimization attempt, I only build some pure C++ code that is relevant for the CUDA evaluation if cuda=ON, even though it wouldn’t do any harm to always build it.
If I make sure that nothing needs to be changed except for the added shared library RooBatchCompute_CUDA.so, and I give you a simple recipe to build this library standalone, would that make things simpler for you? I would then work on this for the 6.32.02 patch release.
If it is just one library which I can build and link to specific ROOT installations in Conda environments, I think it should be enough. At the very least, I will be able to check it quickly and give feedback.
Looking forward to your recipe! Also, it would be great if you could mention which ROOT versions it should be compatible with. Thanks!
I have now implemented what I described, and the PR will be merged soon:
This change will be included in the next patch release 6.32.02, which will come out in 2 weeks. I think once the release is out, you can try out the instructions in the PR description!
ROOT 6.32.02 is now available on the upstream conda channels. With the recipe provided by @jonas you should be able to build the RooFit batch compute library for GPU fitting against the conda installation of ROOT. If you try this out, let me know how it goes and we can follow up.
Now that ROOT 6.32.02 is installable with conda, I am trying to build the RooFit BatchCompute + CUDA library on top of the conda installation.
Here is roughly what I am doing:
clone roofit/batchcompute from the ROOT repo
mkdir build; cd build
cmake -DCMAKE_MODULE_PATH=<path to cmake modules> -Dcuda=ON .. – this will use roofit/batchcompute/CMakeLists.txt, which I slightly modified to fix some immediate errors, namely added these lines:
set(CMAKE_CUDA_ARCHITECTURES 60 70 75 80 86)
project(RooBatchCompute LANGUAGES CXX CUDA)
include(<path to cmake modules>/RootMacros.cmake)
make
However, when I run make, I keep running into errors:
First, I get fatal error: ROOT/RSpan.hxx: No such file or directory. To fix it, I locate this file and add the following to CMakeLists.txt:
target_include_directories(RooBatchCompute PUBLIC <path>/core/foundation/inc)
This resolves the error but leads to the next one: fatal error: RConfigure.h: No such file or directory
It seems like trying to build only roofit/batchcompute will force me to cherry-pick errors, so I must be doing something wrong.
Could you please help me with some instructions? Thanks!
Build the content of roofit/batchcompute in the ROOT repository, replacing the CMakeLists.txt file with the code listing below:
Build the project. You should now have a libRooBatchCompute_CUDA.so file. Make sure the directory that contains it is in the LD_LIBRARY_PATH.
Your fits should now work with EvalBackend("cuda") (see also the RooAbsPdf documentation about this)
cmake_minimum_required(VERSION 3.14)
# Adapt to your system
set (CMAKE_CUDA_ARCHITECTURES "native" CACHE STRING "" FORCE)
set (CMAKE_CUDA_HOST_COMPILER /usr/bin/g++-13 CACHE STRING "" FORCE)
set (CMAKE_CUDA_COMPILER "/opt/cuda/bin/nvcc" CACHE STRING "" FORCE)
find_package(ROOT REQUIRED)
include(${ROOT_USE_FILE})
project(batchcompute-cuda LANGUAGES CUDA)
# in the src directory, put all files from roofit/batchcompute/src and roofit/batchcompute/res
add_library(RooBatchCompute_CUDA SHARED src/RooBatchCompute.cu src/ComputeFunctions.cu src/CudaInterface.cu)
target_include_directories(RooBatchCompute_CUDA PRIVATE src res)
target_compile_options(RooBatchCompute_CUDA PRIVATE -lineinfo --expt-relaxed-constexpr)
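After building, the last two steps of the recipe can be checked from Python roughly as follows. This is only a sketch: `find_cuda_backend_library` and `fit_gaussian` are hypothetical helper names, the toy Gaussian model is an assumption made for illustration, and the fit function needs a ROOT installation with RooFit plus a working GPU to actually run.

```python
import os

LIB_NAME = "libRooBatchCompute_CUDA.so"


def find_cuda_backend_library(search_path=None):
    """Return the first copy of the CUDA backend library found on a search path.

    By default the LD_LIBRARY_PATH of the current process is inspected. The
    dynamic loader reads that variable when the process starts, so in a Jupyter
    kernel it has to be set before the kernel is launched.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, LIB_NAME)
        if directory and os.path.isfile(candidate):
            return candidate
    return None


def fit_gaussian(backend="cuda", n_events=10000):
    """Fit a toy Gaussian with the requested RooFit evaluation backend."""
    import ROOT  # imported lazily so the path check above works without ROOT

    x = ROOT.RooRealVar("x", "x", -10.0, 10.0)
    mean = ROOT.RooRealVar("mean", "mean", 1.0, -5.0, 5.0)
    sigma = ROOT.RooRealVar("sigma", "sigma", 2.0, 0.1, 10.0)
    model = ROOT.RooGaussian("model", "model", x, mean, sigma)
    data = model.generate(ROOT.RooArgSet(x), n_events)
    result = model.fitTo(
        data,
        ROOT.RooFit.EvalBackend(backend),
        ROOT.RooFit.Save(),
        ROOT.RooFit.PrintLevel(-1),
    )
    return mean.getVal(), sigma.getVal(), result.status()
```

If `find_cuda_backend_library()` returns None, the library directory is most likely missing from LD_LIBRARY_PATH. Running `fit_gaussian("cpu")` next to `fit_gaussian("cuda")` gives a simple cross-check that both backends converge to similar parameter values.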
Thank you for the instructions.
I was able to run CMake with the provided CMakeLists.txt, with minor changes to the compiler paths:
set (CMAKE_CUDA_ARCHITECTURES "native" CACHE STRING "" FORCE)
set (CMAKE_CUDA_HOST_COMPILER /usr/bin/g++ CACHE STRING "" FORCE)
set (CMAKE_CUDA_COMPILER "/usr/local/cuda-12.2/bin/nvcc" CACHE STRING "" FORCE)
However, I am still getting the following error during building:
In file included from /depot/cms/kernels/root632/include/ROOT/RConfig.hxx:23,
from /depot/cms/kernels/root632/include/RtypesCore.h:23,
from /depot/cms/kernels/root632/include/TError.h:34,
from /depot/cms/purdue-af/roofit-batchcompute/res/RooNaNPacker.h:18,
from /depot/cms/purdue-af/roofit-batchcompute/src/ComputeFunctions.cxx:27,
from /depot/cms/purdue-af/roofit-batchcompute/src/ComputeFunctions.cu:2:
/depot/cms/kernels/root632/include/RConfigure.h:30:4: warning: #warning "The C++ standard in this build does not match ROOT configuration (202002L); this might cause unexpected issues" [-Wcpp]
# warning "The C++ standard in this build does not match ROOT configuration (202002L); this might cause unexpected issues"
^~~~~~~
In file included from /depot/cms/kernels/root632/include/RtypesCore.h:23,
from /depot/cms/kernels/root632/include/TError.h:34,
from /depot/cms/purdue-af/roofit-batchcompute/res/RooNaNPacker.h:18,
from /depot/cms/purdue-af/roofit-batchcompute/src/ComputeFunctions.cxx:27,
from /depot/cms/purdue-af/roofit-batchcompute/src/ComputeFunctions.cu:2:
/depot/cms/kernels/root632/include/ROOT/RConfig.hxx:48:2: error: #error "ROOT requires support for C++17 or higher."
#error "ROOT requires support for C++17 or higher."
^~~~~
/depot/cms/kernels/root632/include/ROOT/RConfig.hxx:50:2: error: #error "Pass `-std=c++17` as compiler argument."
#error "Pass `-std=c++17` as compiler argument."
^~~~~
make[2]: *** [CMakeFiles/RooBatchCompute_CUDA.dir/build.make:92: CMakeFiles/RooBatchCompute_CUDA.dir/src/ComputeFunctions.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/RooBatchCompute_CUDA.dir/all] Error 2
make: *** [Makefile:91: all] Error 2
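For reading this log: the number in the warning, 202002L, is the value of the `__cplusplus` macro that the ROOT installation was configured with, and it identifies the C++ standard. The sketch below decodes such values; the helper names are hypothetical. Since the .cu files are compiled as CUDA sources, the CMake variable that sets their language standard is CMAKE_CUDA_STANDARD. Whether a given nvcc and host compiler pair accepts the resulting standard is an assumption that has to be checked on the system.

```python
# Values of the __cplusplus macro for the published C++ standards.
CPLUSPLUS_TO_STANDARD = {
    199711: 98,
    201103: 11,
    201402: 14,
    201703: 17,
    202002: 20,
}


def standard_from_macro(value):
    """Translate a __cplusplus value such as '202002L' into a standard number."""
    digits = str(value).rstrip("Ll")
    return CPLUSPLUS_TO_STANDARD.get(int(digits))


def cuda_standard_line(value):
    """Return a CMake line requesting the matching standard for CUDA sources."""
    standard = standard_from_macro(value)
    if standard is None:
        return None
    return f"set(CMAKE_CUDA_STANDARD {standard})"
```

For the log above, `standard_from_macro("202002L")` gives 20, meaning this ROOT build was configured with C++20, while the error message shows that the compilation ran with a standard older than C++17.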
I tried some workarounds like setting CMAKE_CXX_STANDARD 17, but it didn’t help. I have gcc version 12.3.0, and the cmake command is as follows: