Installing ROOT with CUDA support in a Conda environment

Hi,

I work with users at Purdue CMS Analysis Facility (Purdue AF), who would like to be able to accelerate ROOT with GPUs (specifically, RooFit’s EvalBackend functionality). RooFit operations take dozens of hours in some cases, and users are excited about the possibility of accelerating that using available GPU resources.

At Purdue AF, we manage software stacks via Conda, so an ideal solution would be to keep it that way, and somehow enable CUDA for ROOT installation in a given Conda environment.

What is the best way to achieve this? Could you please provide specific instructions for installing ROOT in such a way, if it is possible?

OS: AlmaLinux 8
ROOT version: the latest will do (6.30.06)
CUDA version: 12.2

Thank you!

  • Dmitry

Hi Dmitry,

Thanks for reaching out. This sounds like a great use case. We would also be very interested in learning more about the great work you do at Purdue, so that we understand how to best support you: perhaps we can reach out to you with a private message at a later stage?

To come back to your question, let me ask two things to better understand the context. I see that the CMS CVMFS mount point is available in the JupyterLab session:

  • Have you checked with the curator of the content of that repository whether CMS distributes a CUDA-enabled ROOT as part of the CMSSW releases?
  • The SFT group distributes software stacks, e.g. for individual analysers, ATLAS, LHCb and SWAN. Among the type of releases, CUDA flavoured stacks are made available (e.g. /cvmfs/sft.cern.ch/lcg/views/LCG_105_cuda/). These look very much like CMSSW externals, i.e. a few hundred packages coherently compiled and distributed. Have you considered mounting sft.cern.ch on your AF?

Cheers and thanks again for the interesting post.

Danilo


Hi Danilo,

Thank you very much for the quick reply!

In general, we try to avoid using CVMFS distributions, as the setup scripts there tend to break existing environments by overwriting environment variables such as LD_LIBRARY_PATH. For this reason, most of our researchers are already used to not mixing CMSSW with Pythonic analysis workflows.

That said, I did try to use the SFT builds; the only Alma8 build of the latest ROOT that I found is this:
/cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.30.06/x86_64-almalinux8.9-gcc85-opt/bin/thisroot.sh
However, I cannot use PyROOT with it (I am not sure whether it is a problem with the build, or whether it interferes with the ROOT that is already installed in our image):

[dkondra@purdue-af-1]$ source /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.30.06/x86_64-almalinux8.9-gcc85-opt/bin/thisroot.sh
[dkondra@purdue-af-1]$ python3
Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ROOT as rt
cling (LLVM option parsing): for the --optimize-regalloc option: may only occur zero or one times!

What we are trying to achieve at Purdue AF is being able to use ROOT with CUDA within a given Conda environment (Jupyter kernel), such that the environment variables and dependencies in that kernel are respected. This has already been working well with ROOT installed from conda-forge, but there is no CUDA support there.

Please feel free to message me privately for further discussion on Mattermost or email.

Cheers,
Dmitry

Hi Dmitry,

Let’s move the discussion elsewhere (I am contacting you now), and then we may decide to post the solution we find here.

Cheers,
D

Hi @kondratyevd! Very interesting project. I was thinking a little bit with @Danilo about how one could use RooFit CUDA without rebuilding and repackaging ROOT.

Actually, building ROOT with cuda=ON should not change anything in the existing libraries. There is only one additional library that gets built, RooBatchCompute_CUDA:

I said nothing should be different instead of nothing is different because I realized I made a mistake in the RooFit likelihood evaluation code: in a premature optimization attempt, I only build some pure C++ code that is relevant for the CUDA evaluation if cuda=ON, even though it wouldn’t hurt at all to build it always.

If I make sure that, except for the added shared library RooBatchCompute_CUDA.so, nothing needs to be changed, and give you a simple recipe to build this library standalone, would that make things simpler for you? I would then work on this for the 6.32.02 patch release.

Cheers,
Jonas

Hi @Jonas,

Thank you very much for looking into this!

If it is just one library which I can build and link to specific ROOT installations in Conda environments, I think it should be enough. At the very least, I will be able to check it quickly and give feedback.

Looking forward to your recipe! Also, it would be great if you could mention which ROOT versions it should be compatible with. Thanks!

Cheers,
Dmitry

Hi @kondratyevd!

I have implemented now what I described, and the PR will be merged soon:

This change will be included in the next patch release 6.32.02, which will come out in 2 weeks. I think once the release is out, you can try out the instructions in the PR description!


Dear @kondratyevd ,

ROOT 6.32.02 is now available on the upstream conda channels. With the recipe provided by @jonas you should be able to build the RooFit batch compute library for GPU fitting against the conda installation of ROOT. If you try this out, let me know how it goes and we can follow up.

Cheers,
Vincenzo

Thanks @vpadulan !

Now that ROOT 6.32.02 is installable with conda, I am trying to build RooFit BatchCompute + CUDA library on top of the conda installation.

Here is roughly what I am doing:

  • clone roofit/batchcompute from the ROOT repo
  • mkdir build; cd build
  • cmake -DCMAKE_MODULE_PATH=<path to cmake modules> -Dcuda=ON .. – this uses roofit/batchcompute/CMakeLists.txt, which I slightly modified to fix some immediate errors, namely by adding these lines:
set(CMAKE_CUDA_ARCHITECTURES 60 70 75 80 86)
project(RooBatchCompute LANGUAGES CXX CUDA)
include(<path to cmake modules>/RootMacros.cmake)
  • make

However, when I run make, I keep running into errors:

  • First, I get fatal error: ROOT/RSpan.hxx: No such file or directory. To fix it, I locate this file and add the following to CMakeLists.txt:
target_include_directories(RooBatchCompute PUBLIC <path>/core/foundation/inc)
  • This resolves the error but leads to the next one: fatal error: RConfigure.h: No such file or directory

It seems like trying to build only roofit/batchcompute forces me to chase down errors one by one, so I must be doing something wrong.

Could you please help me with some instructions? Thanks!

Hi @jonas @vpadulan @Danilo – bumping this up, do you have any updates on the instructions? Thanks!

Hello! Sorry I only see now that the instructions were not clear.

Have you tried replacing the roofit/batchcompute/CMakeLists.txt with exactly the content I listed in the PR?

This should not have the errors you cite, thanks to the find_package(ROOT REQUIRED).

I’ll also test if it still works. I hope it works for you too!

Cheers,
Jonas

I have updated the instructions now.

To try it out:

  • Take the content of roofit/batchcompute in the ROOT repository, replacing the CMakeLists.txt file with the code listing below.
  • Build the project. You should now have a libRooBatchCompute_CUDA.so file. Make sure it is on the LD_LIBRARY_PATH
  • Your fits should now work with EvalBackend("cuda") (see also the RooAbsPdf documentation about this)
cmake_minimum_required(VERSION 3.14)

# Adapt to your system
set(CMAKE_CUDA_ARCHITECTURES "native" CACHE STRING "" FORCE)
set(CMAKE_CUDA_HOST_COMPILER /usr/bin/g++-13 CACHE STRING "" FORCE)
set(CMAKE_CUDA_COMPILER "/opt/cuda/bin/nvcc" CACHE STRING "" FORCE)

project(batchcompute-cuda LANGUAGES CUDA)

find_package(ROOT REQUIRED)
include(${ROOT_USE_FILE})

# in the src directory, put all files from roofit/batchcompute/src and roofit/batchcompute/res
add_library(RooBatchCompute_CUDA SHARED src/RooBatchCompute.cu src/ComputeFunctions.cu src/CudaInterface.cu)
target_include_directories(RooBatchCompute_CUDA PRIVATE src res)

target_compile_options(RooBatchCompute_CUDA PRIVATE -lineinfo --expt-relaxed-constexpr)

Hi @Jonas,

Thank you for the instructions.
I was able to run cmake with the provided CMakeLists.txt, with minor changes to the compiler paths:

set (CMAKE_CUDA_ARCHITECTURES "native" CACHE STRING "" FORCE)
set (CMAKE_CUDA_HOST_COMPILER /usr/bin/g++ CACHE STRING "" FORCE)
set (CMAKE_CUDA_COMPILER "/usr/local/cuda-12.2/bin/nvcc" CACHE STRING "" FORCE)

However, I am still getting the following error during building:

In file included from /depot/cms/kernels/root632/include/ROOT/RConfig.hxx:23,
                 from /depot/cms/kernels/root632/include/RtypesCore.h:23,
                 from /depot/cms/kernels/root632/include/TError.h:34,
                 from /depot/cms/purdue-af/roofit-batchcompute/res/RooNaNPacker.h:18,
                 from /depot/cms/purdue-af/roofit-batchcompute/src/ComputeFunctions.cxx:27,
                 from /depot/cms/purdue-af/roofit-batchcompute/src/ComputeFunctions.cu:2:
/depot/cms/kernels/root632/include/RConfigure.h:30:4: warning: #warning "The C++ standard in this build does not match ROOT configuration (202002L); this might cause unexpected issues" [-Wcpp]
 #  warning "The C++ standard in this build does not match ROOT configuration (202002L); this might cause unexpected issues"
    ^~~~~~~
In file included from /depot/cms/kernels/root632/include/RtypesCore.h:23,
                 from /depot/cms/kernels/root632/include/TError.h:34,
                 from /depot/cms/purdue-af/roofit-batchcompute/res/RooNaNPacker.h:18,
                 from /depot/cms/purdue-af/roofit-batchcompute/src/ComputeFunctions.cxx:27,
                 from /depot/cms/purdue-af/roofit-batchcompute/src/ComputeFunctions.cu:2:
/depot/cms/kernels/root632/include/ROOT/RConfig.hxx:48:2: error: #error "ROOT requires support for C++17 or higher."
 #error "ROOT requires support for C++17 or higher."
  ^~~~~
/depot/cms/kernels/root632/include/ROOT/RConfig.hxx:50:2: error: #error "Pass `-std=c++17` as compiler argument."
 #error "Pass `-std=c++17` as compiler argument."
  ^~~~~
make[2]: *** [CMakeFiles/RooBatchCompute_CUDA.dir/build.make:92: CMakeFiles/RooBatchCompute_CUDA.dir/src/ComputeFunctions.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/RooBatchCompute_CUDA.dir/all] Error 2
make: *** [Makefile:91: all] Error 2

I tried some workarounds like setting CMAKE_CXX_STANDARD to 17, but it didn’t help. I have gcc version 12.3.0, and the cmake command is as follows:

cmake -DCMAKE_MODULE_PATH=${PWD}/cmake/modules -Dcuda=ON ..

Hi, thanks for reporting back!

Maybe also setting CMAKE_CUDA_STANDARD will help?
https://cmake.org/cmake/help/latest/prop_tgt/CUDA_STANDARD.html

Thank you, this helped indeed. For the record, I was able to build the library with the following lines in CMakeLists.txt:

set(CMAKE_CUDA_STANDARD 17)

set (CMAKE_CUDA_ARCHITECTURES "native" CACHE STRING "" FORCE)
set (CMAKE_CUDA_HOST_COMPILER /usr/bin/g++ CACHE STRING "" FORCE)
set (CMAKE_CUDA_COMPILER "/usr/local/cuda-12.2/bin/nvcc" CACHE STRING "" FORCE)

And here is my full build script:

# Sparse-checkout only the parts of the ROOT repository that are needed
git init
git remote add origin https://github.com/root-project/root.git
git config core.sparseCheckout true
echo "/cmake/modules/*" >> .git/info/sparse-checkout
echo "roofit/batchcompute/*" >> .git/info/sparse-checkout
git pull origin master

# Flatten the source layout and drop in the standalone CMakeLists.txt
mv roofit/batchcompute/* .
mv res/* src/
cp /path/to/CMakeLists.txt .

mkdir -p build
cd build
rm -rf *

# Put the CUDA toolkit on PATH and LD_LIBRARY_PATH if it is not there yet
if [[ ":$PATH:" != *":/usr/local/cuda-12.2/bin:"* ]]; then
    export PATH=/usr/local/cuda-12.2/bin:$PATH
fi
if [[ ":$LD_LIBRARY_PATH:" != *":/usr/local/cuda-12.2/lib64:"* ]]; then
    export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LD_LIBRARY_PATH
fi

cmake -DCMAKE_MODULE_PATH=${PWD}/cmake/modules -Dcuda=ON ..

make