Installing ROOT with CUDA support in a Conda environment

Hi,

I work with users at Purdue CMS Analysis Facility (Purdue AF), who would like to be able to accelerate ROOT with GPUs (specifically, RooFit’s EvalBackend functionality). RooFit operations take dozens of hours in some cases, and users are excited about the possibility of accelerating that using available GPU resources.
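For context, here is a minimal sketch of the kind of RooFit call we would like to offload to the GPU (the model and dataset are placeholders I made up for illustration, and EvalBackend="cuda" only works with a CUDA-enabled ROOT build):

```python
import ROOT

# Placeholder model: Gaussian signal over an exponential background
ws = ROOT.RooWorkspace()
ws.factory(
    "SUM::model(nsig[1000,0,10000]*Gaussian::sig(x[0,10],mu[5,0,10],sigma[0.5,0.1,2]),"
    "nbkg[5000,0,20000]*Exponential::bkg(x,tau[-0.3,-2,0]))"
)
data = ws["model"].generate({ws["x"]}, 6000)

# EvalBackend selects the RooFit evaluation backend:
# "cpu" uses the vectorized CPU batch mode, "cuda" offloads to the GPU
result = ws["model"].fitTo(data, EvalBackend="cuda", Save=True, PrintLevel=-1)
result.Print()
```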

At Purdue AF, we manage software stacks via Conda, so the ideal solution would keep that arrangement and somehow enable CUDA for a ROOT installation in a given Conda environment.
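Concretely, our current setup looks something like this (a sketch; the environment name and pinned versions are illustrative):

```shell
# Create an analysis environment with ROOT from conda-forge
# (these builds currently ship without CUDA support)
conda create -n analysis -c conda-forge root python=3.10
conda activate analysis

# PyROOT is then picked up from the environment itself,
# without sourcing any external setup scripts
python -c 'import ROOT; print(ROOT.gROOT.GetVersion())'
```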

What is the best way to achieve this? Could you please provide specific instructions for installing ROOT in such a way, if it is possible?

OS: AlmaLinux 8
ROOT version: the latest will do (6.30.06)
CUDA version: 12.2

Thank you!

  • Dmitry

Hi Dmitry,

Thanks for reaching out. This sounds like a great use case. We would also be very interested in learning more about the great work you do at Purdue, so that we can understand how best to support you; perhaps we can reach out with a private message at a later stage?

To come back to your question, let me ask two things to better understand the context. I see that the CMS CVMFS mount point is available in the JupyterLab session:

  • Have you checked with the curator of that repository whether CMS distributes a CUDA-enabled ROOT as part of the CMSSW releases?
  • The SFT group distributes software stacks, e.g. for individual analysers, ATLAS, LHCb and SWAN. Among the types of releases, CUDA-flavoured stacks are made available (e.g. /cvmfs/sft.cern.ch/lcg/views/LCG_105_cuda/). These look very much like the CMSSW externals, i.e. a few hundred packages coherently compiled and distributed. Have you considered mounting sft.cern.ch on your AF?
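Using such a view typically amounts to sourcing a single setup script, along these lines (a sketch; the exact platform directory here is an assumption, so please check which ones exist under the view path for your OS and compiler):

```shell
# Set up the full CUDA-flavoured LCG stack (ROOT, Python, CUDA, ...)
# The platform tag (here x86_64-el9-gcc11-opt) must match your system.
source /cvmfs/sft.cern.ch/lcg/views/LCG_105_cuda/x86_64-el9-gcc11-opt/setup.sh
root --version
```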

Cheers and thanks again for the interesting post.

Danilo


Hi Danilo,

Thank you very much for the quick reply!

In general, we try to avoid using CVMFS distributions, as the setup scripts there tend to break existing environments by overwriting environment variables such as LD_LIBRARY_PATH. For this reason, most of our researchers are already used to not mixing CMSSW with Pythonic analysis workflows.

That said, I did try to use the SFT builds; the only Alma8 build of the latest ROOT that I found is this one:
/cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.30.06/x86_64-almalinux8.9-gcc85-opt/bin/thisroot.sh
However, I cannot use PyROOT with it (I am not sure whether it is a problem with the build itself, or whether it interferes with the ROOT that is already installed in our image):

[dkondra@purdue-af-1]$ source /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.30.06/x86_64-almalinux8.9-gcc85-opt/bin/thisroot.sh
[dkondra@purdue-af-1]$ python3
Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ROOT as rt
cling (LLVM option parsing): for the --optimize-regalloc option: may only occur zero or one times!

What we are trying to achieve at Purdue AF is to use ROOT with CUDA within a given Conda environment (Jupyter kernel), such that the environment variables and dependencies in that kernel are respected. This already works well with ROOT installed from conda-forge, but there is no CUDA support there.

Please feel free to message me privately for further discussion on Mattermost or email.

Cheers,
Dmitry

Hi Dmitry,

Let’s move the discussion elsewhere (I am contacting you now), and then we may decide to post the solution we find here.

Cheers,
D

Hi @kondratyevd! Very interesting project. I was thinking a little bit with @Danilo about how one could use RooFit CUDA without rebuilding and repackaging ROOT.

Actually, building ROOT with cuda=ON should change nothing in the existing libraries. There is only one additional library that gets built, RooBatchCompute_CUDA:
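For reference, enabling this at configure time is just the standard CMake invocation with the CUDA option switched on (a sketch; the source/build paths and the architecture setting are placeholders):

```shell
# Configure a ROOT build with CUDA support; in addition to the usual
# libraries this produces libRooBatchCompute_CUDA.so
cmake -S root_src -B root_build -Dcuda=ON -DCMAKE_CUDA_ARCHITECTURES=native
cmake --build root_build -j8
```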

I said nothing should be different instead of nothing is different because I realized I made a mistake in the RooFit likelihood evaluation code: in a premature optimization attempt, some pure C++ code that is relevant for the CUDA evaluation is only built if cuda=ON, even though it would not hurt at all to build it always.

If I make sure that nothing needs to be changed except for the added shared library RooBatchCompute_CUDA.so, and give you a simple recipe to build this library standalone, would that make things simpler for you? I would then work on this for the 6.32.02 patch release.

Cheers,
Jonas

Hi @Jonas,

Thank you very much for looking into this!

If it is just one library that I can build and link against specific ROOT installations in Conda environments, that should be enough. At the very least, I will be able to check it quickly and give feedback.

Looking forward to your recipe! Also, it would be great if you could mention which ROOT versions it should be compatible with. Thanks!

Cheers,
Dmitry

Hi @kondratyevd!

I have now implemented what I described, and the PR will be merged soon:

This change will be included in the next patch release 6.32.02, which will come out in 2 weeks. I think once the release is out, you can try out the instructions in the PR description!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Dear @kondratyevd ,

ROOT 6.32.02 is now available on the upstream conda channels. With the recipe provided by @jonas, you should be able to build the RooFit batch compute library for GPU fitting against the conda installation of ROOT. Let me know how it goes if you try this out, and we can follow up.

Cheers,
Vincenzo