Thanks for the info!
Sorry, but I still fail to reproduce the problem.
I also just tried on lxplus-gpu at CERN (where ROOT 6.32.02 is pre-installed). My instructions worked out of the box, modulo adjusting some compiler paths:
cmake_minimum_required(VERSION 3.14)
# Adapt to your system (set these before project() enables the CUDA language)
# Note: the "native" architecture value needs CMake 3.24 or newer
set(CMAKE_CUDA_ARCHITECTURES "native" CACHE STRING "" FORCE)
set(CMAKE_CUDA_HOST_COMPILER /usr/bin/g++-13 CACHE STRING "" FORCE)
set(CMAKE_CUDA_COMPILER "/opt/cuda/bin/nvcc" CACHE STRING "" FORCE)
project(batchcompute-cuda LANGUAGES CUDA)
find_package(ROOT REQUIRED)
include(${ROOT_USE_FILE})
# in the src directory, put all files from roofit/batchcompute/src and roofit/batchcompute/res
add_library(RooBatchCompute_CUDA SHARED src/RooBatchCompute.cu src/ComputeFunctions.cu src/CudaInterface.cu)
target_include_directories(RooBatchCompute_CUDA PRIVATE src res)
target_compile_options(RooBatchCompute_CUDA PRIVATE -lineinfo --expt-relaxed-constexpr)
And then running the Python script from @green-cabbage works just fine.
So unfortunately it’s hard to make a diagnosis from my side. Could you try the same with a debug build of ROOT instead of the conda build, so we can debug this properly?
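
If you try this, it may help to confirm that CMake really picks up the debug build and not the conda one. A minimal sketch, assuming these lines are placed right after `find_package(ROOT REQUIRED)` in the CMakeLists.txt above; `ROOT_DIR` is filled in by CMake's config-mode package search, and `ROOT_VERSION` and `ROOT_INCLUDE_DIRS` should come from ROOT's own CMake package files:

```cmake
# Diagnostic only: print which ROOT installation find_package() resolved.
# ROOT_DIR is the directory containing the ROOTConfig.cmake that was used.
message(STATUS "ROOT version:      ${ROOT_VERSION}")
message(STATUS "ROOT package dir:  ${ROOT_DIR}")
message(STATUS "ROOT include dirs: ${ROOT_INCLUDE_DIRS}")
```

If the printed directory still points into the conda environment, the search can be steered to the other installation by setting `CMAKE_PREFIX_PATH` to its install prefix when configuring.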