Problem building TMVA with CUDA

My build of root-6.22.02 with CUDA fails at the link stage, when resolving the CUDA runtime symbols needed by TMVA. A working root.exe is produced, but libTMVA.so is missing. I have installed the CUDA 11.0 toolkit and the cuDNN 8.0.3 libraries from the Fedora 31 repo and NVIDIA, and I am using gcc 9.3.1. I would appreciate any help.

my LD_LIBRARY_PATH
/home/olin/packages/root/lib:/home/olin/packages/pythia6:/home/olin/packages/SPR-3.3.2/lib:/home/olin/alphaSoftware/alphaAnalysis/lib:/home/olin/alphaSoftware/alphavmc/lib/tgt_linuxx8664gcc:/home/olin/alphaSoftware/alphavmc/geant3/lib/tgt_linuxx8664gcc:/home/olin/alphaSoftware/alphavmc/geant3/tgt_linuxx8664gcc/TGeant3:/usr/local/cudnn/cuda/lib64:/usr/local/cuda/lib64
my cmake configure:
cmake -DCMAKE_INSTALL_PREFIX=$HOME/packages/root-6.22.02_install -DCMAKE_CXX_STANDARD=11 -DCMAKE_CUDA_STANDARD=11 -Dminuit2=ON -Dvmc=ON -Dcuda=ON -Dcudnn=OFF -Dtmva-gpu=ON -Dtmva-cpu=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DCMAKE_CUDA_RUNTIME_LIBRARY=/usr/local/cuda/lib64 $HOME/packages/root-6.22.02_src

(same failure with cudnn=ON)
my cmake build:
cmake --build . --target install -- -j6

/usr/local/cuda/lib64/libcudart.so contains the missing references
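
As a sketch of how that claim can be checked (the helper name exports_symbol is made up for illustration, and it assumes binutils nm and awk are installed), the dynamic symbol table of the library can be piped through a small filter:

```shell
# exports_symbol SYMBOL
# Reads `nm -D --defined-only LIB` output on stdin and succeeds if SYMBOL
# is among the defined dynamic symbols.
# Hypothetical helper for illustration; not part of ROOT or CUDA.
exports_symbol() {
    awk -v sym="$1" '
        { name = $NF; sub(/@.*/, "", name) }   # drop any @VERSION suffix
        name == sym { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Usage with the path from this post:
#   nm -D --defined-only /usr/local/cuda/lib64/libcudart.so \
#       | exports_symbol cudaMemcpy && echo "cudaMemcpy is exported"
```

If the check succeeds, the library itself is fine and the question becomes why the link command does not name it.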

Failure messages (partial):
[ 98%] Building CXX object graf3d/eve/CMakeFiles/Eve.dir/src/TEveUtil.cxx.o
/usr/bin/ld: CMakeFiles/TMVA.dir/src/MethodDL.cxx.o: in function `TMVA::DNN::TBatchNormLayer<TMVA::DNN::TCuda<float> >::Initialize()':
MethodDL.cxx:(.text._ZN4TMVA3DNN15TBatchNormLayerINS0_5TCudaIfEEE10InitializeEv[_ZN4TMVA3DNN15TBatchNormLayerINS0_5TCudaIfEEE10InitializeEv]+0x85): undefined reference to `cudaMemcpy'
/usr/bin/ld: MethodDL.cxx:(.text._ZN4TMVA3DNN15TBatchNormLayerINS0_5TCudaIfEEE10InitializeEv[_ZN4TMVA3DNN15TBatchNormLayerINS0_5TCudaIfEEE10InitializeEv]+0xb7): undefined reference to `cudaMemcpy'
/usr/bin/ld: MethodDL.cxx:(.text._ZN4TMVA3DNN15TBatchNormLayerINS0_5TCudaIfEEE10InitializeEv[_ZN4TMVA3DNN15TBatchNormLayerINS0_5TCudaIfEEE10InitializeEv]+0xed): undefined reference to `cudaMemcpy'
/usr/bin/ld: CMakeFiles/TMVA.dir/src/MethodDL.cxx.o: in function `TMVA::DNN::RNN::TBasicGRULayer<TMVA::DNN::TCuda<float> >::Forward(TMVA::DNN::TCudaTensor<float>&, bool)':
MethodDL.cxx:(.text._ZN4TMVA3DNN3RNN14TBasicGRULayerINS0_5TCudaIfEEE7ForwardERNS0_11TCudaTensorIfEEb[_ZN4TMVA3DNN3RNN14TBasicGRULayerINS0_5TCudaIfEEE7ForwardERNS0_11TCudaTensorIfEEb]+0x76f): undefined reference to `cudaMemcpy'
/usr/bin/ld: MethodDL.cxx:(.text._ZN4TMVA3DNN3RNN14TBasicGRULayerINS0_5TCudaIfEEE7ForwardERNS0_11TCudaTensorIfEEb[_ZN4TMVA3DNN3RNN14TBasicGRULayerINS0_5TCudaIfEEE7ForwardERNS0_11TCudaTensorIfEEb]+0x7ca): undefined reference to `cudaMemcpy'
/usr/bin/ld: CMakeFiles/TMVA.dir/src/MethodDL.cxx.o: in function `TMVA::DNN::RNN::TBasicRNNLayer<TMVA::DNN::TCuda<float> >::Backward(TMVA::DNN::TCudaTensor<float>&, TMVA::DNN::TCudaTensor<float> const&)':
MethodDL.cxx:(.text._ZN4TMVA3DNN3RNN14TBasicRNNLayerINS0_5TCudaIfEEE8BackwardERNS0_11TCudaTensorIfEERKS7_[_ZN4TMVA3DNN3RNN14TBasicRNNLayerINS0_5TCudaIfEEE8BackwardERNS0_11TCudaTensorIfEERKS7_]+0x1ba): undefined reference to `cudaMemset'
/usr/bin/ld: MethodDL.cxx:(.text._ZN4TMVA3DNN3RNN14TBasicRNNLayerINS0_5TCudaIfEEE8BackwardERNS0_11TCudaTensorIfEERKS7_[_ZN4TMVA3DNN3RNN14TBasicRNNLayerINS0_5TCudaIfEEE8BackwardERNS0_11TCudaTensorIfEERKS7_]+0x1e3): undefined reference to `cudaMemset'
/usr/bin/ld: MethodDL.cxx:(.text._ZN4TMVA3DNN3RNN14TBasicRNNLayerINS0_5TCudaIfEEE8BackwardERNS0_11TCudaTensorIfEERKS7_[_ZN4TMVA3DNN3RNN14TBasicRNNLayerINS0_5TCudaIfEEE8BackwardERNS0_11TCudaTensorIfEERKS7_]+0x20c): undefined reference to `cudaMemset'
/usr/bin/ld: CMakeFiles/TMVA.dir/src/MethodDL.cxx.o: in function `TMVA::DNN::RNN::TBasicGRULayer<TMVA::DNN::TCuda<float> >::Backward(TMVA::DNN::TCudaTensor<float>&, TMVA::DNN::TCudaTensor<float> const&)':
MethodDL.cxx:(.text._ZN4TMVA3DNN3RNN14TBasicGRULayerINS0_5TCudaIfEEE8BackwardERNS0_11TCudaTensorIfEERKS7_[_ZN4TMVA3DNN3RNN14TBasicGRULayerINS0_5TCudaIfEEE8BackwardERNS0_11TCudaTensorIfEERKS7_]+0x1b2): undefined reference to `cudaMemset'
/usr/bin/ld: MethodDL.cxx:(.text._ZN4TMVA3DNN3RNN14TBasicGRULayerINS0_5TCudaIfEEE8BackwardERNS0_11TCudaTensorIfEERKS7_[_ZN4TMVA3DNN3RNN14TBasicGRULayerINS0_5TCudaIfEEE8BackwardERNS0_11TCudaTensorIfEERKS7_]+0x1db): undefined reference to `cudaMemset'
/usr/bin/ld: CMakeFiles/TMVA.dir/src/MethodDL.cxx.o:MethodDL.cxx:(.text._ZN4TMVA3DNN3RNN14TBasicGRULayerINS0_5TCudaIfEEE8BackwardERNS0_11TCudaTensorIfEERKS7_[_ZN4TMVA3DNN3RNN14TBasicGRULayerINS0_5TCudaIfEEE8BackwardERNS0_11TCudaTensorIfEERKS7_]+0x204): more undefined references to `cudaMemset' follow
/usr/bin/ld: CMakeFiles/TMVA.dir/src/MethodDL.cxx.o: in function `TMVA::DNN::VGeneralLayer<TMVA::DNN::TCuda<float> >::WriteMatrixToXML(void*, char const*, TMVA::DNN::TCudaMatrix<float> const&)':
MethodDL.cxx:(.text._ZN4TMVA3DNN13VGeneralLayerINS0_5TCudaIfEEE16WriteMatrixToXMLEPvPKcRKNS0_11TCudaMatrixIfEE[_ZN4TMVA3DNN13VGeneralLayerINS0_5TCudaIfEEE16WriteMatrixToXMLEPvPKcRKNS0_11TCudaMatrixIfEE]+0x1d5): undefined reference to `cudaMemcpy'
/usr/bin/ld: CMakeFiles/TMVA.dir/src/MethodDL.cxx.o: in function `std::vector<double, std::allocator<double> > TMVA::MethodDL::PredictDeepNet<TMVA::DNN::TCuda<float> >(long long, long long, unsigned long, bool)':
MethodDL.cxx:(.text._ZN4TMVA8MethodDL14PredictDeepNetINS_3DNN5TCudaIfEEEESt6vectorIdSaIdEExxmb[_ZN4TMVA8MethodDL14PredictDeepNetINS_3DNN5TCudaIfEEEESt6vectorIdSaIdEExxmb]+0xb4e): undefined reference to `cudaMemcpy'
/usr/bin/ld: CMakeFiles/TMVA.dir/src/DNN/Architectures/Cuda.cu.o: in function `__device_stub__ZN4TMVA3DNN4Cuda10AddRowWiseIfEEvPT_PKS3_ii(float*, float const*, int, int)':

@oshadura or @moneta Can you help?

Hi,

It looks to me like a linking problem: the linker is not finding the CUDA runtime library. Are you sure your CUDA installation is fully functional?
I would try removing these options from the cmake command:
-DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DCMAKE_CUDA_RUNTIME_LIBRARY=/usr/local/cuda/lib64
cmake should find the CUDA location automatically if it is in the PATH.
Also, the link command used for TMVA would be useful to see (it can be obtained with make VERBOSE=1).
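
Since the TMVA link command is very long, a small filter helps to see only the CUDA-related pieces. This is just a sketch: cuda_link_items is a hypothetical helper, and the grep pattern for picking out the libTMVA.so link command is an assumption about how the verbose output looks.

```shell
# cuda_link_items
# Reads build output (or a single link command) on stdin and prints, one per
# line, every token that mentions cuda, cublas or cudnn.
# Hypothetical helper for illustration only.
cuda_link_items() {
    tr -s ' \t' '\n' | grep -i -E 'cuda|cublas|cudnn' || true
}

# Usage from the build directory:
#   make VERBOSE=1 2>&1 | grep -e '-o [^ ]*libTMVA\.so' | cuda_link_items
```

If nothing like -lcudart, -lcudart_static or libcudart.so shows up in the filtered list, the CUDA runtime is not being linked, which would match undefined references to cudaMemcpy and cudaMemset.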

Lorenzo

Thanks Moneta,
I have not tested my CUDA installation, so it is a likely source of the problem. In particular the choice of the cudnn version and its compatibility with cuda 11.0 is a guess on my part.
The CMakeCache.txt file seems to indicate that the correct paths have been found. Below is its CUDA-related content. Is the UNINITIALIZED tag significant?
I would like to try building a tmva with tmva-gpu but without the cudnn portion, which is supposed to be optional. Can you suggest appropriate flags for that?

CMAKE_CUDA_COMPILER:UNINITIALIZED=/usr/local/cuda/bin/nvcc

CMAKE_CUDA_COMPILER-CACHED:STRING=/usr/local/cuda/bin/nvcc

//Flags used by the CUDA compiler during all build types.
CMAKE_CUDA_FLAGS:STRING=

//Flags used by the CUDA compiler during DEBUG builds.
CMAKE_CUDA_FLAGS_DEBUG:STRING=-g

//Flags used by the CUDA compiler during MINSIZEREL builds.
CMAKE_CUDA_FLAGS_MINSIZEREL:STRING=-O1 -DNDEBUG

//Flags used by the CUDA compiler during RELEASE builds.
CMAKE_CUDA_FLAGS_RELEASE:STRING=-O2 -DNDEBUG

//Flags used by the CUDA compiler during RELWITHDEBINFO builds.
CMAKE_CUDA_FLAGS_RELWITHDEBINFO:STRING=-O2 -g -DNDEBUG

//No help, variable specified on the command line.
CMAKE_CUDA_RUNTIME_LIBRARY:UNINITIALIZED=/usr/local/cuda/lib64

CMAKE_CUDA_RUNTIME_LIBRARY-CACHED:STRING=/usr/local/cuda/lib64

//No help, variable specified on the command line.
CMAKE_CUDA_STANDARD:UNINITIALIZED=11

CMAKE_CUDA_STANDARD-CACHED:STRING=11

Hi Lorenzo,
I tried the VERBOSE=1 option. The link command is impressively long, but it does contain a reference to /usr/local/cuda/lib64, so why does it not resolve the references?
However, /usr/local/cuda/lib64/libcudart.so itself is not mentioned anywhere in the link command.

[ 93%] Linking CXX shared library ../../lib/libTMVA.so
cd /home/olin/packages/root-6.22.02_bld1/tmva/tmva && /usr/bin/cmake -E cmake_link_script CMakeFiles/TMVA.dir/link.txt --verbose=1
/usr/bin/c++ -fPIC -std=c++11 -Wno-implicit-fallthrough -Wno-noexcept-type -pipe -Wshadow -Wall -W -Woverloaded-virtual -fsigned-char -pthread -O2 -DNDEBUG -Wl,--no-undefined -Wl,--hash-style="both" -shared -Wl,-soname,libTMVA.so -o ../../lib/libTMVA.so CMakeFiles/TMVA.dir/src/BDTEventWrapper.cxx.o CMakeFiles/TMVA.dir/src/BinarySearchTree.cxx.o CMakeFiles/TMVA.dir/src/BinarySearchTreeNode.cxx.o CMakeFiles/TMVA.dir/src/BinaryTree.cxx.o CMakeFiles/TMVA.dir/src/CCPruner.cxx.o CMakeFiles/TMVA.dir/src/CCTreeWrapper.cxx.o CMakeFiles/TMVA.dir/src/Classification.cxx.o CMakeFiles/TMVA.dir/src/ClassifierFactory.cxx.o CMakeFiles/TMVA.dir/src/ClassInfo.cxx.o CMakeFiles/TMVA.dir/src/Config.cxx.o CMakeFiles/TMVA.dir/src/Configurable.cxx.o CMakeFiles/TMVA.dir/src/ConvergenceTest.cxx.o CMakeFiles/TMVA.dir/src/CostComplexityPruneTool.cxx.o CMakeFiles/TMVA.dir/src/CrossEntropy.cxx.o CMakeFiles/TMVA.dir/src/CrossValidation.cxx.o CMakeFiles/TMVA.dir/src/CvSplit.cxx.o CMakeFiles/TMVA.dir/src/DataInputHandler.cxx.o CMakeFiles/TMVA.dir/src/DataLoader.cxx.o CMakeFiles/TMVA.dir/src/DataSet.cxx.o CMakeFiles/TMVA.dir/src/DataSetFactory.cxx.o CMakeFiles/TMVA.dir/src/DataSetInfo.cxx.o CMakeFiles/TMVA.dir/src/DataSetManager.cxx.o CMakeFiles/TMVA.dir/src/DecisionTree.cxx.o CMakeFiles/TMVA.dir/src/DecisionTreeNode.cxx.o CMakeFiles/TMVA.dir/src/Envelope.cxx.o CMakeFiles/TMVA.dir/src/Event.cxx.o CMakeFiles/TMVA.dir/src/ExpectedErrorPruneTool.cxx.o CMakeFiles/TMVA.dir/src/Factory.cxx.o CMakeFiles/TMVA.dir/src/FitterBase.cxx.o CMakeFiles/TMVA.dir/src/GeneticAlgorithm.cxx.o CMakeFiles/TMVA.dir/src/GeneticFitter.cxx.o CMakeFiles/TMVA.dir/src/GeneticGenes.cxx.o CMakeFiles/TMVA.dir/src/GeneticPopulation.cxx.o CMakeFiles/TMVA.dir/src/GeneticRange.cxx.o CMakeFiles/TMVA.dir/src/GiniIndex.cxx.o CMakeFiles/TMVA.dir/src/GiniIndexWithLaplace.cxx.o CMakeFiles/TMVA.dir/src/HyperParameterOptimisation.cxx.o CMakeFiles/TMVA.dir/src/IFitterTarget.cxx.o CMakeFiles/TMVA.dir/src/IMethod.cxx.o
CMakeFiles/TMVA.dir/src/Interval.cxx.o CMakeFiles/TMVA.dir/src/KDEKernel.cxx.o CMakeFiles/TMVA.dir/src/LDA.cxx.o CMakeFiles/TMVA.dir/src/LogInterval.cxx.o CMakeFiles/TMVA.dir/src/LossFunction.cxx.o CMakeFiles/TMVA.dir/src/MCFitter.cxx.o CMakeFiles/TMVA.dir/src/MethodANNBase.cxx.o CMakeFiles/TMVA.dir/src/MethodBase.cxx.o CMakeFiles/TMVA.dir/src/MethodBayesClassifier.cxx.o CMakeFiles/TMVA.dir/src/MethodBDT.cxx.o CMakeFiles/TMVA.dir/src/MethodBoost.cxx.o CMakeFiles/TMVA.dir/src/MethodCategory.cxx.o CMakeFiles/TMVA.dir/src/MethodCFMlpANN.cxx.o CMakeFiles/TMVA.dir/src/MethodCFMlpANN_Utils.cxx.o CMakeFiles/TMVA.dir/src/MethodCompositeBase.cxx.o CMakeFiles/TMVA.dir/src/MethodCrossValidation.cxx.o CMakeFiles/TMVA.dir/src/MethodCuts.cxx.o CMakeFiles/TMVA.dir/src/MethodDL.cxx.o CMakeFiles/TMVA.dir/src/MethodDNN.cxx.o CMakeFiles/TMVA.dir/src/MethodDT.cxx.o CMakeFiles/TMVA.dir/src/MethodFDA.cxx.o CMakeFiles/TMVA.dir/src/MethodFisher.cxx.o CMakeFiles/TMVA.dir/src/MethodHMatrix.cxx.o CMakeFiles/TMVA.dir/src/MethodKNN.cxx.o CMakeFiles/TMVA.dir/src/MethodLD.cxx.o CMakeFiles/TMVA.dir/src/MethodLikelihood.cxx.o CMakeFiles/TMVA.dir/src/MethodMLP.cxx.o CMakeFiles/TMVA.dir/src/MethodPDEFoam.cxx.o CMakeFiles/TMVA.dir/src/MethodPDERS.cxx.o CMakeFiles/TMVA.dir/src/MethodPlugins.cxx.o CMakeFiles/TMVA.dir/src/MethodRuleFit.cxx.o CMakeFiles/TMVA.dir/src/MethodSVM.cxx.o CMakeFiles/TMVA.dir/src/MethodTMlpANN.cxx.o CMakeFiles/TMVA.dir/src/MinuitFitter.cxx.o CMakeFiles/TMVA.dir/src/MinuitWrapper.cxx.o CMakeFiles/TMVA.dir/src/MisClassificationError.cxx.o CMakeFiles/TMVA.dir/src/ModulekNN.cxx.o CMakeFiles/TMVA.dir/src/MsgLogger.cxx.o CMakeFiles/TMVA.dir/src/NeuralNet.cxx.o CMakeFiles/TMVA.dir/src/Node.cxx.o CMakeFiles/TMVA.dir/src/OptimizeConfigParameters.cxx.o CMakeFiles/TMVA.dir/src/Option.cxx.o CMakeFiles/TMVA.dir/src/OptionMap.cxx.o CMakeFiles/TMVA.dir/src/PDEFoamCell.cxx.o CMakeFiles/TMVA.dir/src/PDEFoam.cxx.o CMakeFiles/TMVA.dir/src/PDEFoamDecisionTree.cxx.o 
CMakeFiles/TMVA.dir/src/PDEFoamDecisionTreeDensity.cxx.o CMakeFiles/TMVA.dir/src/PDEFoamDensityBase.cxx.o CMakeFiles/TMVA.dir/src/PDEFoamDiscriminant.cxx.o CMakeFiles/TMVA.dir/src/PDEFoamDiscriminantDensity.cxx.o CMakeFiles/TMVA.dir/src/PDEFoamEvent.cxx.o CMakeFiles/TMVA.dir/src/PDEFoamEventDensity.cxx.o CMakeFiles/TMVA.dir/src/PDEFoamKernelBase.cxx.o CMakeFiles/TMVA.dir/src/PDEFoamKernelGauss.cxx.o CMakeFiles/TMVA.dir/src/PDEFoamKernelLinN.cxx.o CMakeFiles/TMVA.dir/src/PDEFoamKernelTrivial.cxx.o CMakeFiles/TMVA.dir/src/PDEFoamMultiTarget.cxx.o CMakeFiles/TMVA.dir/src/PDEFoamTarget.cxx.o CMakeFiles/TMVA.dir/src/PDEFoamTargetDensity.cxx.o CMakeFiles/TMVA.dir/src/PDEFoamVect.cxx.o CMakeFiles/TMVA.dir/src/PDF.cxx.o CMakeFiles/TMVA.dir/src/QuickMVAProbEstimator.cxx.o CMakeFiles/TMVA.dir/src/Ranking.cxx.o CMakeFiles/TMVA.dir/src/Reader.cxx.o CMakeFiles/TMVA.dir/src/RegressionVariance.cxx.o CMakeFiles/TMVA.dir/src/ResultsClassification.cxx.o CMakeFiles/TMVA.dir/src/Results.cxx.o CMakeFiles/TMVA.dir/src/ResultsMulticlass.cxx.o CMakeFiles/TMVA.dir/src/ResultsRegression.cxx.o CMakeFiles/TMVA.dir/src/ROCCalc.cxx.o CMakeFiles/TMVA.dir/src/ROCCurve.cxx.o CMakeFiles/TMVA.dir/src/RootFinder.cxx.o CMakeFiles/TMVA.dir/src/RuleCut.cxx.o CMakeFiles/TMVA.dir/src/Rule.cxx.o CMakeFiles/TMVA.dir/src/RuleEnsemble.cxx.o CMakeFiles/TMVA.dir/src/RuleFitAPI.cxx.o CMakeFiles/TMVA.dir/src/RuleFit.cxx.o CMakeFiles/TMVA.dir/src/RuleFitParams.cxx.o CMakeFiles/TMVA.dir/src/SdivSqrtSplusB.cxx.o CMakeFiles/TMVA.dir/src/SeparationBase.cxx.o CMakeFiles/TMVA.dir/src/SimulatedAnnealing.cxx.o CMakeFiles/TMVA.dir/src/SimulatedAnnealingFitter.cxx.o CMakeFiles/TMVA.dir/src/SVEvent.cxx.o CMakeFiles/TMVA.dir/src/SVKernelFunction.cxx.o CMakeFiles/TMVA.dir/src/SVKernelMatrix.cxx.o CMakeFiles/TMVA.dir/src/SVWorkingSet.cxx.o CMakeFiles/TMVA.dir/src/TActivationChooser.cxx.o CMakeFiles/TMVA.dir/src/TActivation.cxx.o CMakeFiles/TMVA.dir/src/TActivationIdentity.cxx.o CMakeFiles/TMVA.dir/src/TActivationRadial.cxx.o 
CMakeFiles/TMVA.dir/src/TActivationReLU.cxx.o CMakeFiles/TMVA.dir/src/TActivationSigmoid.cxx.o CMakeFiles/TMVA.dir/src/TActivationTanh.cxx.o CMakeFiles/TMVA.dir/src/Timer.cxx.o CMakeFiles/TMVA.dir/src/TNeuron.cxx.o CMakeFiles/TMVA.dir/src/TNeuronInputAbs.cxx.o CMakeFiles/TMVA.dir/src/TNeuronInputChooser.cxx.o CMakeFiles/TMVA.dir/src/TNeuronInput.cxx.o CMakeFiles/TMVA.dir/src/TNeuronInputSqSum.cxx.o CMakeFiles/TMVA.dir/src/TNeuronInputSum.cxx.o CMakeFiles/TMVA.dir/src/Tools.cxx.o CMakeFiles/TMVA.dir/src/TrainingHistory.cxx.o CMakeFiles/TMVA.dir/src/TransformationHandler.cxx.o CMakeFiles/TMVA.dir/src/TSpline1.cxx.o CMakeFiles/TMVA.dir/src/TSpline2.cxx.o CMakeFiles/TMVA.dir/src/TSynapse.cxx.o CMakeFiles/TMVA.dir/src/Types.cxx.o CMakeFiles/TMVA.dir/src/VariableDecorrTransform.cxx.o CMakeFiles/TMVA.dir/src/VariableGaussTransform.cxx.o CMakeFiles/TMVA.dir/src/VariableIdentityTransform.cxx.o CMakeFiles/TMVA.dir/src/VariableImportance.cxx.o CMakeFiles/TMVA.dir/src/VariableInfo.cxx.o CMakeFiles/TMVA.dir/src/VariableNormalizeTransform.cxx.o CMakeFiles/TMVA.dir/src/VariablePCATransform.cxx.o CMakeFiles/TMVA.dir/src/VariableRearrangeTransform.cxx.o CMakeFiles/TMVA.dir/src/VariableTransformBase.cxx.o CMakeFiles/TMVA.dir/src/VariableTransform.cxx.o CMakeFiles/TMVA.dir/src/VarTransformHandler.cxx.o CMakeFiles/TMVA.dir/src/Volume.cxx.o CMakeFiles/TMVA.dir/src/DNN/Architectures/Reference.cxx.o CMakeFiles/TMVA.dir/src/DNN/Architectures/Reference/DataLoader.cxx.o CMakeFiles/TMVA.dir/src/DNN/Architectures/Reference/TensorDataLoader.cxx.o CMakeFiles/TMVA.dir/src/DNN/Architectures/Cpu.cxx.o CMakeFiles/TMVA.dir/src/DNN/Architectures/Cpu/CpuBuffer.cxx.o CMakeFiles/TMVA.dir/src/DNN/Architectures/Cpu/CpuMatrix.cxx.o CMakeFiles/TMVA.dir/src/RBDT.cxx.o CMakeFiles/TMVA.dir/src/DNN/Architectures/Cuda.cu.o CMakeFiles/TMVA.dir/src/DNN/Architectures/Cuda/CudaBuffers.cxx.o CMakeFiles/TMVA.dir/src/DNN/Architectures/Cuda/CudaMatrix.cu.o CMakeFiles/TMVA.dir/src/DNN/Architectures/Cuda/CudaTensor.cu.o 
CMakeFiles/G__TMVA.dir/G__TMVA.cxx.o -L/home/olin/packages/root-6.22.02_bld1/lib -L/usr/local/cuda/targets/x86_64-linux/lib/stubs -L/usr/local/cuda/targets/x86_64-linux/lib -Wl,-rpath,/home/olin/packages/root-6.22.02_bld1/lib:/usr/local/cuda/lib64: ../../lib/libMinuit.so ../../lib/libMLP.so ../../lib/libXMLIO.so ../../lib/libROOTDataFrame.so ../../lib/libROOTVecOps.so -ltbb /usr/lib64/libopenblas.so /usr/local/cuda/lib64/libcublas.so ../../lib/libTreePlayer.so ../../lib/libGraf3d.so ../../lib/libGpad.so ../../lib/libGraf.so ../../lib/libMultiProc.so ../../lib/libTree.so ../../lib/libNet.so ../../lib/libHist.so ../../lib/libMatrix.so ../../lib/libMathCore.so ../../lib/libRIO.so ../../lib/libImt.so ../../lib/libThread.so ../../lib/libCore.so -lpthread -lrt -lpthread -ldl
/usr/bin/ld: CMakeFiles/TMVA.dir/src/MethodDL.cxx.o: in function `TMVA::DNN::TBatchNormLayer<TMVA::DNN::TCuda<float> >::Initialize()':
MethodDL.cxx:(.text._ZN4TMVA3DNN15TBatchNormLayerINS0_5TCudaIfEEE10InitializeEv[_ZN4TMVA3DNN15TBatchNormLayerINS0_5TCudaIfEEE10InitializeEv]+0x85): undefined reference to `cudaMemcpy'

Hi,

I don’t see any CUDA libraries in the link statement. There should be two CMake variables,
CUDA_CUBLAS_LIBRARIES and CUDNN_LIBRARIES (if using cuDNN), which contain the CUDA libraries.
Maybe there is an issue with the FindCUDA module in CMake, which defines these variables. Did you get any particular log message at the beginning, during the configuration stage?
Which cmake version are you using?

Lorenzo

Hi Lorenzo,
I’m using cmake version 3.17.4, so that is apparently not the problem.
Your suggestion of omitting
-DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DCMAKE_CUDA_RUNTIME_LIBRARY=/usr/local/cuda/lib64
worked. Specifically, I put /usr/local/cuda/bin in PATH and /usr/local/cuda/lib64 in LD_LIBRARY_PATH, then configured with:
cmake -DCMAKE_INSTALL_PREFIX=$HOME/packages/root-6.22.02_install -DCMAKE_CXX_STANDARD=11 -DCMAKE_CUDA_STANDARD=11 -Dminuit2=ON -Dvmc=ON -Dcuda=ON -Dcudnn=ON -Dtmva-gpu=ON -Dtmva-cpu=ON $HOME/packages/root-6.22.02_src

When I build with VERBOSE=1, I find -lcudart_static at the end of the link command.
This was not the case with either of the additional options -DCMAKE_CUDA_RUNTIME_LIBRARY=/usr/local/cuda/lib64/libcudart.so
or
-DCMAKE_CUDA_RUNTIME_LIBRARY=/usr/local/cuda/lib64
Thank you so much for your help with this!

I don’t understand why, but it is good that it worked!
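
A possible explanation, offered as an assumption rather than something confirmed in this thread: the CMake documentation describes CMAKE_CUDA_RUNTIME_LIBRARY (new in CMake 3.17) as a selector that takes the values None, Shared or Static, not a directory or library path. If that is what happened here, the path values gave CMake nothing it could map to a runtime library, so no cudart ended up on the link line. A small sketch for spotting such a value in CMakeCache.txt (the function name is hypothetical):

```shell
# check_cuda_runtime_setting CACHE_FILE
# Reports whether CMAKE_CUDA_RUNTIME_LIBRARY in a CMakeCache.txt holds one of
# the documented selector values (None, Shared, Static).
# Hypothetical helper for illustration only.
check_cuda_runtime_setting() {
    value=$(sed -n 's/^CMAKE_CUDA_RUNTIME_LIBRARY:[A-Z]*=//p' "$1" | head -n 1)
    case "$value" in
        "")                 echo "unset" ;;
        None|Shared|Static) echo "ok: $value" ;;
        *)                  echo "suspicious: $value" ;;
    esac
}

# Usage from the build directory:
#   check_cuda_runtime_setting CMakeCache.txt
```

With the cache content posted earlier in this thread, the sketch would report the value /usr/local/cuda/lib64 as suspicious. Leaving the variable unset lets CMake choose the runtime itself, which is consistent with -lcudart_static appearing in the working build.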

Hi Lorenzo,
While the root build with CUDA Version 11.0.228 and CUDNN 8.0.3 succeeds, there remain problems with actually running the code. DNN_CPU runs, but DNN_GPU fails. I expect this is the problem reported elsewhere with root and cuda11. There seem to be updated versions of cuda11 that I hoped to test, but fedora is giving problems installing them due to invalid dependency requirements.

Building with -Dcuda=ON -Dcudnn=ON and running stressTMVA from the test directory

Info in <TFile::OpenFromCache>: using local cache copy of http://root.cern.ch/files/tmva_class_example.root [./files/tmva_class_example.root]
DEEP NEURAL NETWORK:   Depth = 4  Input = ( 1, 1, 4 )  Batch size = 30  Loss function = C
	Layer 0	 DENSE Layer: 	 ( Input =     4 , Width =    64 ) 	Output = (  1 ,    30 ,    64 ) 	 Activation Function = Identity
	Layer 1	 DENSE Layer: 	 ( Input =    64 , Width =    64 ) 	Output = (  1 ,    30 ,    64 ) 	 Activation Function = Identity
	Layer 2	 DENSE Layer: 	 ( Input =    64 , Width =    64 ) 	Output = (  1 ,    30 ,    64 ) 	 Activation Function = Identity
	Layer 3	 DENSE Layer: 	 ( Input =    64 , Width =     1 ) 	Output = (  1 ,    30 ,     1 ) 	 Activation Function = Identity
TH1.Print Name  = TrainingHistory_DNN CPU_trainingError, Entries= 0, Total sum= inf
TH1.Print Name  = TrainingHistory_DNN CPU_valError, Entries= 0, Total sum= inf
                         : 
                         : Evaluation results ranked by best signal efficiency and purity (area)
                         : -------------------------------------------------------------------------------------------------------------------
                         : DataSet       MVA                       
                         : Name:         Method:          ROC-integ
                         : dataset       DNN CPU        : 0.921
                         : -------------------------------------------------------------------------------------------------------------------
                         : 
                         : Testing efficiency compared to training efficiency (overtraining check)
                         : -------------------------------------------------------------------------------------------------------------------
                         : DataSet              MVA              Signal efficiency: from test sample (from training sample) 
                         : Name:                Method:          @B=0.01             @B=0.10            @B=0.30   
                         : -------------------------------------------------------------------------------------------------------------------
                         : dataset              DNN CPU        : 0.364 (0.426)       0.780 (0.755)      0.927 (0.920)
                         : -------------------------------------------------------------------------------------------------------------------
                         : 
DNN CPU [4/4]....................................................OK
Info in <TFile::OpenFromCache>: using local cache copy of http://root.cern.ch/files/tmva_class_example.root [./files/tmva_class_example.root]
TCudaTensor::create cudnn handle ! 

For a build with -Dcudnn=OFF:

Info in <TFile::OpenFromCache>: using local cache copy of http://root.cern.ch/files/tmva_class_example.root [./files/tmva_class_example.root]
DEEP NEURAL NETWORK:   Depth = 4  Input = ( 1, 1, 4 )  Batch size = 30  Loss function = C
	Layer 0	 DENSE Layer: 	 ( Input =     4 , Width =    64 ) 	Output = (  1 ,    30 ,    64 ) 	 Activation Function = Identity
	Layer 1	 DENSE Layer: 	 ( Input =    64 , Width =    64 ) 	Output = (  1 ,    30 ,    64 ) 	 Activation Function = Identity
	Layer 2	 DENSE Layer: 	 ( Input =    64 , Width =    64 ) 	Output = (  1 ,    30 ,    64 ) 	 Activation Function = Identity
	Layer 3	 DENSE Layer: 	 ( Input =    64 , Width =     1 ) 	Output = (  1 ,    30 ,     1 ) 	 Activation Function = Identity
TH1.Print Name  = TrainingHistory_DNN CPU_trainingError, Entries= 0, Total sum= inf
TH1.Print Name  = TrainingHistory_DNN CPU_valError, Entries= 0, Total sum= inf
                         : 
                         : Evaluation results ranked by best signal efficiency and purity (area)
                         : -------------------------------------------------------------------------------------------------------------------
                         : DataSet       MVA                       
                         : Name:         Method:          ROC-integ
                         : dataset       DNN CPU        : 0.922
                         : -------------------------------------------------------------------------------------------------------------------
                         : 
                         : Testing efficiency compared to training efficiency (overtraining check)
                         : -------------------------------------------------------------------------------------------------------------------
                         : DataSet              MVA              Signal efficiency: from test sample (from training sample) 
                         : Name:                Method:          @B=0.01             @B=0.10            @B=0.30   
                         : -------------------------------------------------------------------------------------------------------------------
                         : dataset              DNN CPU        : 0.352 (0.445)       0.782 (0.760)      0.930 (0.917)
                         : -------------------------------------------------------------------------------------------------------------------
                         : 
DNN CPU [4/4]....................................................OK
Info in <TFile::OpenFromCache>: using local cache copy of http://root.cern.ch/files/tmva_class_example.root [./files/tmva_class_example.root]
CUDA Error: no CUDA-capable device is detected /home/olin/packages/root-6.22.02_src/tmva/tmva/src/DNN/Architectures/Cuda/CudaMatrix.cu 107
olin@lenolin:~/packages/root-6.22.02_bld1/test

Hi

This looks like a problem with accessing the GPU, not with building the code. I think it is a driver problem; I would check that the correct driver version is installed.
If you have the correct driver, sometimes rebooting the machine helps.
I would check the GPU by running, for example,
nvidia-smi
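
As a rough sketch of that check (count_gpus is a made-up helper, and it assumes that nvidia-smi -L prints one line per device, each starting with GPU and an index), the number of devices the driver reports can be counted like this:

```shell
# count_gpus
# Reads `nvidia-smi -L` output on stdin and prints how many GPU lines it contains.
# Hypothetical helper for illustration only.
count_gpus() {
    grep -c '^GPU [0-9]' || true
}

# Usage:
#   nvidia-smi -L | count_gpus
```

A result of 0 would be consistent with the "no CUDA-capable device is detected" error above and would point at the driver rather than at the ROOT build.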

Lorenzo

The graphics driver was indeed the problem. With that fixed, DNN_GPU using the cuDNN libraries passes the stressTMVA test. It appears that CUDA 11 and ROOT are now compatible. Thanks for the helpful suggestions!