CUDA application for matrix multiplication

Dear colleagues,
I’m looking for a fast matrix multiplication technique, and using GPU capabilities through the TCuda class looks very promising.
But I cannot find any example of how to use this class in the ROOT interpreter. When I try to call TMVA::DNN::TCuda t, I get the error: unknown type name ‘TCuda’.
It would be great to see a minimal example that shows how to use this class (or any other) for “low-level” access to the GPU from ROOT.

Hi,

Thanks for the question, and apologies for the slow response: this is a holiday period.
We do not have such examples yet, but in general you have a point: this is why we are investing in CUDA, for instance by making it available in ROOT, including in interpreted mode (see this draft PR, for example).
For the specific TMVA question, let me bring @moneta, our expert, into the loop.

Cheers,
D