I have read that ROOT can parallelise GradBoost BDT training in TMVA by calling ROOT::EnableImplicitMT().
I am trying to do this on LXPLUS (and also on HTCondor, with RequestCpus set accordingly) with ROOT (I tried both 6.20.06-x86_64-centos7-gcc8-opt and 6.22.06-x86_64-centos7-gcc8-opt), adding ROOT::EnableImplicitMT(4) at the top of my code to request 4 threads. Based on some other posts I read here, I also call ROOT::GetImplicitMTPoolSize() to check that the setting is actually picked up, and it reports the expected value. The program runs without errors.
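In case it helps, here is roughly what my macro looks like (file, tree, and variable names below are placeholders; my real ntuple has around 25 branches):

```cpp
#include <iostream>
#include "TFile.h"
#include "TTree.h"
#include "TROOT.h"
#include "TMVA/Factory.h"
#include "TMVA/DataLoader.h"
#include "TMVA/Tools.h"

void train_bdt() {
    // Request 4 threads before any TMVA object is created.
    ROOT::EnableImplicitMT(4);
    std::cout << "Pool size: " << ROOT::GetImplicitMTPoolSize() << std::endl;

    TMVA::Tools::Instance();

    TFile *input = TFile::Open("train.root");           // placeholder file name
    TTree *signal = (TTree *)input->Get("sig_tree");    // placeholder tree names
    TTree *background = (TTree *)input->Get("bkg_tree");

    TFile *output = TFile::Open("tmva_out.root", "RECREATE");
    TMVA::Factory factory("TMVAClassification", output,
                          "!V:!Silent:AnalysisType=Classification");
    TMVA::DataLoader loader("dataset");
    loader.AddVariable("var1", 'F');  // placeholder branch names
    loader.AddVariable("var2", 'F');
    loader.AddSignalTree(signal, 1.0);
    loader.AddBackgroundTree(background, 1.0);
    loader.PrepareTrainingAndTestTree("", "SplitMode=Random:NormMode=NumEvents");

    // GradBoost BDT, which is the method I would expect to parallelise.
    factory.BookMethod(&loader, TMVA::Types::kBDT, "BDTG",
                       "!H:!V:NTrees=500:BoostType=Grad:Shrinkage=0.1:MaxDepth=3");
    factory.TrainAllMethods();
    factory.TestAllMethods();
    factory.EvaluateAllMethods();
    output->Close();
}
```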
However, despite the large input (several million training events with around 25 branches each), I don’t see any notable difference in execution time. According to the performance studies in this paper, BDT training on 4 cores should be roughly twice as fast as the single-core case.
I have failed to find a TMVA BDT training (not NN) example that demonstrates how this functionality works; all replies simply point to adding ROOT::EnableImplicitMT(n) to the code. Could you kindly provide a minimal setup that shows this functionality applied to TMVA BDT training?