K-fold CV in TMVA::Experimental::RReader

anfeng · June 25, 2024, 7:52am

Hi,

I am using TMVA::Experimental::RReader with RDataFrame to apply an MVA trained with k-fold cross-validation, where the dataset was split by a deterministic SplitExpr. However, the MVA responses I get look wrong: they are all very close to -1.

The code snippet I am using looks like this:

```cpp
#include <ROOT/RDataFrame.hxx>
#include <TMVA/RReader.hxx>
#include <TROOT.h>

#include <algorithm>
#include <string>
#include <vector>

void apply_bdtg() // macro name chosen for illustration
{
   using namespace TMVA::Experimental;
   ROOT::EnableImplicitMT(48);

   RReader model("/path/to/TMVAdataset/weights/TMVAClassification_BDTG.weights.xml");

   auto training_variables = model.GetVariableNames();
   auto spectator_variables = model.GetSpectatorNames();
   std::vector<std::string> variables(training_variables.size() + spectator_variables.size());
   std::merge(training_variables.begin(), training_variables.end(), spectator_variables.begin(), spectator_variables.end(), variables.begin());

   ROOT::RDataFrame rdf("DecayTree", "/path/to/input.root");
   auto rdf2 = rdf.Define("BDTG_response", Compute<21, float>(model), variables);
   rdf2.Snapshot("DecayTree", "/path/to/output.root");
}
```

and the response in the output file looks like this:

```
+-----+-----------------+
| Row | BDTG_B_response |
+-----+-----------------+
| 0   | -0.999969       |
+-----+-----------------+
| 1   | -0.999946       |
+-----+-----------------+
| 2   | -0.999880       |
+-----+-----------------+
| 3   | -0.999982       |
+-----+-----------------+
| 4   | -0.999988       |
+-----+-----------------+
| 5   | -0.999983       |
+-----+-----------------+
| 6   | -0.999998       |
+-----+-----------------+
| 7   | -0.999995       |
+-----+-----------------+
| 8   | -0.999997       |
+-----+-----------------+
| 9   | -0.999997       |
+-----+-----------------+
```

So, does TMVA::Experimental::RReader currently support k-fold CV with SplitExpr, and if so, what is the right way to use it?

The ROOT version I am using is 6.32.00.
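A detail in the snippet that may be worth checking: `std::merge` is meant for two ranges that are each already sorted, and it interleaves them by comparing elements, so it does not simply append the spectators after the training variables (and for unsorted inputs its precondition is not met at all). If the model expects the training variables first and the spectators after them, which is an assumption here and not confirmed in this thread, a plain concatenation keeps that order. The two helpers below (`MergeNames`, `ConcatNames`) are hypothetical and only compare the two orderings on sample names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// std::merge interleaves two *sorted* ranges by comparing elements,
// so spectator names can end up between the training-variable names.
std::vector<std::string> MergeNames(const std::vector<std::string> &train,
                                    const std::vector<std::string> &spec)
{
   std::vector<std::string> out(train.size() + spec.size());
   std::merge(train.begin(), train.end(), spec.begin(), spec.end(), out.begin());
   return out;
}

// Plain concatenation: training variables in their original order,
// followed by the spectators (assumed order, see lead-in).
std::vector<std::string> ConcatNames(const std::vector<std::string> &train,
                                     const std::vector<std::string> &spec)
{
   std::vector<std::string> out(train);
   out.insert(out.end(), spec.begin(), spec.end());
   return out;
}
```

With the hypothetical, already sorted inputs `{"eta", "pt"}` and `{"eventNumber"}`, `MergeNames` returns `eta, eventNumber, pt`, while `ConcatNames` returns `eta, pt, eventNumber`. Under the same assumption, the equivalent change in the macro would be to build `variables` as a copy of `training_variables` and then `insert` the `spectator_variables` at its end.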

Danilo · June 26, 2024, 4:37am

Hi,

Thanks for the post.
It’s not fully clear from the example you posted what the Compute<21,float> method is. Could you give some details? I am adding @moneta to the loop for the specific question at the end of your post.

best,
D

anfeng · June 26, 2024, 6:44am

Hi Danilo,

Thanks for your reply. The Compute<21,float> comes from tutorials/tmva/tmva003_RReader.C (I think it is TMVA::Experimental::Compute), where 21 is the number of training variables plus the number of spectator variables.
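To see why both the 21 and the order of the column list matter, here is a standalone mock of the pattern used for `Compute<N, T>`: a callable with exactly N parameters of type T that packs its arguments, in the order given, into a `std::vector` and forwards it to the model's `Compute` method. `MockModel`, `Repeat`, `ComputeHelper` and `MakeCompute` are hypothetical names for this sketch; it mirrors the idea, not ROOT's actual source.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a trained model: it echoes its inputs,
// so the order in which they were packed can be inspected.
struct MockModel {
   std::vector<float> Compute(const std::vector<float> &x) const { return x; }
};

// Maps each index of a pack to the same type T (the index itself is unused).
template <std::size_t, typename T>
struct Repeat {
   using type = T;
};

template <typename Seq, typename T, typename F>
class ComputeHelper;

// Callable with exactly sizeof...(Is) parameters of type T.
template <std::size_t... Is, typename T, typename F>
class ComputeHelper<std::index_sequence<Is...>, T, F> {
   F fModel;

public:
   explicit ComputeHelper(F model) : fModel(std::move(model)) {}

   // Packs the arguments, in call order, into one vector and forwards it.
   std::vector<float> operator()(typename Repeat<Is, T>::type... args) const
   {
      return fModel.Compute(std::vector<float>{static_cast<float>(args)...});
   }
};

// Builds a callable that takes N values of type T.
template <std::size_t N, typename T, typename F>
ComputeHelper<std::make_index_sequence<N>, T, F> MakeCompute(F model)
{
   return ComputeHelper<std::make_index_sequence<N>, T, F>(std::move(model));
}
```

For example, `MakeCompute<3, float>(MockModel{})(1.f, 2.f, 3.f)` returns `{1, 2, 3}`. Calling it with a different number of arguments does not compile, but passing the columns in the wrong order goes unnoticed: each value is simply fed to a different model input.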

moneta · June 28, 2024, 3:00pm

Hi,
The RReader class is only meant to evaluate a trained model, reading it from the TMVA XML weight file.
I think it should work when using CV, although I am not sure how well this is tested. Otherwise, you can use the Reader class directly, as shown in this tutorial:
https://root.cern.ch/doc/master/TMVACrossValidationApplication_8C.html

Lorenzo
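Conceptually, applying a k-fold model trained with a deterministic split means computing each event's fold index from the same expression used in training and evaluating the event with the fold model that did not train on that fold. The standalone sketch below illustrates this with hypothetical names (`FoldModel`, `FoldIndex`, `EvaluateCV`) and assumes `models[i]` is the model trained on all folds except fold i; TMVA's own split expression and fold bookkeeping may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical per-fold model: maps training-variable values to a response.
using FoldModel = std::function<float(const std::vector<float> &)>;

// Deterministic split, similar in spirit to a SplitExpr that takes an
// event ID modulo the number of folds (hypothetical, not TMVA's code).
std::size_t FoldIndex(long long eventNumber, std::size_t numFolds)
{
   // Use the absolute value so negative IDs still map to a valid fold.
   const long long n = eventNumber < 0 ? -eventNumber : eventNumber;
   return static_cast<std::size_t>(n % static_cast<long long>(numFolds));
}

// Evaluates one event with models[fold], assumed to be the model whose
// training excluded that fold.
float EvaluateCV(const std::vector<FoldModel> &models, long long eventNumber,
                 const std::vector<float> &inputs)
{
   if (models.empty())
      throw std::invalid_argument("no fold models");
   const std::size_t fold = FoldIndex(eventNumber, models.size());
   return models[fold](inputs);
}
```

Since the fold index is derived from the event ID value, the evaluator needs that value exactly as it was during training; if the column holding the event ID were mixed up with another variable, events would be sent to the wrong fold models.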