Errors when using TMVA::CrossValidationResult for Deep Neural Networks

Hello ROOTers,

I’m attempting to use cross-validation to train a Deep Neural Network but I’m running into a couple of problems. I’d be grateful for any insight.

Initially, I was attempting to use this piece of code…

TMVA::CrossValidation crossVal(dataLoader);
crossVal.BookMethod(TMVA::Types::kDNN, "DNN", "Architecture=GPU:Layout=RELU|30,SIGMOID|20,LINEAR");
crossVal.Evaluate();
TMVA::CrossValidationResult results = crossVal.GetResults();
results.Print();

…however, I then get this error…

root [0] Processing DNN.C…
/home/jgrundy/DNN/./DNN.C:80:30: error: no viable conversion from 'const std::vector<TMVA::CrossValidationResult>' to 'TMVA::CrossValidationResult'
TMVA::CrossValidationResult results = crossVal.GetResults();

This is surprising, since I’m only booking a single method. However, taking the compiler at its word and treating GetResults() as if it returns a vector, I still get errors…

TMVA::CrossValidation crossVal(dataLoader);
crossVal.BookMethod(TMVA::Types::kDNN, "DNN", "Architecture=GPU:Layout=RELU|30,SIGMOID|20,LINEAR");
crossVal.Evaluate();
TMVA::CrossValidationResult results = crossVal.GetResults()[0];
results.Print();

… produces the error…

root [0] .x DNN.C
IncrementalExecutor::executeFunction: symbol '_ZN4TMVA21CrossValidationResultC1ERKS0_' unresolved while linking function '_GLOBAL__sub_I_cling_module_8'!
You are probably missing the definition of TMVA::CrossValidationResult::CrossValidationResult(TMVA::CrossValidationResult const&)
Maybe you need to load the corresponding shared library?

I’m pretty sure I have all the libraries loaded and ready to use, so I’m unsure what the issue is. I’m new to TMVA, so apologies if the solution is obvious.

Cheers,
James

P.S. I’m using ROOT 6.12/04.

Hi jgrundy,

Welcome to TMVA :slight_smile:

I don’t see any obvious mistakes in the code. Could you try running the tutorial example at $ROOTSYS/tutorials/tmva/TMVACrossValidation.C and see if the same problem occurs there?
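(For example, root -l $ROOTSYS/tutorials/tmva/TMVACrossValidation.C should run it directly.)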

Also, did you compile ROOT yourself on your local machine, or are you running through lxplus, or something else?

Cheers,
Kim

Hi Kim,

Thanks for the quick reply :slight_smile:.

The tutorial example seems to work fine, which is interesting…

root [0]
Processing /usr/local/root060218/tutorials/tmva/TMVACrossValidation.C…
[TFile::Cp] Total 0.20 MB |====================| 100.00 % [1.9 MB/s]
Info in TFile::OpenFromCache: using local cache copy of http://root.cern.ch/files/tmva_class_example.root [./files/tmva_class_example.root]
DataSetInfo : [dataset] : Added class "Signal"
: Add Tree TreeS of type Signal with 6000 events
DataSetInfo : [dataset] : Added class "Background"
: Add Tree TreeB of type Background with 6000 events
: Dataset[dataset] : Class index : 0 name : Signal
: Dataset[dataset] : Class index : 1 name : Background
: Evaluate method: Fisher
: Evaluation done.
CrossValidation : ==== Results ====
: Fold 0 ROC-Int : 0.8913
: Fold 1 ROC-Int : 0.8973
: Fold 2 ROC-Int : 0.8954
: Fold 3 ROC-Int : 0.8880
: Fold 4 ROC-Int : 0.9026
: ------------------------
: Average ROC-Int : 0.8949
: Std-Dev ROC-Int : 0.0056
root [1]

Perhaps this implies I’ve made a mistake somewhere higher up in my code?

I’m not compiling ROOT myself; I’m using 'source /usr/local/root060218/bin/thisroot.sh'.

Cheers,
James

Sorry, I can’t replicate the problem on my local machine. It could be a problem with your training script or with the configuration of the ROOT build you are using (which appears to be a local installation).

Cheers,
Kim

Here’s my whole ROOT macro…

{
int no_of_vars = 53;

TFile* trainingFile = new TFile("/unix/atlas3/hh4b/EventAnaResults_170719/GB_Processed_EventAnaResults_resolved_BigCR_all.root");
TTree* sigTree = (TTree*)trainingFile->Get("SM_HH_Nominal_FullyTagged");
TTree* bkgTree = (TTree*)trainingFile->Get("data_FullyTagged");//data_... for four tag

double sigWeight  = 3000/24.3, bkgWeight  = 3000/24.3;

TMVA::DataLoader* dataLoader = new TMVA::DataLoader();
dataLoader->AddSignalTree(sigTree, sigWeight);
dataLoader->AddBackgroundTree(bkgTree, bkgWeight);
dataLoader->PrepareTrainingAndTestTree("", 0, 0, 0, 0, "SplitMode=Random");

string varNames[53] = {
    "MEt", "scaledMEt", "deltaPhi4jMin", "meff", "meffLeps", "meffHC", "mtbmin",
    "hcand1_m", "hcand1_pt", "hcand1_dRjj", "hcand1_jet1_pt", "hcand1_jet1_eta",
    "hcand1_jet1_trackMoverP", "hcand1_jet2_pt", "hcand1_jet2_eta", "hcand1_jet2_trackMoverP",
    "hcand2_m", "hcand2_pt", "hcand2_dRjj", "hcand2_jet1_pt", "hcand2_jet1_eta",
    "hcand2_jet1_trackMoverP", "hcand2_jet2_pt", "hcand2_jet2_eta", "hcand2_jet2_trackMoverP",
    "hcand1_hcand2_dEta", "hcand1_hcand2_dR", "hcand1_hcand2_m", "hcand1_hcand2_pt",
    "hcand1_hcand2_scaledM", "Xt1", "Dhh", "Rhh", "Xhh", "cosThetaStar", "cosTheta1",
    "cosTheta2", "Phi", "Phi1", "extraJet1_pt", "extraJet1_eta", "extraJet1_HH_dPhi",
    "electron1_pt", "electron1_mt", "electron1_minDRj", "extraMuon1_pt", "extraMuon1_mt",
    "extraMuon1_minDRj", "GCdR_min", "GCdR_max", "GCdR_diff", "GCdR_sum", "avgAbsHCJetEta"};

for(int i = 0; i < no_of_vars; i++){
	dataLoader->AddVariable( varNames[i], 'F');
}

TMVA::CrossValidation crossVal(dataLoader);

crossVal.BookMethod(TMVA::Types::kDNN, "DNN", "Architecture=GPU:Layout=RELU|30,SIGMOID|20,LINEAR");
crossVal.Evaluate();
TMVA::CrossValidationResult results = crossVal.GetResults();
results.Print();

}

Thanks again,
James

Oops! I misread your message and thought you’d asked to see the script!

Cool, thanks. I’ll give it a read over!

Do so! To ease debugging, you could try adapting the tutorial to your use case by first replacing the Fisher method with a DNN, e.g. as in the sketch below.
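
A minimal sketch of that change (assuming the tutorial’s CrossValidation object is named cv; the option string is just your booking from above):

cv.BookMethod(TMVA::Types::kDNN, "DNN", "Architecture=GPU:Layout=RELU|30,SIGMOID|20,LINEAR");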

Also, one thing I see is that you use "SplitMode=Random" for partitioning your data. The CV splitting mechanism is actually separate from the conventional train/test split, so right now you’re not using half of your data. This will be made clearer in future versions.

Replace this with "nTest_Signal=0:nTest_Background=0". That should force all events into the training set so that CV can pick them up.
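
Concretely, the call in your macro would become something like this (a sketch, assuming the two-argument (cut, options) overload of PrepareTrainingAndTestTree):

dataLoader->PrepareTrainingAndTestTree("", "nTest_Signal=0:nTest_Background=0:SplitMode=Random");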

Cheers,
Kim

Hi,

Just to give a brief update on the problem: the solution was to include TMVA::Tools::Instance(); at the start of the macro and to print the results using…

auto results = cv.GetResults();
for (auto &r : results) {
    r.Print();
}

…like in the example macro ($ROOTSYS/tutorials/tmva/TMVACrossValidation.C).
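
For anyone who finds this later, the relevant part of the fixed macro looks roughly like this (a sketch; variable names as in my macro above):

TMVA::Tools::Instance(); // load the TMVA libraries before any TMVA class is used
// ... TFile/TTree setup, DataLoader configuration and BookMethod as before ...
crossVal.Evaluate();
auto results = crossVal.GetResults(); // one CrossValidationResult per booked method
for (auto &r : results) {
    r.Print();
}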

Kim, can I ask how you print the statistical significance? I’m having trouble using GetSigValues(), as I don’t really understand what is going on with the for loop and the Print() call.

Cheers,
James

Hi,

Glad you figured it out!

I’m not sure what you mean by .GetSigValues; CrossValidationResult has no such method defined. Could you describe in more detail what you expect as output?

Cheers,
Kim

Hi Kim,

Thanks for the quick reply :)!

Overall, the metric I’m using to assess the performance of my DNN is the statistical significance (signal/sqrt(background)). When using TMVA::Factory instead of TMVA::CrossValidation, I can produce a classifier cut efficiencies plot with TMVA::TMVAGui, from which I can read off a value of the statistical significance. I saw on https://root.cern.ch/doc/master/classTMVA_1_1CrossValidationResult.html that there is a method called GetSigValues() and wondered whether that would let me print the statistical significance instead of having to use the plot. Is that correct?

Cheers,
James

It is possible to book several methods at once with CrossValidation; this is why cv.GetResults() returns a vector. The first method you book corresponds to the first element of the results, and so on.

The Print call prints a summary of what is available through the different methods of CrossValidationResult.

The call r.GetSigValues() gives you a vector of the significances for each fold, as obtained by calling GetSignificance on the trained method.
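
For instance (a sketch, assuming a single booked method):

auto results = cv.GetResults();        // one entry per booked method, in booking order
auto sigs = results[0].GetSigValues(); // per-fold significances for the first method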

Hope that clears things up!

Cheers,
Kim

Hi Kim,

Can I just ask how I call GetSignificance on the trained method? I can’t seem to find any tutorial on this. Thanks again.

Cheers,
James

In the non-CV case you can do this with:

// GetMethod takes the dataset name and the method title (illustrative values here);
// it returns an IMethod*, so cast to MethodBase to reach GetSignificance().
auto m = dynamic_cast<TMVA::MethodBase*>(factory->GetMethod("dataset", "DNN"));
m->GetSignificance();

This information is also printed in the textual output. In the CV case this is not currently possible, since you can’t access the methods for the individual folds. Instead you have to rely on the numbers provided through CrossValidationResult::GetSigValues().

Cheers,
Kim

Hi Kim,

Thanks again for your patience in replying; I’ll try to make this my last post! I’m not currently managing to output any significance values in the textual output, just ROC integrals. This is my code…

TMVA::CrossValidation crossVal(dataLoader);
TFile *outputFile = TFile::Open("DNN_crossVal.root", "RECREATE");
crossVal.SetFile(outputFile);
crossVal.BookMethod(TMVA::Types::kDNN, "DNN", "Architecture=GPU:Layout=SIGMOID|20,LINEAR");
crossVal.SetNumFolds(2);
crossVal.Evaluate();

auto results = crossVal.GetResults();
for (auto r : results){ // one result per booked method
	r.GetSigValues();
	r.Print();
}

…why wouldn’t this work? I’ve also tried to write a .root file that I can use with TMVA::TMVAGui to study some of the output plots but this doesn’t seem to be working either.

Cheers again,
James

Hi James,

No worries, I’m here to help!

You might want to have a quick look through the TMVA documentation and a C++ tutorial.

r.GetSigValues() returns a vector of values, with the first element being the significance of the first fold. To print:

auto sigs = r.GetSigValues();
for (auto s : sigs) {
    std::cout << s << ", ";
}
std::cout << std::endl;

Hi Kim,

I still can’t seem to get this working. I’ve tried two ways and neither work…

//1.
auto results = crossVal.GetResults();
for (auto r : results){ // one result per booked method
	auto sigs = r.GetSigValues();
	for (auto s : sigs) {
		cout << "sig1... " << s << endl;
	}
	r.Print();
}

…this doesn’t print anything.

//2.
auto sigs = results.GetSigValues();
for (auto s : sigs) {
	cout << "sig2... " << s << endl;
}
std::cout << std::endl;

…this gives an error, as apparently .GetSigValues() isn’t a member of results. Perhaps I’ve interpreted your suggestions incorrectly?

Cheers,
James

This seems to be a bug. I will look into it.

Unfortunately, this also means it’s impossible to get these statistics for now.

Cheers,
Kim