Dear Experts,
I am training a BDT with TMVA and would like to perform a hyperparameter optimisation using the HyperParameterOptimisation class [1], which is not yet documented.
For the training I use ~1.2 million signal and ~1 million background events, so my initial aim was to optimise on 10% of those, since even using 1% already takes quite long (so far it hasn't finished after more than a day once I include the additional tuned parameters described below). For now, while I am setting things up, I stick to 1% in the examples. The data is split 50/50 into training and test samples.
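For reference, a minimal sketch of my dataloader setup (the tree handles and the variable list are placeholders; the event counts correspond to the 1% subsample, i.e. 12k signal and 10k background in total):

TMVA::DataLoader *dataloader = new TMVA::DataLoader(datasetDirName.data());
dataloader->AddVariable("var1", 'F');      // ... my actual input variables
dataloader->AddSignalTree(sigTree, 1.0);   // sigTree/bkgTree: placeholder TTree pointers
dataloader->AddBackgroundTree(bkgTree, 1.0);
// 1% of ~1.2M signal = 12k events, split 6k/6k; 1% of ~1M background = 10k, split 5k/5k:
dataloader->PrepareTrainingAndTestTree("",
   "nTrain_Signal=6000:nTest_Signal=6000:"
   "nTrain_Background=5000:nTest_Background=5000:"
   "SplitMode=Random:NormMode=NumEvents:!V");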
What I did initially (after setting up the dataloader as sketched above):
TMVA::HyperParameterOptimisation *HPO = new TMVA::HyperParameterOptimisation(dataloader);
// like in a snippet I saw somewhere
HPO->BookMethod(TMVA::Types::kBDT, "BDT", "");
std::cout << "Info: calling TMVA::HyperParameterOptimisation::Evaluate" << std::endl;
HPO->Evaluate();
TMVA::HyperParameterOptimisationResult HPOResult = HPO->GetResults();
HPOResult.Print();
HPO->SaveAs(Form("HPO_%s.root", datasetDirName.data()));
std::cout << "GetROCAverage: " << HPOResult.GetROCAverage();
std::cout << "\nGetEff01Values:\n";
// range-based loop over the returned vector (safer in case the getter returns by value)
for (const auto &v : HPOResult.GetEff01Values())
   std::cout << ' ' << v;
// ... same printouts for GetEff30Values, GetEffAreaValues, GetROCValues etc.
TFile *MyFile = new TFile(Form("HPOResult_%s.root", datasetDirName.data()), "RECREATE");
TMultiGraph *t = HPOResult.GetROCCurves();
t->Write();
MyFile->Close();
delete MyFile;
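As an aside, this is roughly how I then inspect what actually ended up in the written file (a sketch; I loop over the keys because I do not know under which name the TMultiGraph from GetROCCurves() is stored):

TFile *f = TFile::Open(Form("HPOResult_%s.root", datasetDirName.data()));
for (TObject *obj : *f->GetListOfKeys()) {
   auto *key = static_cast<TKey*>(obj);
   std::cout << key->GetName() << " (" << key->GetClassName() << ")\n";
   if (TString(key->GetClassName()) == "TMultiGraph")
      static_cast<TMultiGraph*>(key->ReadObj())->Draw("AL");
}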
The optimisation itself gives me:
<HEADER> HyperParameterOptimisa...: ===========================================================
: Optimisation for BDT fold 1
: AdaBoostBeta 0.6
: MaxDepth 3
: MinNodeSize 15.5
: NTrees 207.533
<HEADER> HyperParameterOptimisa...: ===========================================================
: Optimisation for BDT fold 2
: AdaBoostBeta 0.6
: MaxDepth 2.08005
: MinNodeSize 15.5
: NTrees 514.316
<HEADER> HyperParameterOptimisa...: ===========================================================
: Optimisation for BDT fold 3
: AdaBoostBeta 0.596308
: MaxDepth 2.39675
: MinNodeSize 15.5
: NTrees 505
<HEADER> HyperParameterOptimisa...: ===========================================================
: Optimisation for BDT fold 4
: AdaBoostBeta 0.6
: MaxDepth 3.82036
: MinNodeSize 15.5
: NTrees 529.937
<HEADER> HyperParameterOptimisa...: ===========================================================
: Optimisation for BDT fold 5
: AdaBoostBeta 0.41429
: MaxDepth 2.75713
: MinNodeSize 15.5
: NTrees 505
As a next step I added the following (as far as I understand, the Minuit fitter treats the tuning parameters as continuous, which would explain non-integer values such as NTrees = 207.533 in the output above):
// like in my actual training
HPO->BookMethod("BDT", "BDTG", "H:V:NTrees=1000:BoostType=Grad:Shrinkage=0.20:UseBaggedBoost:BaggedSampleFraction=0.4:SeparationType=GiniIndex:nCuts=500:PruneMethod=NoPruning:MaxDepth=5");
HPO->SetNumFolds(3);
HPO->SetFitter("Minuit");
HPO->SetFOMType("Separation");
I also redefined two TMVA::MethodBDT functions to tune more parameters:
void TMVA::MethodBDT::SetTuneParameters(std::map<TString,Double_t> tuneParameters)
{
   std::map<TString,Double_t>::iterator it;
   for (it = tuneParameters.begin(); it != tuneParameters.end(); ++it) {
      Log() << kWARNING << it->first << " = " << it->second << Endl;
      if      (it->first == "MaxDepth"            ) SetMaxDepth            ((Int_t)it->second);
      else if (it->first == "nCuts"               ) this->fNCuts = (Int_t)it->second;
      else if (it->first == "MinNodeSize"         ) SetMinNodeSize         (it->second);
      else if (it->first == "NTrees"              ) SetNTrees              ((Int_t)it->second);
      else if (it->first == "NodePurityLimit"     ) SetNodePurityLimit     (it->second);
      else if (it->first == "AdaBoostBeta"        ) SetAdaBoostBeta        (it->second);
      else if (it->first == "Shrinkage"           ) SetShrinkage           (it->second);
      else if (it->first == "UseNvars"            ) SetUseNvars            ((Int_t)it->second);
      else if (it->first == "BaggedSampleFraction") SetBaggedSampleFraction(it->second);
      else Log() << kFATAL << " SetParameter for " << it->first << " not yet implemented " << Endl;
   }
}
// see: https://root.cern.ch/root/html/src/TMVA__MethodBDT.cxx.html#v6inl%3E
std::map<TString,Double_t> TMVA::MethodBDT::OptimizeTuningParameters(TString fomType, TString fitType)
{
   std::cout << "call the Optimizer with the set of parameters and ranges that are meant to be tuned.\n";
   // fill all the tuning parameters that should be optimised into a map:
   std::map<TString,TMVA::Interval*> tuneParameters;
   std::map<TString,Double_t>        tunedParameters;
   // note: the 3rd parameter of the Interval is the "number of bins", NOT the step size!
   // The actual VALUES (at least for the scan, presumably also for the GA) are always
   // read from the middle of the bins. Hence the choice of intervals, e.g. for
   // MaxDepth, to obtain nice integer values.
   // find some reasonable ranges for the optimisation:
   tuneParameters.insert(std::pair<TString,Interval*>("NTrees",               new Interval(10,1000,5)));   // 5 bins
   tuneParameters.insert(std::pair<TString,Interval*>("MaxDepth",             new Interval(2,4,3)));       // 3 bins
   tuneParameters.insert(std::pair<TString,Interval*>("MinNodeSize",          new LogInterval(1,30,30)));  // 30 log-spaced bins
   tuneParameters.insert(std::pair<TString,Interval*>("NodePurityLimit",      new Interval(.4,.6,3)));     // 3 bins
   tuneParameters.insert(std::pair<TString,Interval*>("BaggedSampleFraction", new Interval(.4,.9,6)));     // 6 bins
   tuneParameters.insert(std::pair<TString,Interval*>("nCuts",                new Interval(20,700,10)));   // 10 bins
   // method-specific parameters
   tuneParameters.insert(std::pair<TString,Interval*>("Shrinkage",            new Interval(0.05,0.50,5))); // 5 bins

   std::cout << " the following BDT parameters will be tuned on the respective *grid*\n";
   std::map<TString,TMVA::Interval*>::iterator it;
   for (it = tuneParameters.begin(); it != tuneParameters.end(); ++it) {
      std::cout << it->first << " ";
      (it->second)->Print(std::cout);
      std::cout << "\n";
   }

   OptimizeConfigParameters optimize(this, tuneParameters, fomType, fitType);
   tunedParameters = optimize.optimize();
   return tunedParameters;
}
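As an aside, to convince myself about the bin convention mentioned in the comments above, I print the grid values by hand (a sketch; the bin-centre formula is my assumption, though it does reproduce the NTrees = 505 that keeps appearing in the output):

// Print the grid an Interval spans, assuming values are read from the bin centres.
void printGrid(const TMVA::Interval &iv, const char *name)
{
   const Int_t    n = iv.GetNbins();
   const Double_t w = (iv.GetMax() - iv.GetMin()) / n;   // bin width
   std::cout << name << ":";
   for (Int_t i = 0; i < n; ++i)
      std::cout << " " << iv.GetMin() + (i + 0.5) * w;   // bin centre
   std::cout << "\n";
}

// printGrid(TMVA::Interval(10, 1000, 5), "NTrees");   // -> 109 307 505 703 901
// printGrid(TMVA::Interval(2,  4,    3), "MaxDepth"); // -> 2.33 3 3.67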
With these changes the output looked like:
<HEADER> HyperParameterOptimisa...: ===========================================================
: Optimisation for BDT fold 1
: BaggedSampleFraction 0.65
: MaxDepth 3
: MinNodeSize 15.5002
: NTrees 505
: NodePurityLimit 0.5
: Shrinkage 0.275
<HEADER> HyperParameterOptimisa...: ===========================================================
: Optimisation for BDT fold 2
: BaggedSampleFraction 0.65
: MaxDepth 2.99998
: MinNodeSize 15.5
: NTrees 393.523
: NodePurityLimit 0.5
: Shrinkage 0.275
<HEADER> HyperParameterOptimisa...: ===========================================================
: Optimisation for BDT fold 3
: BaggedSampleFraction 0.649979
: MaxDepth 3
: MinNodeSize 15.5
: NTrees 505
: NodePurityLimit 0.5
: Shrinkage 0.275
GetROCAverage: 0
GetEff01Values:
GetEff10Values:
GetEff30Values:
...
So, I am a bit puzzled about how to interpret what I got:

1. Since I don't actually use cross-validation, is it even correct to use the HPO module for tuning?
2. Why don't I get any Get*Values printed, and why does the created ROOT file contain no TMultiGraph? If that is expected, how do I compare the improvement with respect to my nominal options? I would like to see proof that the proposed values are actually better, but as far as I can tell the module saves only the parameters, not the response of each training (see L.102). The naive comparison I have in mind is sketched below this list.
3. Should the average across the folds correspond to the best parameters?
4. Can I expect the best parameters found on 1%/10% of the statistics to also be optimal on the full statistics?
5. Are there parameters I should freeze at the values found so far to speed up the optimisation of the trickier parameters on larger statistics, and if so, which ones would those be?
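For question 2, the comparison I have in mind is a plain side-by-side Factory training of the nominal and the tuned configuration, comparing the test-sample ROC integrals (a sketch; the tuned numbers are taken from fold 1 of the last run, the dataloader is the one set up above, and I assume TMVA::Factory::GetROCIntegral is available in my ROOT version):

auto *outFile = TFile::Open("Compare.root", "RECREATE");
TMVA::Factory factory("Compare", outFile, "!V:!Silent:AnalysisType=Classification");
// nominal configuration (as in my actual training):
factory.BookMethod(dataloader, TMVA::Types::kBDT, "BDTG_nominal",
   "NTrees=1000:BoostType=Grad:Shrinkage=0.20:UseBaggedBoost:BaggedSampleFraction=0.4:"
   "SeparationType=GiniIndex:nCuts=500:PruneMethod=NoPruning:MaxDepth=5");
// tuned configuration (fold-1 values from the HPO output above):
factory.BookMethod(dataloader, TMVA::Types::kBDT, "BDTG_tuned",
   "NTrees=505:BoostType=Grad:Shrinkage=0.275:UseBaggedBoost:BaggedSampleFraction=0.65:"
   "SeparationType=GiniIndex:nCuts=500:PruneMethod=NoPruning:MaxDepth=3:MinNodeSize=15.5%");
factory.TrainAllMethods();
factory.TestAllMethods();
factory.EvaluateAllMethods();
std::cout << "nominal ROC integral: " << factory.GetROCIntegral(dataloader, "BDTG_nominal") << "\n"
          << "tuned   ROC integral: " << factory.GetROCIntegral(dataloader, "BDTG_tuned")   << std::endl;
outFile->Close();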
I would highly appreciate your input on any of these questions.
Best regards, Olena
References:
[1] HyperParameterOptimisation