Change in TMVA::BDTG results

Hello All,

Does anyone know why different versions of ROOT give different BDT results with the same training configuration? I had been training with ROOT 6.06.02 and recently moved to ROOT 6.12.02. I kept the same configuration for my training (the only change I made was moving the data-handling calls from my TMVA::Factory object into a TMVA::DataLoader).

Please see the attached results for the different versions.

Thanks
Kehinde

@kialbert, could you perhaps help here?

Hi,

Between those two versions, nothing that I know of should change the output of the training significantly. However, a few parts pertaining to the GBDTs were partially rewritten in the two years between the two releases.

Would it be possible for you to post your training script, including data? Or at least post your training configuration (the option strings for the BDT, Factory, DataLoader and DataLoader::PrepareTrainingAndTestTree)?

Given this information, I can narrow down the changes and get back to you with a better answer :)

Cheers,
Kim


Hi Kim,

Thanks for the response. Here are the relevant snippets from the scripts.

For version 6.06.02:

TMVA::Factory *factory = new TMVA::Factory(sampletype + bdtclass, outputFile,
    "!V:!Silent:Color:DrawProgressBar:Transformations=I;P;G,D:AnalysisType=Classification");

// Split the space-separated variable lists into vectors.
std::vector<TString> mvaVars_V = splitstr(getVar(bdtclass), " ");
std::vector<TString> spectatorVars_V = splitstr(spectatorVars, " ");

std::cout << "initializing variables..." << std::endl;
for (size_t i = 0; i < mvaVars_V.size(); i++) {
    std::cout << "adding: " << mvaVars_V.at(i) << std::endl;
    factory->AddVariable(mvaVars_V.at(i), mvaVars_V.at(i), 'F');
}

for (size_t i = 0; i < spectatorVars_V.size(); i++) {
    std::cout << "adding spectator: " << spectatorVars_V.at(i) << std::endl;
    factory->AddSpectator(spectatorVars_V.at(i), spectatorVars_V.at(i), 'F');
}

Double_t signalWeight = 1.0;
Double_t backgroundWeight = 1.0;
factory->AddSignalTree(signalTree, signalWeight);
factory->AddBackgroundTree(backgroundTree, backgroundWeight);
factory->SetSignalWeightExpression("weight");
factory->SetBackgroundWeightExpression("weight");
TCut mycuts = getcut(bdtclass);
TCut mycutb = getcut(bdtclass);

factory->PrepareTrainingAndTestTree(mycuts, mycutb,
    "nTrain_Signal=0:nTrain_Background=0:SplitMode=Random:NormMode=EqualNumEvents:!V");
factory->BookMethod(TMVA::Types::kBDT, "BDTG1",
    "!H:!V:NTrees=800:MinNodeSize=1:BoostType=Grad:Shrinkage=0.06:UseBaggedBoost:BaggedSampleFraction=0.6:nCuts=20:MaxDepth=3");
std::cout << "Training all methods..." << std::endl;
factory->TrainAllMethods();
std::cout << "Testing all methods..." << std::endl;
factory->TestAllMethods();
std::cout << "Evaluate all methods..." << std::endl;
factory->EvaluateAllMethods();
outputFile->Close();
std::cout << "Finished the training run..." << std::endl;
delete factory;

For version 6.12.02:

TMVA::Factory *factory = new TMVA::Factory(mvaoutname, outputFile,
    "!V:!Silent:Color:DrawProgressBar:Transformations=I;P;G,D:AnalysisType=Classification");
// In newer versions the data handling lives in a separate DataLoader:
TMVA::DataLoader *dataloader = new TMVA::DataLoader("dataset");

// Split the space-separated variable lists into vectors.
std::vector<TString> mvaVars_V = splitstr(getVar(bdtclass), " ");
std::vector<TString> spectatorVars_V = splitstr(spectatorVars, " ");

std::cout << "initializing variables..." << std::endl;
for (size_t i = 0; i < mvaVars_V.size(); i++) {
    std::cout << "adding: " << mvaVars_V.at(i) << std::endl;
    dataloader->AddVariable(mvaVars_V.at(i), mvaVars_V.at(i), 'F');
}

for (size_t i = 0; i < spectatorVars_V.size(); i++) {
    std::cout << "adding spectator: " << spectatorVars_V.at(i) << std::endl;
    dataloader->AddSpectator(spectatorVars_V.at(i), spectatorVars_V.at(i), 'F');
}

Double_t signalWeight = 1.0;
Double_t backgroundWeight = 1.0;
dataloader->AddSignalTree(signalTree, signalWeight);
dataloader->AddBackgroundTree(backgroundTree, backgroundWeight);
dataloader->SetSignalWeightExpression("weight");
dataloader->SetBackgroundWeightExpression("weight");
TCut mycuts = getcut(bdtclass);
TCut mycutb = getcut(bdtclass);

dataloader->PrepareTrainingAndTestTree(mycuts, mycutb,
    "nTrain_Signal=0:nTrain_Background=0:SplitMode=Random:NormMode=EqualNumEvents:!V");
factory->BookMethod(dataloader, TMVA::Types::kBDT, "BDTG1",
    "!H:!V:NTrees=800:MinNodeSize=1:BoostType=Grad:Shrinkage=0.06:UseBaggedBoost:BaggedSampleFraction=0.6:nCuts=20:MaxDepth=3");
std::cout << "Training all methods..." << std::endl;
factory->TrainAllMethods();
std::cout << "Testing all methods..." << std::endl;
factory->TestAllMethods();
std::cout << "Evaluate all methods..." << std::endl;
factory->EvaluateAllMethods();
outputFile->Close();
std::cout << "Finished the training run..." << std::endl;
delete dataloader;
delete factory;

Thanks
Kehinde

Hi,

What would be very helpful is some way to reproduce the behaviour. (With the information you provided, I tried different approaches on the TMVAClassification dataset, but that simple example does not exhibit your behaviour.)

If you cannot share the data, could you describe the variables in some way, so that I can set up a similar situation to reproduce the issue?

Edit: It would also be interesting to see the classifier output on the training sample (the MVA_BDTG1_Train_S histogram).
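
In case it helps others reading along, here is a minimal sketch of how to pull that histogram out of the training output file from a ROOT prompt. The file name is a stand-in for whatever outputFile points to in your script, and the in-file path assumes the default layout with a DataLoader named "dataset":

TFile *f = TFile::Open("TMVAOutput.root"); // stand-in for your output file name
TH1 *hTrainS = nullptr;
// Default location of the signal response on the training sample:
f->GetObject("dataset/Method_BDT/BDTG1/MVA_BDTG1_Train_S", hTrainS);
if (hTrainS) hTrainS->Draw();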

Cheers,
Kim

Hi,

After Kehinde shared the training files by email, the problem was understood. The core issue was that older versions of TMVA had a bug that manifested itself only when there were very few training examples of one class (as is the case here).

The training procedure would weight the events of the class with few events disproportionately, hampering overall learning and leading to suboptimal results. (Once a training is complete, applying it still gives correct results; with the corrected training algorithm, however, one can obtain better ones.)
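
To give a feel for why few events in one class is delicate, here is a toy illustration with hypothetical event counts (not the actual TMVA internals): with NormMode=EqualNumEvents both classes are renormalised to roughly the same total weight, so every event of the rare class carries a much larger per-event weight, and any mishandling of those weights during boosting is amplified accordingly.

// Toy numbers, not TMVA internals: effective per-event weights after
// renormalising both classes to the same total weight.
const double nSig = 100.;    // rare class
const double nBkg = 100000.; // abundant class
const double target = nSig;  // common total weight per class (roughly what EqualNumEvents does)
const double wSig = target / nSig;  // 1.0 per signal event
const double wBkg = target / nBkg;  // 0.001 per background event
std::cout << "per-event weight ratio (sig/bkg): " << wSig / wBkg << std::endl; // 1000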

(Thanks Kehinde for letting me post the resulting plots!)

TMVA v6.06

Notice the slight peak in the background distribution coinciding with the peak of the signal. This is usually an indication that something fishy is going on. (Again, this is only a problem when one class has very few events.)

ROC AUC: ~0.800

[Plot: ROOT-v6-15-wrong-calc]

TMVA v6.12

With the corrected training algorithm in v6.12, a previously hidden overtraining problem becomes visible: there is a marked difference between the signal distributions on the training and test samples. This is also consistent with the small sample size; one way to quantify the disagreement is sketched after the plot below.

ROC AUC: ~0.820

[Plot: ROOT-v6-15]
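
As an aside, a quick way to put a number on the train/test disagreement is a Kolmogorov-Smirnov test between the two response histograms. A sketch, again with a stand-in file name and assuming the default in-file layout:

TFile *f = TFile::Open("TMVAOutput.root"); // stand-in for your output file name
TH1 *hTrain = nullptr, *hTest = nullptr;
f->GetObject("dataset/Method_BDT/BDTG1/MVA_BDTG1_Train_S", hTrain);
f->GetObject("dataset/Method_BDT/BDTG1/MVA_BDTG1_S", hTest);
if (hTrain && hTest) {
    // A very small KS probability hints at overtraining.
    std::cout << "KS probability (signal): " << hTest->KolmogorovTest(hTrain) << std::endl;
}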

TMVA v6.12 – Tuned parameters

Changing the parameters to prevent overfitting (mainly by tuning the number of trees) improves the agreement between the distributions. Fluctuations are still present due to the small sample size. Further steps would be to optimise the parameters more carefully, or to use cross-validation, for which there is limited support in v6.12 (and which is much improved in v6.14!); see the sketch after the plot below.

ROC AUC: ~0.820

[Plot: ROOT-v6-15-new-param]
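
For concreteness, here is roughly what those two further steps can look like. The parameter values below are hypothetical placeholders, not the tuned values used for the plot above, and the cross-validation snippet follows the v6.14-style interface (check the documentation of your release for the exact constructor arguments):

// Hypothetical, more conservative booking to reduce overfitting:
factory->BookMethod(dataloader, TMVA::Types::kBDT, "BDTG1",
    "!H:!V:NTrees=200:MinNodeSize=5%:BoostType=Grad:Shrinkage=0.06:UseBaggedBoost:BaggedSampleFraction=0.6:nCuts=20:MaxDepth=2");

// Sketch of k-fold cross-validation as available from v6.14:
TMVA::CrossValidation cv("TMVACrossValidation", dataloader, outputFile,
    "!V:!Silent:AnalysisType=Classification:NumFolds=5");
cv.BookMethod(TMVA::Types::kBDT, "BDTG1",
    "!H:!V:NTrees=200:MinNodeSize=5%:BoostType=Grad:Shrinkage=0.06:MaxDepth=2");
cv.Evaluate();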

Cheers,
Kim

Hi Kim,

I noticed similar behaviour of TMVA::BDTG between versions 6.10.04 and 6.14.04.
(Yes, one class has far fewer events.)
I am wondering when the BDTG update (partial rewrite) was implemented.

The BDTG score distributions are here:
[Plot: root-6.10.04]

[Plot: root-6.14.04]

Cheers,
JC

Hi,

The change was implemented in ROOT v6.12.

On the off chance you are interested, here is the change responsible for this: https://github.com/root-project/root/pull/706/commits/a760aa2135850849e984b18d98152e92b66a6f4c

And for reference, the change as integrated into ROOT (as part of a larger PR): https://github.com/root-project/root/commit/a780e91b641526f170b8197abacbcfd9b69121f9#diff-9ecbd74e246c7a09fcf95c40d4f9db5d

Cheers,
Kim

Thanks, this is very helpful.

Cheers,
JC