BDT Results Change Depending on the Order in Which Trees Are Added to the DataLoader

Hi all,

I am having problems with the new dataloader in TMVA. I have 4 files (signal training file, signal test file, background training file and background test file) which I want to use as inputs for a BDT MVA with NTrees=200. The problem is that, depending on whether I add the signal trees or the background trees to the dataloader first, I get different results. I have tried using the PrepareTrainingAndTestTree function, but it has made no difference.

Thanks in advance for any help!

My code looks like:

void TrainData(TString sigFilename, TString bckFilename, TString TupleName, TString *varsName, const int nvars) {

//Remove ".root" from filename for future use
  TString sigFileStem = sigFilename.Remove(sigFilename.Length() - 5, 5); 
  TString bckFileStem = bckFilename.Remove(bckFilename.Length() - 5, 5);

// Create output file, dataloader and factory objects, and open the input files
  TFile* outputFile = TFile::Open( "TMVA_Results_BDT_N=200.root", "RECREATE" );
  TMVA::DataLoader* dataloader = new TMVA::DataLoader(".");
  TMVA::Factory* factory = new TMVA::Factory("tmvaTest", outputFile, "");
  TFile* sigTrainFile = new TFile(sigFileStem + "_Train.root");
  TFile* bckTrainFile = new TFile(bckFileStem + "_Train.root");
  TFile* sigTestFile = new TFile(sigFileStem + "_Test.root");
  TFile* bckTestFile = new TFile(bckFileStem + "_Test.root");

// Get the TTree objects from the input files
  TTree* sigTrain = (TTree*)sigTrainFile->Get(TupleName + "_Train");
  TTree* bckTrain = (TTree*)bckTrainFile->Get(TupleName + "_Train");
  TTree* sigTest = (TTree*)sigTestFile->Get(TupleName + "_Test");
  TTree* bckTest = (TTree*)bckTestFile->Get(TupleName + "_Test");

// Get the number of entries in each TTree
  int nSigTrain = sigTrain->GetEntries();
  int nBckTrain = bckTrain->GetEntries();
  int nSigTest = sigTest->GetEntries();
  int nBckTest = bckTest->GetEntries();

// Global event weights
  double sigWeight = 1.0;
  double bckWeight = 1.0;
  dataloader->AddBackgroundTree(bckTrain, bckWeight, TMVA::Types::kTraining);
  dataloader->AddSignalTree(sigTrain, sigWeight, TMVA::Types::kTraining);
  dataloader->AddBackgroundTree(bckTest, bckWeight, TMVA::Types::kTesting);
  dataloader->AddSignalTree(sigTest, sigWeight, TMVA::Types::kTesting);
  dataloader->PrepareTrainingAndTestTree("", "", "NormMode=None:!V");

  /*dataloader->AddSignalTree(sigTrain, sigWeight, TMVA::Types::kTraining);
  dataloader->AddBackgroundTree(bckTrain, bckWeight, TMVA::Types::kTraining);
  dataloader->AddSignalTree(sigTest, sigWeight, TMVA::Types::kTesting);
  dataloader->AddBackgroundTree(bckTest, bckWeight, TMVA::Types::kTesting);
  dataloader->PrepareTrainingAndTestTree("", "", "NormMode=None:!V");*/
   
// Define the input variables that shall be used for the MVA training
// (the variables used in the expression must exist in the original TTree).
  for ( int i=0 ; i<nvars ; i++ ) {
    dataloader->AddVariable("SelVars_Nominal_" + varsName[i], 'F');
  }

// Book MVA methods (see TMVA manual).  
  factory->BookMethod(dataloader,TMVA::Types::kBDT, "BDT", "NTrees=200:MaxDepth=4:MinNodeSize=5%:nCuts=100:BoostType=AdaBoost:AdaBoostBeta=0.15");  

// Train, test and evaluate all methods
  factory->TrainAllMethods();
  factory->TestAllMethods();
  factory->EvaluateAllMethods();    

// Save the output and finish up
  outputFile->Close();
  std::cout << "==> wrote root file TMVA.root" << std::endl;
  std::cout << "==> TMVAnalysis is done!" << std::endl; 

  delete factory;
  delete dataloader;

}

Hi,

In what way are the results different, and what do you expect?

Is there also a variation when running the same training twice (using the same order)?

If the variation is minute, this would be expected. The current dataloader depends on the order in which the inputs are defined, that is, on whether the background or the signal data is read in first.
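To judge whether a variation is "minute", it can help to compare a single number per run, such as the ROC integral that TMVA prints in its evaluation summary. The following is a minimal standalone sketch, not TMVA's own implementation: it assumes you have extracted the classifier responses for signal and background test events into two vectors, and the function names and the tolerance are hypothetical choices for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Area under the ROC curve, computed as the probability that a randomly
// chosen signal event scores higher than a randomly chosen background
// event (ties count one half). The cost is O(nSig * nBkg), so use a
// subsample for large test sets.
double rocIntegral(const std::vector<double>& sigScores,
                   const std::vector<double>& bkgScores) {
  assert(!sigScores.empty() && !bkgScores.empty());
  double wins = 0.0;
  for (double s : sigScores) {
    for (double b : bkgScores) {
      if (s > b) {
        wins += 1.0;
      } else if (s == b) {
        wins += 0.5;
      }
    }
  }
  return wins / (static_cast<double>(sigScores.size()) *
                 static_cast<double>(bkgScores.size()));
}

// True if two runs agree within a user-chosen tolerance.
// The tolerance is an assumption, not a TMVA recommendation.
bool runsAgree(double rocA, double rocB, double tolerance) {
  return std::fabs(rocA - rocB) <= tolerance;
}
```

Computing `rocIntegral` once for the signal-first run and once for the background-first run, then calling `runsAgree` on the two values, turns "the plots look different" into a statement about how large the difference is.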

Cheers,
Kim

Hi Kim!

Like this:

[Images: BDT_N=200 (Sig 1st), BDT_N=200 (Bck 1st)]

Thanks for any help you can give!

Hi,

It seems that maybe the training didn’t converge. Could you also post the output of the training?

Did you try running it for longer? Did you try outputting the monitoring histograms?

Did you also try a different classifier, e.g. “BoostType=Grad”?
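As a rough sketch of what switching the boost type could look like, the snippet below assembles a TMVA-style option string (colon-separated `key=value` items) for a gradient-boosted variant of the settings in the original macro. The helper names are hypothetical, `AdaBoostBeta` is dropped because it only applies to AdaBoost, and the `Shrinkage` value of 0.1 is an assumed starting point rather than a tuned setting.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Join key/value pairs into a TMVA-style option string:
// "key=value" items separated by ':'.
std::string joinOptions(
    const std::vector<std::pair<std::string, std::string>>& opts) {
  std::string out;
  for (const auto& kv : opts) {
    assert(!kv.first.empty());  // every option needs a name
    if (!out.empty()) {
      out += ':';
    }
    out += kv.first + "=" + kv.second;
  }
  return out;
}

// Gradient-boost variant of the BDT settings used in this thread.
// Shrinkage=0.1 is an assumption for illustration.
std::string gradBoostOptions() {
  return joinOptions({{"NTrees", "200"},
                      {"MaxDepth", "4"},
                      {"MinNodeSize", "5%"},
                      {"nCuts", "100"},
                      {"BoostType", "Grad"},
                      {"Shrinkage", "0.1"}});
}
```

The resulting string could be passed, via `.c_str()`, as the last argument of the `factory->BookMethod(dataloader, TMVA::Types::kBDT, ...)` call in the macro above, ideally under a different method name so the two classifiers can be compared side by side.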

Cheers,
Kim

Hi,

The output of the training was:

DataSetInfo              : [.] : Added class "Background"
                         : Add Tree CxAODTuple_Nominal_Train of type Background with 63334 events
DataSetInfo              : [.] : Added class "Signal"
                         : Add Tree CxAODTuple_Nominal_Train of type Signal with 45321 events
                         : Add Tree CxAODTuple_Nominal_Test of type Background with 63334 events
                         : Add Tree CxAODTuple_Nominal_Test of type Signal with 45322 events
Factory                  : Booking method: BDT
                         : 
DataSetFactory           : [.] : Number of events in input trees
                         : 
                         : 
                         : Dataset[.] : No weight renormalisation applied: use original global and event weights
DataSetInfo              : Correlation matrix (Background):
                         : -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                         :                                   SelVars_Nominal_mBB SelVars_Nominal_mMMC SelVars_Nominal_mHH SelVars_Nominal_ptB2 SelVars_Nominal_drBB SelVars_Nominal_ptLepTau SelVars_Nominal_drLepTau SelVars_Nominal_mtLepMet SelVars_Nominal_dPhiHH SelVars_Nominal_metPhiCentrality SelVars_Nominal_MET
                         :              SelVars_Nominal_mBB:              +1.000               +0.153              -0.418               +0.710               +0.676                   -0.002                   +0.167                   +0.082                 -0.000                           -0.064              +0.055
                         :             SelVars_Nominal_mMMC:              +0.153               +1.000              -0.239               +0.027               +0.095                   +0.118                   +0.595                   +0.392                 +0.002                           +0.153              +0.387
                         :              SelVars_Nominal_mHH:              -0.418               -0.239              +1.000               -0.281               -0.623                   +0.259                   -0.446                   -0.169                 +0.001                           +0.128              +0.043
                         :             SelVars_Nominal_ptB2:              +0.710               +0.027              -0.281               +1.000               +0.263                   -0.011                   +0.043                   +0.060                 -0.004                           -0.048              +0.020
                         :             SelVars_Nominal_drBB:              +0.676               +0.095              -0.623               +0.263               +1.000                   -0.108                   +0.224                   +0.063                 +0.002                           -0.172              -0.074
                         :         SelVars_Nominal_ptLepTau:              -0.002               +0.118              +0.259               -0.011               -0.108                   +1.000                   -0.432                   +0.148                 +0.003                           +0.030              +0.039
                         :         SelVars_Nominal_drLepTau:              +0.167               +0.595              -0.446               +0.043               +0.224                   -0.432                   +1.000                   +0.181                 -0.003                           -0.051              +0.044
                         :         SelVars_Nominal_mtLepMet:              +0.082               +0.392              -0.169               +0.060               +0.063                   +0.148                   +0.181                   +1.000                 +0.003                           -0.126              +0.350
                         :           SelVars_Nominal_dPhiHH:              -0.000               +0.002              +0.001               -0.004               +0.002                   +0.003                   -0.003                   +0.003                 +1.000                           +0.004              -0.000
                         : SelVars_Nominal_metPhiCentrality:              -0.064               +0.153              +0.128               -0.048               -0.172                   +0.030                   -0.051                   -0.126                 +0.004                           +1.000              +0.215
                         :              SelVars_Nominal_MET:              +0.055               +0.387              +0.043               +0.020               -0.074                   +0.039                   +0.044                   +0.350                 -0.000                           +0.215              +1.000
                         : -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DataSetInfo              : Correlation matrix (Signal):
                         : -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                         :                                   SelVars_Nominal_mBB SelVars_Nominal_mMMC SelVars_Nominal_mHH SelVars_Nominal_ptB2 SelVars_Nominal_drBB SelVars_Nominal_ptLepTau SelVars_Nominal_drLepTau SelVars_Nominal_mtLepMet SelVars_Nominal_dPhiHH SelVars_Nominal_metPhiCentrality SelVars_Nominal_MET
                         :              SelVars_Nominal_mBB:              +1.000               +0.024              -0.173               +0.628               +0.565                   +0.141                   -0.108                   +0.023                 +0.005                           +0.001              +0.156
                         :             SelVars_Nominal_mMMC:              +0.024               +1.000              -0.117               +0.015               +0.040                   +0.033                   +0.228                   +0.069                 +0.006                           +0.069              +0.078
                         :              SelVars_Nominal_mHH:              -0.173               -0.117              +1.000               +0.092               -0.548                   +0.488                   -0.568                   -0.056                 +0.004                           +0.078              +0.485
                         :             SelVars_Nominal_ptB2:              +0.628               +0.015              +0.092               +1.000               -0.039                   +0.189                   -0.224                   +0.002                 +0.003                           +0.034              +0.202
                         :             SelVars_Nominal_drBB:              +0.565               +0.040              -0.548               -0.039               +1.000                   -0.177                   +0.306                   +0.042                 +0.003                           -0.070              -0.149
                         :         SelVars_Nominal_ptLepTau:              +0.141               +0.033              +0.488               +0.189               -0.177                   +1.000                   -0.619                   +0.119                 +0.002                           -0.083              +0.117
                         :         SelVars_Nominal_drLepTau:              -0.108               +0.228              -0.568               -0.224               +0.306                   -0.619                   +1.000                   +0.057                 -0.001                           -0.164              -0.504
                         :         SelVars_Nominal_mtLepMet:              +0.023               +0.069              -0.056               +0.002               +0.042                   +0.119                   +0.057                   +1.000                 -0.004                           -0.343              -0.182
                         :           SelVars_Nominal_dPhiHH:              +0.005               +0.006              +0.004               +0.003               +0.003                   +0.002                   -0.001                   -0.004                 +1.000                           +0.002              +0.004
                         : SelVars_Nominal_metPhiCentrality:              +0.001               +0.069              +0.078               +0.034               -0.070                   -0.083                   -0.164                   -0.343                 +0.002                           +1.000              +0.268
                         :              SelVars_Nominal_MET:              +0.156               +0.078              +0.485               +0.202               -0.149                   +0.117                   -0.504                   -0.182                 +0.004                           +0.268              +1.000
                         : -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DataSetFactory           : [.] :  
                         : 
Factory                  : Train all methods
Factory                  : [.] : Create Transformation "I" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'SelVars_Nominal_mBB' <---> Output : variable 'SelVars_Nominal_mBB'
                         : Input : variable 'SelVars_Nominal_mMMC' <---> Output : variable 'SelVars_Nominal_mMMC'
                         : Input : variable 'SelVars_Nominal_mHH' <---> Output : variable 'SelVars_Nominal_mHH'
                         : Input : variable 'SelVars_Nominal_ptB2' <---> Output : variable 'SelVars_Nominal_ptB2'
                         : Input : variable 'SelVars_Nominal_drBB' <---> Output : variable 'SelVars_Nominal_drBB'
                         : Input : variable 'SelVars_Nominal_ptLepTau' <---> Output : variable 'SelVars_Nominal_ptLepTau'
                         : Input : variable 'SelVars_Nominal_drLepTau' <---> Output : variable 'SelVars_Nominal_drLepTau'
                         : Input : variable 'SelVars_Nominal_mtLepMet' <---> Output : variable 'SelVars_Nominal_mtLepMet'
                         : Input : variable 'SelVars_Nominal_dPhiHH' <---> Output : variable 'SelVars_Nominal_dPhiHH'
                         : Input : variable 'SelVars_Nominal_metPhiCentrality' <---> Output : variable 'SelVars_Nominal_metPhiCentrality'
                         : Input : variable 'SelVars_Nominal_MET' <---> Output : variable 'SelVars_Nominal_MET'
TFHandler_Factory        :                         Variable                                Mean                                RMS                        [        Min                                Max ]
                         : -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                         :              SelVars_Nominal_mBB:                        1.7793e+05                        1.7663e+05   [                            16175.                        5.3996e+06 ]
                         :             SelVars_Nominal_mMMC:                            156.54                            92.275   [                            60.001                            2504.5 ]
                         :              SelVars_Nominal_mHH:                        5.1376e+05                        3.2411e+05   [                        2.5002e+05                        4.4747e+06 ]
                         :             SelVars_Nominal_ptB2:                            69758.                            55300.   [                            20002.                        1.8231e+06 ]
                         :             SelVars_Nominal_drBB:                            1.9588                           0.98933   [                           0.37236                            5.7493 ]
                         :         SelVars_Nominal_ptLepTau:                        1.1274e+05                            93841.   [                            173.51                        1.3813e+06 ]
                         :         SelVars_Nominal_drLepTau:                            1.8719                           0.98342   [                           0.21928                            5.4472 ]
                         :         SelVars_Nominal_mtLepMet:                            56541.                            50031.   [                            2.1447                        8.5843e+05 ]
                         :           SelVars_Nominal_dPhiHH:                         0.0091489                            2.5901   [                           -3.1416                            3.1416 ]
                         : SelVars_Nominal_metPhiCentrality:                           0.48355                           0.98396   [                           -1.4142                            1.4142 ]
                         :              SelVars_Nominal_MET:                            97701.                            87859.   [                            265.03                        1.3062e+06 ]
                         : -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                         : Ranking input variables (method unspecific)...
IdTransformation         : Ranking result (top variable is best ranked)
                         : ---------------------------------------------------------
                         : Rank : Variable                         : Separation
                         : ---------------------------------------------------------
                         :    1 : SelVars_Nominal_mHH              : 7.245e-01
                         :    2 : SelVars_Nominal_drLepTau         : 6.888e-01
                         :    3 : SelVars_Nominal_ptLepTau         : 4.739e-01
                         :    4 : SelVars_Nominal_mMMC             : 3.810e-01
                         :    5 : SelVars_Nominal_drBB             : 3.353e-01
                         :    6 : SelVars_Nominal_mtLepMet         : 3.044e-01
                         :    7 : SelVars_Nominal_MET              : 2.717e-01
                         :    8 : SelVars_Nominal_metPhiCentrality : 2.464e-01
                         :    9 : SelVars_Nominal_dPhiHH           : 2.225e-01
                         :   10 : SelVars_Nominal_mBB              : 1.994e-01
                         :   11 : SelVars_Nominal_ptB2             : 8.010e-02
                         : ---------------------------------------------------------
Factory                  : Train method: BDT for Classification
                         : 
BDT                      : #events: (reweighted) sig: 54327.5 bkg: 54327.5
                         : #events: (unweighted) sig: 45321 bkg: 63334
                         : Training 200 Decision Trees ... patience please
                         : Elapsed time for training with 108655 events: 11.9 sec         
BDT                      : [.] : Evaluation of BDT on training sample (108655 events)
                         : Elapsed time for evaluation of 108655 events: 1.44 sec       
                         : Creating xml weight file: ./weights/tmvaTest_BDT.weights.xml
                         : Creating standalone class: ./weights/tmvaTest_BDT.class.C
                         : TMVA_Results_BDT_N=200.root:/./Method_BDT/BDT
Factory                  : Training finished
                         : 
                         : Ranking input variables (method specific)...
BDT                      : Ranking result (top variable is best ranked)
                         : ------------------------------------------------------------------
                         : Rank : Variable                         : Variable Importance
                         : ------------------------------------------------------------------
                         :    1 : SelVars_Nominal_mHH              : 2.285e-01
                         :    2 : SelVars_Nominal_drLepTau         : 1.720e-01
                         :    3 : SelVars_Nominal_drBB             : 9.064e-02
                         :    4 : SelVars_Nominal_mMMC             : 8.764e-02
                         :    5 : SelVars_Nominal_mBB              : 8.613e-02
                         :    6 : SelVars_Nominal_ptLepTau         : 6.337e-02
                         :    7 : SelVars_Nominal_ptB2             : 6.208e-02
                         :    8 : SelVars_Nominal_MET              : 5.818e-02
                         :    9 : SelVars_Nominal_dPhiHH           : 5.357e-02
                         :   10 : SelVars_Nominal_mtLepMet         : 5.163e-02
                         :   11 : SelVars_Nominal_metPhiCentrality : 4.624e-02
                         : ------------------------------------------------------------------
Factory                  : === Destroy and recreate all methods via weight files for testing ===
                         : 
                         : Reading weight file: ./weights/tmvaTest_BDT.weights.xml
Factory                  : Test all methods
Factory                  : Test method: BDT for Classification performance
                         : 
BDT                      : [.] : Evaluation of BDT on testing sample (108656 events)
                         : Elapsed time for evaluation of 108656 events: 1.41 sec       
Factory                  : Evaluate all methods
Factory                  : Evaluate classifier: BDT
                         : 
BDT                      : [.] : Loop over test events and fill histograms with classifier response...
                         : 
TFHandler_BDT            :                         Variable                                Mean                                RMS                        [        Min                                Max ]
                         : -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                         :              SelVars_Nominal_mBB:                        1.7880e+05                        1.7849e+05   [                            18640.                        5.2092e+06 ]
                         :             SelVars_Nominal_mMMC:                            155.97                            91.671   [                            60.004                            2446.9 ]
                         :              SelVars_Nominal_mHH:                        5.1482e+05                        3.2493e+05   [                        2.5000e+05                        3.9392e+06 ]
                         :             SelVars_Nominal_ptB2:                            69836.                            55723.   [                            20001.                        1.8494e+06 ]
                         :             SelVars_Nominal_drBB:                            1.9600                           0.98914   [                           0.38637                            5.6204 ]
                         :         SelVars_Nominal_ptLepTau:                        1.1328e+05                            94253.   [                            187.54                        1.3082e+06 ]
                         :         SelVars_Nominal_drLepTau:                            1.8666                           0.98301   [                           0.21083                            5.3692 ]
                         :         SelVars_Nominal_mtLepMet:                            56345.                            49552.   [                           0.22147                        6.8076e+05 ]
                         :           SelVars_Nominal_dPhiHH:                          0.023143                            2.5875   [                           -3.1416                            3.1416 ]
                         : SelVars_Nominal_metPhiCentrality:                           0.47881                           0.98423   [                           -1.4142                            1.4142 ]
                         :              SelVars_Nominal_MET:                            98259.                            88792.   [                            348.22                        1.2302e+06 ]
                         : -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                         : 
                         : Evaluation results ranked by best signal efficiency and purity (area)
                         : -------------------------------------------------------------------------------------------------------------------
                         : DataSet       MVA                       
                         : Name:         Method:          ROC-integ
                         : .             BDT            : 0.992
                         : -------------------------------------------------------------------------------------------------------------------
                         : 
                         : Testing efficiency compared to training efficiency (overtraining check)
                         : -------------------------------------------------------------------------------------------------------------------
                         : DataSet              MVA              Signal efficiency: from test sample (from training sample) 
                         : Name:                Method:          @B=0.01             @B=0.10            @B=0.30   
                         : -------------------------------------------------------------------------------------------------------------------
                         : .                    BDT            : 0.902 (0.902)       0.979 (0.979)      0.997 (0.997)
                         : -------------------------------------------------------------------------------------------------------------------
                         : 
Dataset:.                : Created tree 'TestTree' with 108656 events
                         : 
Dataset:.                : Created tree 'TrainTree' with 108655 events
                         : 
Factory                  : Thank you for using TMVA!
                         : For citation information, please visit: http://tmva.sf.net/citeTMVA.html
==> wrote root file TMVA.root
==> TMVAnalysis is done!
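A note on the `#events: (reweighted)` line in the log: both classes are listed at 54327.5, which is exactly half of 45321 + 63334 = 108655. That is the pattern you get when each class is scaled so that signal and background carry the same total weight. The sketch below reproduces that arithmetic for unit input weights. It is a reading of the numbers in the log, not a description of TMVA internals, and the struct and function names are hypothetical.

```cpp
#include <cassert>

// Per-event scale factors for the two classes.
struct ClassScale {
  double signal;
  double background;
};

// Scale factors that bring both classes to the same total weight,
// namely half the overall number of training events, assuming every
// event starts with weight 1.
ClassScale equalizeClasses(long nSig, long nBkg) {
  assert(nSig > 0 && nBkg > 0);
  const double target = 0.5 * static_cast<double>(nSig + nBkg);
  return {target / static_cast<double>(nSig),
          target / static_cast<double>(nBkg)};
}
```

For the event counts in the log, `equalizeClasses(45321, 63334)` gives a signal factor of about 1.199 and a background factor of about 0.858, so each class sums to 54327.5.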

I am not sure how to output the monitoring histograms or run it for longer. When I run it with BoostType=Grad I get…

[Images: BDT_N=200 (Bck 1st), BDT_N=200 (Sig 1st)]

(background first, then signal first). I have also noticed that if I run the code without changing it between runs (and closing ROOT in between runs), I also get different results:

[Images: BDT_N=200 (Bck 1st - TMVA - Run 1), BDT_N=200 (Bck 1st - TMVA - Run 2), BDT_N=200 (Bck 1st)]

Cheers,

Lauren

Hi,

Thanks for the plots. This warrants further investigation, I think. Let me get back to you.

Cheers,
Kim