How does MethodCategory work?

Dear TMVA expert,
I’m trying to use MethodCategory with a BDT method. I get very weird and absurd results that make me curious about how categories work in TMVA.

My categorization is really simple: in about 5% of my events some variables are not available, so I made two categories:

  • Category 1 for 95% of the events, with all 10 variables
  • Category 2 for 5% of the events, with 6 variables

To control what is going on, I train in the same macro a BDT with no categories (the empty variables are filled with -1 for that 5% of events). In principle the performance should be very similar, since only very few events are not in category 1 and the events in category 2 should be quite difficult to separate.

This is my code (where I edited the variable names for simplicity):

factory->BookMethod( dataloader, TMVA::Types::kBDT, "BDT", bdtOptions ); // control BDT

TString Cat1Vars("var1:var2:var3:var4:var5:var6:var7:var8:var9:var10");
TString Cat2Vars("var1:var2:var3:var4:var5:var6");

// cat1Cut is a TCut, defined earlier in the macro, that selects the events
// for which all 10 variables are available
TMVA::MethodCategory* mcat = 0;
TMVA::MethodBase* BDT_Cat = factory->BookMethod( dataloader, TMVA::Types::kCategory, "BDT_Cat", "" );
mcat = dynamic_cast<TMVA::MethodCategory*>(BDT_Cat);
mcat->AddMethod(  cat1Cut, Cat1Vars, TMVA::Types::kBDT, "BDT_Cat1", bdtOptions );
mcat->AddMethod( !cat1Cut, Cat2Vars, TMVA::Types::kBDT, "BDT_Cat2", bdtOptions );

Where

TString bdtOptions("H:V:UseBaggedBoost:BaggedSampleFraction=0.8:NTrees=500:MaxDepth=X:nCuts=-1:MinNodeSize=0.1%:BoostType=RealAdaBoost:AdaBoostBeta=0.6");

Strange things happen when I start to tune MaxDepth. If I use a "standard" value like MaxDepth=4, everything seems normal (see figure), and the categorized BDT performs slightly better (0.645 vs 0.639 ROC AUC).

When I push MaxDepth to higher values, the non-categorized BDT starts to heavily overfit (as expected), while the categorized BDT keeps getting better and better (see figures). This is absurd, since they should have very similar performance.

MaxDepth=6

MaxDepth=20

The ROC curve is even weirder, considering that TMVA gives me an outstanding 0.820 as ROC integral (black curve):

MaxDepth=20 (figure: Selection_058)

I really cannot understand what is going on with the categorized BDT. Basically, the more I overfit the BDT, the better that same BDT performs once it is wrapped in a simple categorization. The dataloader is the same and the SplitSeed is set to 0.
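For reference, the split is configured along these lines (a minimal sketch; the empty cut and the exact option string are illustrative, not copied from my macro):

// Standard random train/test split with SplitSeed=0 (illustrative options)
dataloader->PrepareTrainingAndTestTree( "",
    "SplitMode=Random:SplitSeed=0:NormMode=NumEvents:!V" );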

Do you have any idea why this happens? What is the trick behind this unrealistic performance of MethodCategory?

Thank you,

Alberto

EDIT:
To further test this behavior to the extreme, I changed the categorization into two random categories of the same size with no physical meaning (basically a 50% chance to go into cat 1 and a 50% chance to go into cat 2). In principle this should bring no increase in performance at all. But this is what I get:

(figure: Selection_060)

The simple fact of being categorized (even a meaningless categorization) seems to make the BDT immune to overtraining!
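For reference, the random categorization can be implemented along these lines (a minimal sketch; rndCat is a hypothetical spectator branch filled with a uniform random number per event when the tree was produced):

// Hypothetical branch rndCat in [0,1), e.g. filled with gRandom->Uniform()
// when the input tree was written
dataloader->AddSpectator( "rndCat" );

TCut randCat1( "rndCat <  0.5" );  // ~50% of events, no physical meaning
TCut randCat2( "rndCat >= 0.5" );  // the other ~50%

mcat->AddMethod( randCat1, Cat1Vars, TMVA::Types::kBDT, "BDT_Rnd1", bdtOptions );
mcat->AddMethod( randCat2, Cat1Vars, TMVA::Types::kBDT, "BDT_Rnd2", bdtOptions );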


Hi,

Sorry for the late reply! This is indeed weird behaviour and something we should look into.

Would it be possible for you to provide a minimal example still showing the problem? Something with some events with 2 variables and some events with 1 variable, maybe. And append all files necessary to run the example. That would help us quite a bit :slight_smile:

Cheers,
Kim

Hi Kim,
even a late reply is always welcome. Here is a minimal example.
Before moving on to the example, I tested the methods (with and without category) on an independent validation sample, and this is what I got:

Aside from the background normalization, it is quite clear that the performance of the categorized method is completely different from the performance shown on the TMVA test sample. My guess is that the categorized BDT somehow uses all the events for training and suffers from heavy overtraining.

Moving on to the minimal example with 2 variables. As I said in the EDIT, the number of variables does not influence the outcome; even a random splitting of the training sample into two (random) categories exhibits the same behavior.

Here is a training macro
trainingOSMuon_test.C (3.3 KB)

and this is a dataset with events (sorry for the large size):
https://www.dropbox.com/s/26xe5t6dar3fsqx/skim_BToJPsiKMu.root?dl=0

You can run it with:

root -l -b -q 'trainingOSMuon_test.C("skim_BToJPsiKMu.root")'

This is the output I get

DataSet    MVA
Name:      Method:           ROC-integ
dataset:   BDT_OSMuon_Cat :  0.683
dataset:   BDT_OSMuon     :  0.564

Best Regards,
Alberto


Thanks for the very thorough investigation; this will surely help us :slight_smile:

Seems like you are onto something with the training set observation; someone will look into it as soon as possible!

Cheers,
Kim

A bug report for this issue has been filed here. In summary, this is not expected behaviour; thanks for reporting!

Cheers,
Kim

Hi Kim,
the fact that MethodCategory suffers from heavy overtraining is supported by the fact that:

  • if I train the method and then use it (in a real analysis) on the same sample, the analysis performs way beyond any realistic level of performance
  • if I train the method on a subset of the sample and then use it on an independent subset, it performs way worse than a well-trained BDT

To give you some real-world numbers: in my analysis, MVA methods should lead to a 30% increase in efficiency. The normal BDT leads to a 28% increase, the categorized BDT leads to an over 100% increase, while the categorized BDT trained on a different subset gives a measly 20%. I would be really happy if that 100% were true, but sadly it is not.

Since the method cannot use some magic to enhance the efficiency of the analysis, it is quite clear to me that it is trained with nearly all the events, and so performs really well on them but really badly on others.

I hope this helps.


Your discussion is very useful indeed!

I set up a situation similar to the one you mentioned in your second post. I used two categories, one with two simple but different Gaussians and the other with two identical Gaussians.

I then trained three mva’s,

  1. one using all data wrapped in MethodCategory,
  2. one using all data without categories, and
  3. one using the two categories wrapped in a MethodCategory.

With this setup it is impossible to do better with categories, since there is no discriminative power in the second category. As expected, (1) and (2) have the same classifier performance. Somehow (3) manages to improve the score significantly.
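For anyone who wants to reproduce this kind of test, here is a minimal sketch (not the code actually used) of how such two-category toy data can be generated in ROOT; all names and parameters are illustrative:

#include "TFile.h"
#include "TTree.h"
#include "TRandom3.h"

// Toy data with two categories: in category 0 the signal and background
// Gaussians are shifted apart, in category 1 they are identical.
void makeToyData() {
   TFile f("toy.root", "RECREATE");
   TRandom3 rng(1234);
   Float_t x, cat;
   TTree sig("sig", "signal"), bkg("bkg", "background");
   sig.Branch("x", &x); sig.Branch("cat", &cat);
   bkg.Branch("x", &x); bkg.Branch("cat", &cat);
   for (int i = 0; i < 10000; ++i) {
      cat = (rng.Uniform() < 0.5) ? 0.f : 1.f;
      x = (cat == 0.f) ? rng.Gaus(+1., 1.) : rng.Gaus(0., 1.);  // signal
      sig.Fill();
      x = (cat == 0.f) ? rng.Gaus(-1., 1.) : rng.Gaus(0., 1.);  // background
      bkg.Fill();
   }
   f.Write();
}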

I concur with your conclusions (and thanks for performing that validation with independent data!). Now, to figure out where things go wrong in the code :wink:

Cheers,
Kim

The problem lies in how MethodCategory currently does its data splitting for the categories.

TMVA divides the data into a training and a test set. MethodCategory does the same, but uses the original data for its own subdivision. There can thus, in general, be an overlap between the TMVA test set and the MethodCategory training set.

During training, MethodCategory uses the per-category training data. However, during testing it uses the global test set. If these overlap, the estimate of the generalisation performance is inaccurate.

This is definitely something we should fix. In the meantime it is possible to work around the issue by ensuring that the per-category training set and the global test set are non-overlapping.

This is possible by manually assigning train and test data, which bypasses the internal splitting mechanism of the dataloader (and that of MethodCategory).

TMVA::DataLoader d {"dataset"};

// Assign the training trees explicitly ...
d.AddSignalTree    ( sigTreeTraining, 1.0, TMVA::Types::kTraining );
d.AddBackgroundTree( bkgTreeTraining, 1.0, TMVA::Types::kTraining );

// ... and the test trees explicitly, so no internal random split is done
d.AddSignalTree    ( sigTreeTesting, 1.0, TMVA::Types::kTesting );
d.AddBackgroundTree( bkgTreeTesting, 1.0, TMVA::Types::kTesting );

Thanks for your help. I hope this is also helpful to you.

Cheers,
Kim

EDIT 2017-12-01: Changed workaround suggestion.
EDIT 2017-12-04: Updated code example for clarity.

Hi Kim,
in your workaround, is d the dataloader?

Cheers,
Alberto

Yes indeed! I’ll modify the example so it’s clearer.

Cheers,
Kim

Ok, I understand how the workaround works. But this way I would be forced to manually split the sample into a testing and a training tree. Am I wrong?

Cheers,

Alberto

No, you are not. If we rely on the automatic splitting of trees there is almost always overlap, so we have to do it manually for now.
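For concreteness, a minimal sketch of one way to do the manual split, assuming a flat input tree inTree (the names and the entry-based split are illustrative):

// Copy the first half of the entries into a training tree and the rest
// into a test tree; CopyTree writes into the currently open file
TFile out( "split.root", "RECREATE" );
Long64_t n = inTree->GetEntries();
TTree* sigTreeTraining = inTree->CopyTree( "", "", n / 2, 0 );
TTree* sigTreeTesting  = inTree->CopyTree( "", "", n - n / 2, n / 2 );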

Cheers,
Kim

Dear All,
I would like to bring to your attention that this bug is still present in 6.12/07.

Cheers,
Alberto

Hi Alberto,

Yes, it still remains. The bug report is still unresolved, as complications were discovered when trying to fix it (details in the report).

I will take a third look whenever I have the time to spare.

Cheers,
Kim

Hi Kim,
thank you for the answer.
I asked because the workaround you suggested almost a year ago is not working for me. TMVA trains the normal method without any problems but fails to train the MethodCategory. In fact, it breaks the training.

The relevant, and quite cryptic, output is here:

Factory                  : Train method: BDTOsMuon2017 for Classification
                         : 
BDTOsMuon2017            : #events: (reweighted) sig: 9563.5 bkg: 9563.5
                         : #events: (unweighted) sig: 13542 bkg: 5585
                         : Training 600 Decision Trees ... patience please
                         : Elapsed time for training with 19127 events: 96.5 sec         
BDTOsMuon2017            : [dataset] : Evaluation of BDTOsMuon2017 on training sample (19127 events)
                         : Elapsed time for evaluation of 19127 events: 14 sec       
                         : Creating xml weight file: dataset/weights/TMVAClassification_BDTOsMuon2017.weights.xml
                         : Creating standalone class: dataset/weights/TMVAClassification_BDTOsMuon2017.class.C
                         : TMVAOsMuon2017.root:/dataset/Method_BDTOsMuon2017/BDTOsMuon2017
Factory                  : Training finished
                         : 
Factory                  : Train method: BDTOsMuon2017Category for Classification
                         : 
                         : Train all sub-classifiers for Classification ...
                         : Train method: BDTOsMuon2017Jet for Classification
BDTOsMuon2017Jet         : #events: (reweighted) sig: 5779.5 bkg: 5779.5
                         : #events: (unweighted) sig: 8119 bkg: 3440
                         : Training 600 Decision Trees ... patience please
                         : Zero events in purity calculation , return purity=0.5
<WARNING>                : stopped boosting at itree=8
                         : Elapsed time for training with 11559 events: 0.638 sec         
BDTOsMuon2017Jet         : [BDTOsMuon2017Jet_dsi] : Evaluation of BDTOsMuon2017Jet on training sample (11559 events)
                         : Elapsed time for evaluation of 11559 events: 0.0305 sec       
                         : TMVAOsMuon2017.root:/dataset/Method_BDTOsMuon2017Category/BDTOsMuon2017Category/Method_BDT/BDTOsMuon2017Jet
                         : Training finished
                         : Train method: BDTOsMuon2017noJet for Classification
BDTOsMuon2017noJet       : #events: (reweighted) sig: 3783.5 bkg: 3783.5
                         : #events: (unweighted) sig: 5422 bkg: 2145
                         : Training 600 Decision Trees ... patience please
                         : < ***  .........] (3%, time left: 25 sec)  
                         :  d: 0NCoef: 0 ivar: -1 cut: 0 cType: 1 s: -nan b: -nan nEv: -1 suw: 1567 buw: 751 nEvuw: 0 sepI: -1 sepG: -1 nType: -99
                         : My address is 653702096,  **** > 
                         : Zero events in purity calculation , return purity=0.5
<WARNING>                : stopped boosting at itree=21
                         : Elapsed time for training with 7567 events: 0.976 sec  

The full log is attached here
log.txt (76.2 KB)

My code is:
test.py (3.0 KB)

Cheers,
Alberto

Thanks,

I will try to take a look this week.

Cheers,
Kim