Signal peaking on the wrong side?

Hello,
I switched from ROOT 6.08 to 6.16 and found a strange “feature” when training a gradient-boosted BDT (BoostType=Grad):
the signal accumulates on the negative side of the score distribution instead of the positive one.
I checked that it is not a plotting issue: the blue entries really are signal events and the red ones are background events.

Even if the actual score value is just a convention, this seems to be unexpected behavior.
If you can confirm this, I can file a bug report.

Best,
Luca

[Attached plot: bdt]

Do you have a simple reproducer that we can test with both 6.08 and 6.16?

Hello,
I am sending you the original dataset, because I was not able to produce a smaller test sample:
https://cernbox.cern.ch/index.php/s/2ZpW3V8o7aG8DYH

The option string I used when booking the BDT in the factory is:
"!H:!V:NTrees=1000:MaxDepth=2:MinNodeSize=2.5%:nCuts=20:BoostType=Grad:UseBaggedBoost=true:Shrinkage=0.1:BaggedSampleFraction=0.5"
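
For readers less familiar with the format: the option string above is a colon-separated list of `key=value` settings, and a leading `!` switches a flag off (so `!H:!V` means no help text and no verbose output). The helper below is only an illustrative sketch in plain C++ that splits such a string so that the individual settings are easier to read. The name `parseOptions` is hypothetical; it is not part of TMVA and it does not reproduce how TMVA parses or validates options internally.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper, NOT part of TMVA: split a colon-separated option
// string into key/value pairs. A bare flag is stored as "true"; a flag
// with a leading '!' is stored under its name as "false".
std::map<std::string, std::string> parseOptions(const std::string& opts) {
    std::map<std::string, std::string> result;
    std::istringstream stream(opts);
    std::string token;
    while (std::getline(stream, token, ':')) {
        if (token.empty()) continue;              // skip empty tokens such as "::"
        const auto eq = token.find('=');
        if (eq != std::string::npos) {
            result[token.substr(0, eq)] = token.substr(eq + 1);
        } else if (token[0] == '!') {
            result[token.substr(1)] = "false";    // negated flag, e.g. "!V"
        } else {
            result[token] = "true";               // plain flag, e.g. "V"
        }
    }
    return result;
}
```

Applied to the string above, `parseOptions(opts).at("BoostType")` gives `"Grad"`, which is the setting that selects gradient boosting (the BDTG case discussed in this thread).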

Let me know if you need more information.
Luca.

Hi,

I cannot reproduce this problem with the TMVA example code (e.g. TMVAClassification.C). I would need your full macro to reproduce it.

Lorenzo

Dear Lorenzo,

I was able to reproduce the problem with a modified version of TMVAClassification.C (attached: TMVAClassification.C, 30.2 KB).

Can you please have a look?

Best Regards,
Luca.

Hi,
Can you please also attach or send a link to the input data?
Thank you

Lorenzo

Hello Lorenzo,

the input data was already provided on Wed 24th July in this thread. See the message just above your first comment.

Hi,

Sorry about that. Yes, I can now reproduce the problem. It is probably due to the fact that you are adding signal and background events individually using DataLoader::AddBackgroundTrainingEvent and DataLoader::AddSignalTrainingEvent.

I will check those functions.
Thank you for reporting this and providing a way to reproduce the problem.

Lorenzo

I noticed that this happens only for BDTG and not for BDT.
It also depends on which class is added first. If you add the background events (or tree) to the dataloader first, you get this behaviour; if you add the signal first, you get the correct plot.
This is due to the definition of the class index in TMVA::DataSetInfo.
Every method should be independent of that class index value, but this is not the case for BDTG. This is therefore a bug and we will fix it.
As a workaround, make sure that the signal events are added first.
For example, in your macro add the signal events first and then the background ones:

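   // GetEntries() returns a Long64_t, so use that type for the count and the loop index
   Long64_t ntrainEvts = trainTree->GetEntries();

   // First pass: add only the signal events (classID == 1)
   for (Long64_t i = 0; i < ntrainEvts; i++) {
     trainTree->GetEntry(i);
     for (UInt_t ivar = 0; ivar < 11; ivar++) vars[ivar] = treevars[ivar];
     if (classID == 1) dataloader->AddSignalTrainingEvent( vars, weight );
   }

   // Second pass: add only the background events (classID == 0)
   for (Long64_t i = 0; i < ntrainEvts; i++) {
     trainTree->GetEntry(i);
     for (UInt_t ivar = 0; ivar < 11; ivar++) vars[ivar] = treevars[ivar];
     if (classID == 0) dataloader->AddBackgroundTrainingEvent( vars, weight );
   }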
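
To illustrate why the order matters, here is a toy sketch in plain C++ of the kind of class-index dependence described above. It is not TMVA code: `ToyClassRegistry`, `fragileTarget` and `robustTarget` are hypothetical names, and the assumption that indices are handed out in the order in which classes are first added is a simplification for the purpose of the example. It does not claim to show what BDTG does internally.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a class registry (NOT the TMVA::DataSetInfo API):
// each class name gets the next free index the first time it is added.
class ToyClassRegistry {
public:
    int add(const std::string& name) {
        const int existing = indexOf(name);
        if (existing >= 0) return existing;
        names_.push_back(name);
        return static_cast<int>(names_.size() - 1);
    }
    // Returns the index of the class, or -1 if it was never added.
    int indexOf(const std::string& name) const {
        for (std::size_t i = 0; i < names_.size(); ++i)
            if (names_[i] == name) return static_cast<int>(i);
        return -1;
    }
private:
    std::vector<std::string> names_;
};

// Fragile: silently assumes that class index 0 is the signal class.
double fragileTarget(int classIndex) {
    return classIndex == 0 ? +1.0 : -1.0;
}

// Robust: asks the registry which index belongs to "Signal".
double robustTarget(const ToyClassRegistry& reg, int classIndex) {
    return classIndex == reg.indexOf("Signal") ? +1.0 : -1.0;
}
```

In this toy, adding "Background" before "Signal" gives the signal class index 1, so `fragileTarget` assigns it the target -1 and the sign of the output flips, while `robustTarget` still returns +1. Adding the signal first hides the problem, which is what the workaround above relies on.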


Cheers

Lorenzo

Hi,

The bug has been fixed in the ROOT master branch.
Thank you for reporting it.

Lorenzo

Dear Lorenzo,

thanks, I’m glad that this has been fixed.

Cheers,
Luca.