Question regarding multiclass training

Dear experts,

I am using a multi-class BDT in my analysis, and I wanted to ask a few questions regarding it.

I understand that TMVA is training the multi-class BDTs in the one-vs-rest mode.

Does TMVA internally normalize the number of training events in each class, as it does in binary classification? I pass NormMode=NumEvents in the options to the dataloader's PrepareTrainingAndTestTree.

My naïve understanding of one-vs-rest classification is as follows: Say we have 5 classes. TMVA will in turn consider each class to be ‘positive’ and clump the remaining 4 as a ‘negative’ class. This will lead to 5 responses, one for each class vs the rest. My concern is regarding the clumping of the negative classes. How does TMVA know in what relative proportions to add them together?
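To make the concern concrete, here is a minimal sketch in plain Python (not TMVA code). It assumes the 'negative' class is formed by simply pooling the other classes with their weights, so that each class contributes in proportion to its sum of weights. The yields are made-up numbers and `rest_composition` is a hypothetical helper, not a TMVA function.

```python
def rest_composition(sum_of_weights, positive):
    """Fraction of the pooled 'rest' class contributed by each other class.

    sum_of_weights: dict mapping class name -> sum of event weights.
    positive: name of the class treated as 'positive'.
    """
    rest = {name: w for name, w in sum_of_weights.items() if name != positive}
    total = sum(rest.values())
    return {name: w / total for name, w in rest.items()}


# Hypothetical training yields (sum of weights per class).
yields = {"Signal": 1000.0, "Bd_Bkg": 3000.0, "Bu_Bkg": 1000.0}

# With Signal as 'positive', the 'rest' is 75% Bd_Bkg and 25% Bu_Bkg,
# purely because of how many (weighted) events each sample contains.
composition = rest_composition(yields, "Signal")
print(composition)
```

If the pooling works like this, the make-up of the 'negative' class is set entirely by the sample sizes and weights I supply, which is exactly what I would like to control.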

This question also comes to my mind when I see TMVA training output like the one below:

                         : 1-vs-rest performance metrics per class
                         : -------------------------------------------------------------------------------------------------------
                         : Considers the listed class as signal and the other classes
                         : as background, reporting the resulting binary performance.
                         : A score of 0.820 (0.850) means 0.820 was acheived on the
                         : test set and 0.850 on the training set.
                         : Dataset        MVA Method     ROC AUC        Sig eff@B=0.01 Sig eff@B=0.10 Sig eff@B=0.30
                         : Name:          / Class:       test  (train)  test  (train)  test  (train)  test  (train)
                         : DvsTau         BDTG_fold1
                         : ------------------------------
                         :                Signal         0.938 (0.942)  0.379 (0.425)  0.811 (0.822)  0.955 (0.960)
                         :                Bd_Bkg         0.795 (0.855)  0.076 (0.203)  0.451 (0.561)  0.715 (0.834)
                         :                Bu_Bkg         0.899 (0.901)  0.344 (0.360)  0.713 (0.709)  0.897 (0.900)
                         :                Bs_Bkg         0.822 (0.898)  0.201 (0.261)  0.554 (0.675)  0.741 (0.912)
                         :                Lb_Bkg         0.828 (0.880)  0.105 (0.228)  0.487 (0.640)  0.789 (0.879)
                         :                Sideband_RS    0.781 (0.787)  0.063 (0.069)  0.353 (0.366)  0.717 (0.727)
                         :                WS_Data        0.817 (0.815)  0.125 (0.127)  0.455 (0.452)  0.768 (0.765)

When TMVA tells me on row 1 that my Signal efficiency (for the class named as Signal) is 0.379 (0.425), at a background efficiency of 1%, how is it combining the background classes to get the 1% background efficiency?

I hope my question makes sense. My bottom line concern is that I don’t want TMVA to take the relative yields in the background training samples as something reflecting the real world, since many of those samples come from MC.

Thank you,

We currently have every single TMVA expert on vacation or distracted :-/ @moneta could you comment when you’re back next week?

The background efficiency in the table above is computed by treating all the other classes as a single background class; the 1-vs-rest efficiencies and ROC curves are then computed against that combined class.
If you are using NumEvents as NormMode, no scaling is applied between the different types of events: they are all treated the same, with the weights you provided.
So there is no magic in how the events are treated. If you want the backgrounds to be combined as in the real data, you need to make sure you apply the correct weight factors to the different types of MC background events.
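As a rough illustration of why the weights matter, here is a plain-Python sketch (not TMVA's actual implementation). It pools two background samples into one class and computes the signal efficiency at a fixed background efficiency. The scores, the weights, the factor 3 and the function names are all made up for the example.

```python
def efficiency(events, cut):
    """Weighted fraction of (score, weight) events with score >= cut."""
    total = sum(w for _, w in events)
    passed = sum(w for s, w in events if s >= cut)
    return passed / total


def sig_eff_at_bkg_eff(signal, background, target):
    """Signal efficiency at the loosest cut whose pooled, weighted
    background efficiency does not exceed `target`."""
    cuts = sorted({s for s, _ in signal + background}, reverse=True)
    best = 0.0
    for cut in cuts:  # from tight to loose
        if efficiency(background, cut) > target:
            break
        best = efficiency(signal, cut)
    return best


def scaled(events, factor):
    """Return a copy of the events with every weight multiplied by `factor`."""
    return [(s, w * factor) for s, w in events]


# Toy (score, weight) pairs: bkg_a looks signal-like, bkg_b is easy to reject.
signal = [(0.9, 1.0), (0.8, 1.0), (0.6, 1.0), (0.4, 1.0)]
bkg_a = [(0.85, 1.0), (0.7, 1.0), (0.5, 1.0), (0.3, 1.0)]
bkg_b = [(0.2, 1.0), (0.15, 1.0), (0.1, 1.0), (0.05, 1.0)]

# Backgrounds pooled as provided, with equal weights.
eff_equal = sig_eff_at_bkg_eff(signal, bkg_a + bkg_b, 0.25)

# Same events, but bkg_a weighted up by 3, as if it were 3x more abundant in data.
eff_reweighted = sig_eff_at_bkg_eff(signal, scaled(bkg_a, 3.0) + bkg_b, 0.25)

print(eff_equal, eff_reweighted)
```

In this toy case the quoted signal efficiency drops from 0.75 to 0.5 when the harder background is weighted up, even though the classifier scores are unchanged. The numbers in the table therefore only reflect the real background mix if the per-sample weights do.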