How to plot ROC curve for a specific BDT response value?

swanski82 · January 22, 2020, 4:37pm

Hi everyone,

I have a naive question. I am trying to evaluate the performance of a BDT. I define a “working point” by choosing a cut on the BDT response within [-1, 1], and I want to see where this cut falls on the ROC curve.

Any idea would be appreciated. Thanks.

couet · January 22, 2020, 4:50pm

Maybe @moneta can help you.

kialbert · January 27, 2020, 4:23pm

Hi,

To figure out where on the ROC curve a particular cut ends up, you can do the following (just a sketch):

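TTreeReader reader(&a_tmva_tree); // a_tmva_tree: e.g. the TMVA TestTree

TTreeReaderArray<char> className(reader, "className");
TTreeReaderValue<float> val(reader, "BDTG");
TTreeReaderValue<int> class_id(reader, "classID");

const Long64_t nentries = a_tmva_tree.GetEntries();
int signal_class = 0;

reader.SetEntry(0);
// WARN: This is hacky. The first entry is used to learn
// which integer class id labels the signal.
if (className[0] == 'S') {        // Signal
    signal_class = *class_id;
} else if (className[0] == 'B') { // Background
    signal_class = 1 - *class_id;
} else {
    throw std::runtime_error("ERROR!");
}

// Event weights are ignored in this sketch.
long n_signal = 0, n_background = 0;
long n_true_positives = 0, n_false_positives = 0;

for (Long64_t i = 0; i < nentries; ++i) {
    reader.SetEntry(i);
    const bool is_signal = (*class_id == signal_class);
    const bool passes_cut = (*val >= YOUR_CUT);
    if (is_signal) {
        ++n_signal;
        if (passes_cut) ++n_true_positives;
    } else {
        ++n_background;
        if (passes_cut) ++n_false_positives;
    }
}

// The position of your cut on the ROC curve is then:
//   signal efficiency (true positive rate)      = n_true_positives  / n_signal
//   background efficiency (false positive rate) = n_false_positives / n_background
// Background rejection is 1 - background efficiency.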
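
The same counting can be tried without ROOT. Below is a minimal standalone sketch, assuming the responses and true labels have already been copied into plain vectors; `RocPoint` and `roc_point_at_cut` are hypothetical names chosen for illustration, and event weights are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One point on the ROC curve: the two efficiencies obtained for a single cut.
struct RocPoint {
    double signal_efficiency;     // true positive rate
    double background_efficiency; // false positive rate
};

// scores[i] is the classifier response of event i, is_signal[i] its true class.
// Events with score >= cut count as selected. Weights are ignored (assumption).
RocPoint roc_point_at_cut(const std::vector<double>& scores,
                          const std::vector<bool>& is_signal,
                          double cut) {
    assert(scores.size() == is_signal.size());
    std::size_t n_sig = 0, n_bkg = 0, n_tp = 0, n_fp = 0;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        const bool selected = scores[i] >= cut;
        if (is_signal[i]) {
            ++n_sig;
            if (selected) ++n_tp;
        } else {
            ++n_bkg;
            if (selected) ++n_fp;
        }
    }
    RocPoint p;
    // An empty class yields an efficiency of 0 rather than a division by zero.
    p.signal_efficiency = n_sig ? static_cast<double>(n_tp) / n_sig : 0.0;
    p.background_efficiency = n_bkg ? static_cast<double>(n_fp) / n_bkg : 0.0;
    return p;
}
```

Calling it for a range of cut values between -1 and 1 traces out the whole curve; a single call gives the working point.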

swanski82 · January 28, 2020, 9:49am

Thanks for the reply! It was indeed helpful. I just want to understand the reason for this part:

if (className[0] == 'S') {        // Signal
    signal_class = *class_id;
} else if (className[0] == 'B') { // Background
    signal_class = 1 - *class_id;
} else {
    throw std::runtime_error("ERROR!");
}

Sorry, my previous reply was sent incomplete.

Anyway, AFAIK the classID (int) variable already corresponds to 0 (signal) or 1 (background).
Why do we need to check className as well? Do the two always map onto each other,
e.g. if className = “S”, should class_id be 0?

kialbert · January 28, 2020, 10:20am

The integer coding for the class is determined by the order of the dataloader declarations.

dataloader->AddSignalTree(...);     // class 0
dataloader->AddBackgroundTree(...); // class 1

vs.

dataloader->AddBackgroundTree(...); // class 0
dataloader->AddSignalTree(...);     // class 1

The reason for this is to accommodate multi-class classification, where there can be more than two classes and none of them has a fixed index.
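
As a toy illustration of the ordering effect (this is not TMVA code; `ClassRegistry` and `id_for` are hypothetical names, and the only assumption is that ids are handed out in first-seen order as described above):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a data loader's class bookkeeping (not part of TMVA).
class ClassRegistry {
public:
    // Returns the integer id of a class name, assigning the next free
    // index the first time that name is seen.
    int id_for(const std::string& name) {
        for (std::size_t i = 0; i < names_.size(); ++i) {
            if (names_[i] == name) return static_cast<int>(i);
        }
        names_.push_back(name);
        return static_cast<int>(names_.size() - 1);
    }

private:
    std::vector<std::string> names_; // position in this vector is the class id
};
```

Registering "Signal" first gives it id 0; registering "Background" first gives "Signal" id 1. That is why the sketch above looks at className instead of assuming that class 0 is the signal.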

Cheers,
Kim