I have a naive question. I am trying to evaluate the performance of a BDT. I define a “working point” by choosing a cut on the BDT response within [-1, 1], and I want to see where this cut ends up on the ROC curve.
To figure out where on the ROC curve a particular cut ends up, you can do the following (just a sketch):
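#include <TTree.h>
#include <TTreeReader.h>
#include <TTreeReaderArray.h>
#include <TTreeReaderValue.h>
#include <stdexcept>

TTreeReader reader(&a_tmva_tree);
TTreeReaderArray<char> className(reader, "className");
TTreeReaderValue<float> bdt(reader, "BDTG");
TTreeReaderValue<int> class_id(reader, "classID");
const long long nentries = a_tmva_tree.GetEntries();

// WARN: This is hacky. Peek at the first entry to learn which classID
// means signal; this assumes exactly two classes with IDs 0 and 1.
int signal_class = 0;
reader.SetEntry(0);
if (className[0] == 'S') { // Signal
  signal_class = *class_id;
} else if (className[0] == 'B') { // Background
  signal_class = 1 - *class_id;
} else {
  throw std::runtime_error("ERROR: unexpected className");
}

long long n_signal = 0, n_background = 0;
long long n_true_positives = 0, n_false_positives = 0;
for (long long i = 0; i < nentries; ++i) {
  reader.SetEntry(i);
  const bool is_signal = (*class_id == signal_class);
  const bool passes = (*bdt >= YOUR_CUT);
  if (is_signal) { n_signal += 1; } else { n_background += 1; }
  if (passes && is_signal) { n_true_positives += 1; }
  if (passes && !is_signal) { n_false_positives += 1; } // background above the cut
}
// From n_signal, n_background, n_true_positives, and n_false_positives
// you can now calculate the ROC point for your cut value:
//   true positive rate  = n_true_positives  / n_signal
//   false positive rate = n_false_positives / n_background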
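To make the last step concrete, here is a minimal standalone sketch of turning the counts into a ROC point. It does not use ROOT at all: `RocPoint` and `roc_point` are hypothetical names made up for illustration, the scores and true labels are assumed to already sit in plain vectors, and per-event weights are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One point on the ROC curve: the two rates obtained at a single cut value.
struct RocPoint {
  double tpr; // true positive rate (signal efficiency)
  double fpr; // false positive rate (fraction of background passing the cut)
};

// Hypothetical helper, not part of ROOT or TMVA.
// scores[i] is the classifier response of event i, is_signal[i] its true class.
// Assumption: events are unweighted.
RocPoint roc_point(const std::vector<float>& scores,
                   const std::vector<bool>& is_signal, float cut) {
  assert(scores.size() == is_signal.size());
  std::size_t n_sig = 0, n_bkg = 0, n_tp = 0, n_fp = 0;
  for (std::size_t i = 0; i < scores.size(); ++i) {
    const bool passes = scores[i] >= cut;
    if (is_signal[i]) {
      ++n_sig;
      if (passes) ++n_tp; // signal event kept by the cut
    } else {
      ++n_bkg;
      if (passes) ++n_fp; // background event kept by the cut
    }
  }
  // Guard against an empty class to avoid dividing by zero.
  const double tpr = n_sig ? static_cast<double>(n_tp) / n_sig : 0.0;
  const double fpr = n_bkg ? static_cast<double>(n_fp) / n_bkg : 0.0;
  return {tpr, fpr};
}
```

Calling `roc_point(scores, is_signal, YOUR_CUT)` gives the pair of rates for that working point. If your ROC plot shows background rejection versus signal efficiency, the point to mark is (tpr, 1 - fpr).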
AFAIK, the classID (int) variable already corresponds to 0 (signal) or 1 (background).
Why do we need to check the className as well? Do the two always correspond to each other?
E.g. if className = “S”, should class_id be 0?