Hi there! I’m a summer student at CERN currently working on BDT benchmarking for TMVA. However, this is my first week and I’m still very new to TMVA and ROOT, so I’m unfamiliar with certain things.
I think the best way to illustrate my problem is to start with a toy example, so here we go:
// Dataset source along with variables
const std::string filepath = "http://root.cern.ch/files/tmva_class_example.root";
const std::vector<std::string> variables = {"var1", "var2", "var3", "var4"};
auto data = TFile::Open(filepath.c_str());
auto signal = (TTree*) (data->Get("TreeS"));
auto background = (TTree*) (data->Get("TreeB"));
// Add variables and register the trees with the dataloader
auto dataloader = new TMVA::DataLoader("tmva003_BDT");
for (const auto &var : variables) {
    dataloader->AddVariable(var);
}
dataloader->AddSignalTree(signal, 1.0);
dataloader->AddBackgroundTree(background, 1.0);
dataloader->PrepareTrainingAndTestTree("", ""); // Key step: divide into training and test trees
The above example is heavily based on the ROOT tutorial tutorials/tmva/tmva003_RReader.C.
At this point, I wish to access the TTree instances corresponding to the signal and background training and test sets. I’m assuming these are 4 separate instances once the cuts on the signal and background TTrees have been applied in the call dataloader->PrepareTrainingAndTestTree("", ""), or am I wrong?
I need the resulting split signal and background datasets because if I’m going to benchmark against some other BDT package, I want to ensure that the datasets are split into the same training and test subsets.
Looking at the source files and following the chain of calls, I’ve arrived at the following partial solution (well, I think it’s in the right direction…). For example, to extract the testing dataset after the cut:
dataloader->GetDefaultDataSetInfo().GetDataSet()->GetTree(TMVA::Types::kTesting)
However, I’m not sure how I can then separate this combined tree into background and signal trees.
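To make concrete what I’m after, here is a minimal sketch in plain C++ (no ROOT involved). It assumes that each event in the combined testing tree carries an integer class label that tells signal from background; I have not confirmed how TMVA actually stores this. The Event struct, the classID field, the 0 = signal convention and the splitByClass helper are all hypothetical names I made up for illustration, not TMVA API.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in for one row of the combined tree: the input
// variables plus an integer class label (assumed: 0 = signal, 1 = background).
struct Event {
    std::vector<float> vars;
    int classID;
};

// Partition a combined sample into {signal, background} by class label,
// preserving the original event order within each class.
std::pair<std::vector<Event>, std::vector<Event>>
splitByClass(const std::vector<Event>& events, int signalID = 0) {
    std::vector<Event> sig, bkg;
    for (const auto& ev : events) {
        // Every event whose label is not signalID is treated as background.
        (ev.classID == signalID ? sig : bkg).push_back(ev);
    }
    return {std::move(sig), std::move(bkg)};
}
```

Calling splitByClass on the testing sample and again on the training sample would give the 4 subsets I mentioned above. What I’m missing is the equivalent operation on the TTree returned by GetTree.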
Any help would be greatly appreciated! Also feel free to highlight any misconceptions I might have! Thanks