Hi,
I have trained several TMVA classifiers. However, I can only benchmark their performance by applying them to a big list of samples (~150 files). Till now, I’ve been implementing the MVA outputs one by one.
I’d like to be able to implement several trainings (different weight files) of the SAME MVA method, but I’d like to be able to fill my tree in one single pass (the number of entries in the trees is significant and the number of variables too).
I’ve tried the following general approach (pseudo code):
struct MVA
{
std::string name; // MVA name
std::string weightfile; // MVA Weights
std::string methodName;
std::vector<std::string> inputVars; // Variables
std::vector<TTreeFormula*> inputFormulae; // TTreeFormula for getting the input variables
std::vector<Float_t> inputValues; // Values
TMVA::Reader *reader;
Float_t output; // Output
/* ... Setting up of the MVA, AddVariable calls ... works OK */
Float_t getOutput() // Gets output for the current entry.
{
for( unsigned int i = 0 ; i < inputVars.size() ; i++ )
inputValues[i] = inputFormulae[i]->EvalInstance();
output = reader->EvaluateMVA( methodName.c_str() ); // <--- Problem here
return output;
}
/* ... */
};
addMVAtoFile( const char* file, std::vector<MVA*> &mvaList )
{
TFile *f = new TFile( file );
TTree *t = (TTree*) f->Get("tree");
TTree *friendTree = new TTree("MVAtree","Tree with classifier outputs");
for( unsigned int imva = 0 ; imva < mvaList.size() ; imva++ )
br[imva] = friendTree->Branch( ... ); // Correct setup of the branch
for( ULong64_t i = 0 ; i < t->GetEntries() ; i++ )
{
/* Getting entry, assigning the TTreeFormula in MVA, setting addresses to MVA::inputValues, */
for( unsigned int imva = 0 ; imva < mvaList.size() ; imva++ )
{
mvaList[imva]->getOutput(); // Sets MVA output to MVA::output // <-- Problem here
br[imva]->Fill();
}
}
/* Saving tree, closing files. */
}
The whole thing works great in the following cases:
- There’s only ONE element in the std::vector<MVA*>
- All the MVAs in the std::vector<MVA*> have the same number of variables (didn’t check if they ought to be the exact same though)
When I have MVAs with different number of variables, I get the following error:
--- <FATAL> Norm : Transformation defined for a different number of variables 13 17
with 13 being the number of variables of the first MVA in the vector and 17 that of the second.
Having investigated further, I see that the variables are added to DataSetInfo() when calling Reader::AddVariable(…). I fear there’s somewhere a static class member that forbids me to proceed as I’d like. This error arises processing VariableNormalizeTransform.cxx but things are so intricate that it takes time to get the full picture of what’s going on.
Is there a conceptual problem to what I’m trying to do? Could a TMVA developer/expert give some insight on how to acheive what I’m trying to do?
Edit: I’m using ROOT v5.26.00a with the TMVA build included.
Thanks a lot in advance,
Karolos