Using multiple TMVA classifier outputs to fill tree


I have trained several TMVA classifiers. However, I can only benchmark their performance by applying them to a big list of samples (~150 files). Until now, I have been applying the MVA outputs one by one.

I’d like to be able to apply several trainings (different weight files) of the SAME MVA method, and to fill my tree in one single pass, since both the number of entries in the trees and the number of variables are significant.

I’ve tried the following general approach (pseudocode):

struct MVA {
	std::string name;                         // MVA name
	std::string weightfile;                   // MVA weight file
	std::string methodName;
	std::vector<std::string> inputVars;       // input variables
	std::vector<TTreeFormula*> inputFormulae; // TTreeFormulae for evaluating the input variables
	std::vector<Float_t> inputValues;         // values the Reader reads from
	TMVA::Reader *reader;
	Float_t output;                           // classifier output

	/* ... setting up of the MVA, AddVariable calls ... works OK */

	Float_t getOutput() // gets the output for the current entry
	{
		for( unsigned int i = 0 ; i < inputVars.size() ; i++ )
			inputValues[i] = inputFormulae[i]->EvalInstance();

		output = reader->EvaluateMVA( methodName.c_str() ); // <--- problem here
		return output;
	}

	/* ... */
};
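For reference, the elided setup step corresponds to something like the sketch below (hedged: the helper name `setupMVA` and the option string are illustrative, not my actual code). The one real constraint is that `AddVariable` keeps the `Float_t*` it is given, so `inputValues` must be sized before any pointers are handed out:

```cpp
#include <vector>
#include <string>
#include "TMVA/Reader.h"

// Hedged sketch of the elided per-MVA setup; setupMVA is an illustrative name.
void setupMVA( MVA *mva )
{
	mva->reader = new TMVA::Reader( "!Color:Silent" );

	// Size the buffer FIRST: AddVariable stores the Float_t* permanently,
	// so the vector must not reallocate afterwards.
	mva->inputValues.resize( mva->inputVars.size() );
	for( unsigned int i = 0 ; i < mva->inputVars.size() ; i++ )
		mva->reader->AddVariable( mva->inputVars[i].c_str(), &mva->inputValues[i] );

	// Book the method from its weight file.
	mva->reader->BookMVA( mva->methodName.c_str(), mva->weightfile.c_str() );
}
```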

void addMVAtoFile( const char* file, std::vector<MVA*> &mvaList )
{
	TFile *f = new TFile( file );
	TTree *t = (TTree*) f->Get("tree");

	TTree *friendTree = new TTree("MVAtree","Tree with classifier outputs");

	std::vector<TBranch*> br( mvaList.size() );
	for( unsigned int imva = 0 ; imva < mvaList.size() ; imva++ )
		br[imva] = friendTree->Branch( ... ); // correct setup of the branch

	for( ULong64_t i = 0 ; i < t->GetEntries() ; i++ )
	{
		/* getting the entry, attaching the TTreeFormulae, setting addresses to MVA::inputValues */

		for( unsigned int imva = 0 ; imva < mvaList.size() ; imva++ )
			mvaList[imva]->getOutput(); // sets the MVA output to MVA::output // <-- problem here
	}

	/* saving the tree, closing the files */
}

The whole thing works great in the following cases:

  1. There’s only ONE element in the std::vector<MVA*>
  2. All the MVAs in the std::vector<MVA*> have the same number of variables (I didn’t check whether they need to be exactly the same variables, though)

When the MVAs have different numbers of variables, I get the following error:

--- <FATAL> Norm                     : Transformation defined for a different number of variables 13  17

where 13 is the number of variables of the first MVA in the vector and 17 that of the second.

Having investigated further, I see that the variables are added to a DataSetInfo object when Reader::AddVariable(…) is called. I suspect there is a static class member somewhere that prevents me from proceeding as I’d like. The error arises in VariableNormalizeTransform.cxx, but things are so intricate that it takes time to get the full picture of what’s going on.

Is there a conceptual problem with what I’m trying to do? Could a TMVA developer/expert give some insight on how to achieve it?

Edit: I’m using ROOT v5.26.00a with the TMVA build included.

Thanks a lot in advance,

Forwarding Peter Speckmayer’s reply from the SourceForge forum:

Dear Karolos,

TMVA is designed to train several “methods” and compare their performance for the same variable
configuration. At design time we didn’t plan for a use case like yours. Presently one cannot
instantiate several Readers at the same time. There might be a way to make it possible, but this
requires some tests, and such a fix will not be in TMVA soon.

For the time being you have to run over your events several times, once for each variable
configuration. :(
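For what it’s worth, the multi-pass workaround Peter describes could be sketched roughly as follows, reusing the MVA struct from above (hedged: `addOneMVA`, the tree and branch names, and the option string are illustrative; TTreeFormula bookkeeping and error handling are omitted):

```cpp
#include "TFile.h"
#include "TTree.h"
#include "TMVA/Reader.h"

// Hedged sketch of the one-Reader-at-a-time workaround (illustrative names):
// evaluate ONE configuration per pass, each into its own friend tree.
void addOneMVA( TTree *t, MVA *mva, TFile *out )
{
	mva->reader = new TMVA::Reader( "!Color:Silent" );
	for( unsigned int j = 0 ; j < mva->inputVars.size() ; j++ )
		mva->reader->AddVariable( mva->inputVars[j].c_str(), &mva->inputValues[j] );
	mva->reader->BookMVA( mva->methodName.c_str(), mva->weightfile.c_str() );

	out->cd();
	// One friend tree per configuration instead of one branch per MVA.
	TTree *friendTree = new TTree( mva->name.c_str(), "classifier output" );
	friendTree->Branch( "output", &mva->output, "output/F" );

	for( ULong64_t i = 0 ; i < t->GetEntries() ; i++ )
	{
		t->GetEntry( i );  // load the entry the TTreeFormulae evaluate on
		mva->getOutput();  // fills mva->output
		friendTree->Fill();
	}

	friendTree->Write();
	delete mva->reader;    // only one Reader alive at any time
	mva->reader = 0;
}
```

The price is one full pass over the events per weight file, but since each Reader is destroyed before the next is booked, the variable counts can never be mixed.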