Problems with TMVA Reader and TLorentzVectors

WilliamK · June 4, 2020, 11:26am

Dear Experts,
After training a TMVA classifier, I am now trying to apply it.
My classifier uses 17 variables, some of which come from a TLorentzVector, in the form “bjet.Pt()” for example.

So following the tutorial I add those 17 variables to the Reader:

reader->AddVariable(...);

Then I have to set the branch addresses for every tree. Since some of the branches are TLorentzVectors, I set the addresses of only 11 branches, because I cannot do something like the following:

tree->SetBranchAddress("bjet.Pt()", &var);

But when my code reaches the evaluation step I get this error:

                         :  Setup Keras Model 
                         :  Executing user initialization code from  metrics.py
                         :  Loading Keras Model 
                         : Loaded model from file: dataset/weights/TrainedModel_PyKeras.h5
<WARNING> <WARNING>                : Failed to run python code: for i,p in enumerate(model.predict(vals)): output[i]=p
<WARNING> <WARNING>                : 
<WARNING> <WARNING>                : Python error message:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/cvmfs/cms-ib.cern.ch/nweek-02631/slc7_amd64_gcc820/external/py2-Keras/2.3.1-bcolbf2/lib/python2.7/site-packages/keras/engine/training.py", line 1441, in predict
    x, _, _ = self._standardize_user_data(x)
  File "/cvmfs/cms-ib.cern.ch/nweek-02631/slc7_amd64_gcc820/external/py2-Keras/2.3.1-bcolbf2/lib/python2.7/site-packages/keras/engine/training.py", line 579, in _standardize_user_data
    exception_prefix='input')
  File "/cvmfs/cms-ib.cern.ch/nweek-02631/slc7_amd64_gcc820/external/py2-Keras/2.3.1-bcolbf2/lib/python2.7/site-packages/keras/engine/training_utils.py", line 145, in standardize_input_data
    str(data_shape))
ValueError: Error when checking input: expected dense_1_input to have shape (11,) but got array with shape (17,)
<FATAL>                          : Failed to get predictions
***> abort program execution
terminate called after throwing an instance of 'std::runtime_error'
  what():  FATAL error

Does anybody have a clue what I am doing wrong and how to fix it? It seems that the dense_1_input dimensions do not match, but I can’t figure out any other way to set those branch addresses.

Many thanks to anybody that tries to help!

William.

moneta · June 4, 2020, 3:49pm

Hi,

According to the error log, the error comes from Keras. It looks like you are evaluating a model that was trained with 11 input variables (features) on 17 inputs, so there is an inconsistency. You should use the same number of input variables when training the model and when evaluating it with the Reader.

Lorenzo

WilliamK · June 4, 2020, 6:40pm

Hi @moneta!
Actually the model was trained on 17 variables, and I added the same 17 variables to the Reader.
But to evaluate the model I first have to set the branch addresses of every tree, as shown in the first post, and that cannot be done with the same expressions used for the Reader.

Here is an example:

**** Code with definitions and other stuff *****
"This loads the variables: the same ones as in the classifier"
	reader->AddVariable( "dimuon_deltar", &dimuon_deltar );
	reader->AddVariable( "dimuon_deltaphi", &dimuon_deltaphi );
	reader->AddVariable( "dimuon_deltaeta", &dimuon_deltaeta );
	reader->AddVariable( "met_pt", &met_pt );
	reader->AddVariable( "bjet_1.Pt()", &bjet_1_pt );
	reader->AddVariable( "bjet_1.Eta()",&bjet_1_eta );


"Then the following step is required"

TFile * someFile = TFile::Open("someFile.root");
TTree * someTree = (TTree*) someFile->Get("someTree");
	someTree->SetBranchAddress( "dimuon_deltar", &dimuon_deltar );
	someTree->SetBranchAddress( "dimuon_deltaphi", &dimuon_deltaphi );
	someTree->SetBranchAddress( "dimuon_deltaeta", &dimuon_deltaeta );
	someTree->SetBranchAddress( "met_pt", &met_pt );
	someTree->SetBranchAddress( "bjet_1", &bjet1 );

Notice that in the second step I cannot set the branch address to bjet_1.Pt(); I must set it to just bjet_1. That reduces the input dimension (here from 6 to 5, in my actual case from 17 to 11), which I think produces the error.
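To make the counting concrete, here is a minimal sketch in plain C++ (no ROOT needed). The helper names branch_of, count_branches and example_inputs are hypothetical and only for illustration. It assumes the branch name is simply the text before the first '.', which holds for the six expressions above but not for arbitrary formulas.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Branch that an input expression reads from: the text before the first '.',
// or the whole string when there is no member call (naive split, see lead-in).
std::string branch_of(const std::string& expression) {
    const std::size_t dot = expression.find('.');
    return dot == std::string::npos ? expression : expression.substr(0, dot);
}

// Number of distinct branches behind a list of input expressions.
std::size_t count_branches(const std::vector<std::string>& expressions) {
    std::set<std::string> branches;
    for (const std::string& e : expressions) branches.insert(branch_of(e));
    return branches.size();
}

// The six example inputs registered with AddVariable in this post.
std::vector<std::string> example_inputs() {
    return {"dimuon_deltar", "dimuon_deltaphi", "dimuon_deltaeta",
            "met_pt", "bjet_1.Pt()", "bjet_1.Eta()"};
}
```

Here example_inputs() has six entries while count_branches(example_inputs()) gives five, because bjet_1.Pt() and bjet_1.Eta() both read from the single bjet_1 branch.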

What I cannot understand is how to avoid this error while keeping the number of variables at 17.

I hope this explains better the problem.

William

moneta · June 5, 2020, 7:14am

Hi
I understand now. It is the Reader that is reducing the number of variables. I will try to reproduce the problem and investigate it.

Lorenzo

moneta · June 5, 2020, 9:25am

I think the easiest solution is not to rely on the variable pointers passed to the Reader when reading the data, and instead to use a different Reader function to evaluate the events, one that takes as input a std::vector with the event's values.
See https://root.cern.ch/doc/master/classTMVA_1_1Reader.html#a209436a06ae04b848c9ac98b367e0dd6

In this way you create an std::vector of size=17 and you fill it for each event with the variable values retrieved from the TTree. Something like:

std::vector<float> vars(17);
TFile * someFile = TFile::Open("someFile.root");
TTree * someTree = (TTree*) someFile->Get("someTree");
someTree->SetBranchAddress( "dimuon_deltar", &dimuon_deltar );
....
TLorentzVector * bjet1 = nullptr;
someTree->SetBranchAddress( "bjet_1", &bjet1 );

for (int iev = 0; iev < someTree->GetEntries(); iev++) {
    someTree->GetEntry(iev);
    // fill the vector in the same order as the AddVariable calls
    vars[0] = dimuon_deltar;
    ...
    vars[..] = bjet1->Pt();
    vars[..] = bjet1->Eta();

    // after filling the vector call EvaluateMVA
    double mvaValue = reader->EvaluateMVA(vars, "My_method_name");
}
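
As a rough, ROOT-free sketch of the filling step, the code below builds the input vector for the six variables shown earlier in the thread. FourVector is a hypothetical stand-in for TLorentzVector so that the example compiles on its own, and Event and make_inputs are illustrative names, not part of TMVA. It assumes the values must be listed in the same order as the AddVariable calls.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for TLorentzVector, so the sketch compiles without ROOT.
struct FourVector {
    double px, py, pz, e;
    // Transverse momentum.
    double Pt() const { return std::hypot(px, py); }
    // Pseudorapidity from sinh(eta) = pz / pt; only meaningful for Pt() > 0.
    double Eta() const { return std::asinh(pz / Pt()); }
};

// Values read from one tree entry (only the six inputs shown in the thread).
struct Event {
    float dimuon_deltar, dimuon_deltaphi, dimuon_deltaeta, met_pt;
    FourVector bjet_1;
};

// Build the input vector, one value per registered variable,
// in the same order as the AddVariable calls.
std::vector<float> make_inputs(const Event& ev) {
    return {ev.dimuon_deltar, ev.dimuon_deltaphi, ev.dimuon_deltaeta, ev.met_pt,
            static_cast<float>(ev.bjet_1.Pt()),
            static_cast<float>(ev.bjet_1.Eta())};
}
```

With real data, the vector returned by a function like make_inputs would be what gets passed to reader->EvaluateMVA(vars, "My_method_name") inside the event loop.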

Cheers

Lorenzo

moneta · June 5, 2020, 9:54am

Hi,
Actually, after further investigation, your original example will work, but for each event you need to set yourself the values of the variables representing the Pt and Eta of the TLorentzVector (the ones whose addresses you passed to the Reader). So in the event loop you need to do:

for (int iev = 0; iev < someTree->GetEntries(); iev++) {
    someTree->GetEntry(iev);

    // update the variables whose addresses were passed to AddVariable
    bjet_1_pt = bjet1->Pt();
    bjet_1_eta = bjet1->Eta();

    reader->EvaluateMVA("My_method_name");
}

This solution is probably more CPU efficient when providing the input data to the trained model.
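
To illustrate why assigning the variables inside the loop is enough, here is a small sketch with a hypothetical MiniReader class. It is not TMVA's implementation; it only assumes, as the Reader interface suggests, that the addresses given to AddVariable are kept and dereferenced at evaluation time.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of a reader that remembers variable addresses.
class MiniReader {
public:
    // Store the name together with the address of the caller's variable.
    void AddVariable(const std::string& name, float* address) {
        vars_.emplace_back(name, address);
    }

    // Read the current value behind every registered address.
    std::vector<float> CurrentInputs() const {
        std::vector<float> values;
        values.reserve(vars_.size());
        for (const auto& v : vars_) values.push_back(*v.second);
        return values;
    }

private:
    std::vector<std::pair<std::string, float*>> vars_;
};
```

In this model, whatever is written to bjet_1_pt after GetEntry is what the reader sees on the next evaluation, regardless of whether the value came directly from a branch or was computed from a TLorentzVector.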

Lorenzo

WilliamK · June 5, 2020, 9:59am

Hi Lorenzo,
Thank you very much for the replies!

William