Dynamically add variables to the TMVA Reader regardless of branch type

Hello,

I want the user to provide a vector of strings (branch names) to be used by the TMVA Reader object, but I run into problems when the tree holds more than one type of object, i.e. one branch of type float and another of type int. What I currently have is:

// assume inputs.variable_Names is a vector of strings containing the names of the branches
vector<Float_t> container(inputs.variable_Names.size());
int i = 0;
while (i < inputs.variable_Names.size()){
    reader->AddVariable( inputs.variable_Names[i], &(container[i]) );
    i++;
}

And again later, when I'm evaluating the TMVA:

    double result;
    for(int i = 0; i < n_Events; i++){
        for (int j = 0; j < inputs.variable_Names.size(); j++){
            MonteCarlo  -> SetBranchAddress(inputs.variable_Names[j], &(container[j]));
            MonteCarlo -> GetEvent(i);
        }
        MC_Weights_vec[i] = MC_weight_Holder;
        result = reader -> EvaluateMVA( "BDTG method" );

This will obviously run into issues if a branch holds values that are not of type float.

Is there a more elegant way of reading an arbitrary number of variables of several different types (int, float, double…) into the TMVA Reader?

Thanks in advance!

Hi,
You can use the overload of Reader::EvaluateMVA that takes a vector<float> (see ROOT: TMVA::Reader Class Reference), and build that vector yourself by converting the int values to float.
Note that you can also use the new RReader interface, to which you can pass an RTensor.
See the tutorial ROOT: tutorials/tmva/tmva003_RReader.C File Reference

Lorenzo

Hello,

Thank you for the reply, although I am having difficulty solving my problem based on your answer; could you perhaps flesh it out a little more? I'll be clearer: the problem is not really with adding the variables to the reader, but with the line:

MonteCarlo  -> SetBranchAddress(inputs.variable_Names[j], &(container[j]));

where my ‘container’ is a vector of floats, so if a variable is stored as an int in the input ROOT file, I cannot read it in.

Hi,

If the variables in the ROOT files are ints, you should define the container as a vector<int>, not a vector<float>. If in the end you want a vector<float>, copy each element of the vector<int> into a vector<float>.

Lorenzo

Thank you again for the response.
I have realized that my problem is not with TMVA, but with SetBranchAddress.

Not all variables are ints, only some of them are, and I do not know before runtime which variables are ints and which are floats. If I use vector<int>, the float variables won't be read in correctly. I want to be able to read in an arbitrary number of variables of several different types (int, float, double…).

If I make container a vector of floats, I get this error: Error in <TTree::SetBranchAddress>: The pointer type given "Float_t" (5) does not correspond to the type needed "Int_t" (3) by the branch: Var_3 .

If I make container a vector of ints, I get this error: <FATAL> : Reader::AddVariable( const TString& expression, Int_t* datalink ), this function is deprecated, please provide all variables to the reader as floats

So I need a way around this.

Perhaps I am misunderstanding your responses, but I do not think they solve this problem.

Thank you for your patience and assistance in this, it is much appreciated!

Cheers,

Matt

Hello,

I am sorry, but I think I misunderstood your problem. Can you please tell me the exact tree data structure (you can send me the output of TTree::Print()) and which variables you want to provide as input to the TMVA Reader evaluation?

Lorenzo

Sorry for the delayed response. I attached 2 screenshots of some of the variables to be used… I would like to provide variables such as truth_pt_Balance, truth_dy_jj and truth_NgapJets25, to name a few, but there are many, so I won't list all of them.


Thanks for the screenshot.
I see that some branches are floats and others are integers. In SetBranchAddress you should use the type that is defined in the TTree.
To use the Reader, I would then copy all the variables read from the tree into a std::vector<float> that you can pass to TMVA::Reader::EvaluateMVA.
If you also share your Reader code, I can help you make the required changes. I would also need to see the XML file defining the model.

Lorenzo

Hello, here is a snapshot of the relevant part of the code. It is part of a larger piece of code that I do not wish to post, so I deleted a chunk of it; it will not run as is:

BDT_N::BDT_N( TString in_input_File ){
    set_initials( in_input_File );
    TMVA::Reader *reader = new TMVA::Reader( "!Color:!Silent" );

    vector<Float_t> container(inputs.variable_Names.size());

    TFile *input_MC(0);
    TFile *input_Target(0);
    TTree *MonteCarlo           = (TTree*)input_MC->Get(inputs.MC_Tree_Name);
    TTree *Target               = (TTree*)input_Target->Get(inputs.target_Tree_Name);

    Float_t MC_weight_Holder;
    Float_t MC_Reweight_Holder;
    Float_t Target_weight_Holder;
    Float_t event_Holder;

    n_MC_Events  = (Int_t)MonteCarlo->GetEntries();
    n_Target_Events  = (Int_t)Target->GetEntries();

    MonteCarlo     -> SetBranchAddress( "initialWeight", &MC_weight_Holder );
    Target         -> SetBranchAddress( "initialWeight", &Target_weight_Holder );

    // Define the input variables that were used for the MVA training
    int i = 0;
    while (i < inputs.variable_Names.size()){
        reader->AddVariable( inputs.variable_Names[i], &(container[i]) );
        i++;
    }
    //reader->BookMVA( "BDT method", BDT_weightFile  ); //methodName, weightfile
    reader->BookMVA( "BDTG method", BDT_weightFile  ); //methodName, weightfile

    double result;
    for(int i = 0; i < n_MC_Events; i++){

        for (int j = 0; j < inputs.variable_Names.size(); j++){
            MonteCarlo  -> SetBranchAddress(inputs.variable_Names[j], &(container[j]));
            MonteCarlo -> GetEvent(i);
        }
        result = reader -> EvaluateMVA( "BDTG method" ); // Do i want the
        MC_ReWeights_vec[i] =  ((1.0 + result) / (1.0 - result)) * MC_Weights_vec[i] ;
    }
} 

Yes, using the correct type is the problem. I would like to do this without having to look into the ROOT file to determine the data type beforehand.

This is set up so that any variables of type float can be read, but errors occur when int variables are used (see the line vector<Float_t> container ...). Recall that inputs.variable_Names is a list of strings containing the names of the variables used.

I cannot attach the XML because the site doesn't seem to accept the file format. If there is a better way, please let me know.

Thank you again for your help

Hello,
I would first read the entries from the TTree and then convert from integer to float, as below:

BDT_N::BDT_N( TString in_input_File ){
    set_initials( in_input_File );
    TMVA::Reader *reader = new TMVA::Reader( "!Color:!Silent" );

    vector<Float_t> container(inputs.variable_Names.size());

    TFile *input_MC(0);
    TFile *input_Target(0);
    TTree *MonteCarlo           = (TTree*)input_MC->Get(inputs.MC_Tree_Name);
    TTree *Target               = (TTree*)input_Target->Get(inputs.target_Tree_Name);

    Float_t MC_weight_Holder;
    Float_t MC_Reweight_Holder;
    Float_t Target_weight_Holder;
    Float_t event_Holder;

    n_MC_Events  = (Int_t)MonteCarlo->GetEntries();
    n_Target_Events  = (Int_t)Target->GetEntries();

    MonteCarlo     -> SetBranchAddress( "initialWeight", &MC_weight_Holder );
    Target         -> SetBranchAddress( "initialWeight", &Target_weight_Holder );

    // Define the input variables that were used for the MVA training
    int i = 0;
    while (i < inputs.variable_Names.size()){
        reader->AddVariable( inputs.variable_Names[i], &(container[i]) );
        i++;
    }
    //reader->BookMVA( "BDT method", BDT_weightFile  ); //methodName, weightfile
    reader->BookMVA( "BDTG method", BDT_weightFile  ); //methodName, weightfile

    // define here the variables to read from the TTree:
    int truth_NgapJets25;
    // ...
    MonteCarlo->SetBranchAddress("truth_NgapJets25", &truth_NgapJets25);
    // same here for the other variables
    // ......

    double result;
    for (int i = 0; i < n_MC_Events; i++) {

        //   for (int j = 0; j < inputs.variable_Names.size(); j++){
        //       MonteCarlo  -> SetBranchAddress(inputs.variable_Names[j], &(container[j]));
        //       MonteCarlo -> GetEvent(i);
        //   }
        MonteCarlo->GetEntry(i);
        // fill here the container with your variables
        container[0] = float(truth_NgapJets25);
        // .......

        // evaluate MVA
        result = reader->EvaluateMVA(container, "BDTG method");

        //  result = reader -> EvaluateMVA( "BDTG method" ); // Do i want the
        MC_ReWeights_vec[i] = ((1.0 + result) / (1.0 - result)) * MC_Weights_vec[i];
    }
}

This would require knowing that truth_NgapJets25 is an int before running the code. As I said, I would like to do this without having to look into the ROOT file to determine the data type beforehand.

By "defining the variables to read from the TTree", i.e. hard-coding the variable truth_NgapJets25, you assume that this variable MUST be in the ROOT file you give the code. The ROOT file will not be the same every time. I don't want to have to go into the code and copy and paste the line "SetBranchAddress(…)" every single time I run the code on different data. I want it to do that dynamically, given only the names of the variables as input.

Thank you for your response and patience; I hope the problem I am trying to solve is clearer now.