Vector type in TMVAClassificationApplication

Hello dear ROOT forum,

I am having trouble with the TMVA macro TMVAClassificationApplication.
The variables in my input ROOT file are stored as vectors, so I am using

std::vector<float> *userVar0= nullptr;
std::vector<float> *userVar1= nullptr;

theTree->SetBranchAddress( "alpha", &userVar0 );
theTree->SetBranchAddress( "B_Ks1_pt", &userVar1); 

This runs smoothly but when I run the following code,

std::cout << "--- Processing: " << theTree->GetEntries() << " events" << std::endl;
TStopwatch sw;
sw.Start();
for (Long64_t ievt=0; ievt<theTree->GetEntries();ievt++) {

   if (ievt%1000 == 0) std::cout << "--- ... Processing event: " << ievt << std::endl;

   theTree->GetEntry(ievt);

   var0 = userVar0;
   var1 = userVar1;

I get the following error

assigning to 'Float_t' (aka 'float') from incompatible type 'std::vector<float> *'

How should I proceed?

Hi,

Are you perhaps trying to assign a pointer to a vector of floats to a float variable (var0 and var1)?

Best,
D

Yes, I understand the error, but what do I do about it?
In the TMVAClassificationApplication macro, the test input file contains plain float values, so there is no error. My input file contains vectors of floats instead. How do I make the following work?

var0 = userVar0;
var1 = userVar1;

var0 and var1 are float variables, just as in the macro, which has the following:

Float_t var1, var2;
Float_t var3, var4;
reader->AddVariable( "myvar1 := var1+var2", &var1 );
reader->AddVariable( "myvar2 := var1-var2", &var2 );
reader->AddVariable( "var3",                &var3 );
reader->AddVariable( "var4",                &var4 );

I can pass only float and int variables to the reader. What should I do if my ROOT file contains vectors?

Hi,

OK. Can you confirm that you would like to build a classifier on top of variables (alpha and pt) related to B meson decays, which are stored in your file as a vector of floats for each entry of your dataset (TTree)?

Best,
D

Hello,
TMVA (in both the training and the application phases) accepts only float inputs. You can store vectors in the TTree, but you need to flatten them by copying their contents into float variables. For example, in your case you can do:

Float_t var1, var2;
Float_t var3, var4;
reader->AddVariable( "myvar1 := var1+var2", &var1 );
reader->AddVariable( "myvar2 := var1-var2", &var2 );
reader->AddVariable( "var3",                &var3 );
reader->AddVariable( "var4",                &var4 );

std::vector<float> *userVar0= nullptr;
std::vector<float> *userVar1= nullptr;

theTree->SetBranchAddress( "alpha", &userVar0 );
theTree->SetBranchAddress( "B_Ks1_pt", &userVar1); 

for (Long64_t ievt=0; ievt<theTree->GetEntries();ievt++) {

   if (ievt%1000 == 0) std::cout << "--- ... Processing event: " << ievt << std::endl;

   theTree->GetEntry(ievt);

   // this is just an example
   var1 = (*userVar0)[0];
   var2 = (*userVar0)[1];
   var3 = .......
....

Lorenzo
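
The snippet above reads fixed indices, which leaves open what to do when a vector is empty or holds several candidates per event. The following is only a minimal sketch under stated assumptions: elementOr and scoreCandidates are hypothetical helpers (they are not part of ROOT or TMVA), and the evaluate callback stands in for a call such as reader->EvaluateMVA with the booked method name.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not a ROOT or TMVA function): return element i of *v,
// or `fallback` when the pointer is null or the vector is too short.
float elementOr(const std::vector<float> *v, std::size_t i, float fallback)
{
   return (v != nullptr && i < v->size()) ? (*v)[i] : fallback;
}

// Hypothetical per-candidate loop: for each index present in both vectors,
// copy the values into the floats registered with the reader (var0, var1)
// and call `evaluate`, which stands in for the reader's MVA evaluation.
std::vector<float> scoreCandidates(const std::vector<float> &alpha,
                                   const std::vector<float> &pt,
                                   float &var0, float &var1,
                                   const std::function<float()> &evaluate)
{
   const std::size_t n = std::min(alpha.size(), pt.size());
   std::vector<float> scores;
   scores.reserve(n);
   for (std::size_t i = 0; i < n; ++i) {
      var0 = alpha[i];
      var1 = pt[i];
      scores.push_back(evaluate());
   }
   return scores;
}
```

Inside the event loop, after theTree->GetEntry(ievt), one could pass *userVar0 and *userVar1 together with the two registered floats and a lambda that wraps the reader call. Whether one score per candidate or one per event is wanted depends on how the classifier was trained.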

Yes, that is correct.

Thank you so much. This worked!
Can we also do the opposite, i.e., convert a float variable to a vector of floats?

Hi,

do you mean something like:

float val = 3.f;
std::vector<float> vec {val}; // or vec.emplace_back(val) if you need to fill this dynamically at runtime....

?

Best,
D
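
One detail worth spelling out for the float-to-vector direction is the difference between braces and parentheses when constructing a std::vector. The sketch below uses two hypothetical helper names, wrapInVector and zeros, purely for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: wrap a single float in a one-element vector.
// The braces select the initializer-list constructor, so the result
// holds exactly one element equal to val.
std::vector<float> wrapInVector(float val)
{
   return {val};
}

// Hypothetical helper: parentheses select the size constructor instead,
// which creates n value-initialized (zero) elements.
std::vector<float> zeros(std::size_t n)
{
   return std::vector<float>(n);
}
```

So wrapInVector(3.f) yields one element with value 3, whereas zeros(3) yields three elements that are all 0.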

Hello,

I decided to post an example in case you want to avoid handling TTree instances explicitly in your code by using RDataFrame, the modern way to deal with columnar datasets. The interesting lines are the last ones in the example.

I hope it helps!


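// Interesting user code at the end!

#include <ROOT/RDataFrame.hxx>
#include <ROOT/TSeq.hxx>
#include <TRandom.h>

#include <iostream>
#include <stdexcept>
#include <vector>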

template <class COLLSQUARE>
class CollSquareIter {
private:
   const COLLSQUARE &fCollSquare;
   const unsigned int fOuterCollSize = 0; // unsigned int instead of the non-standard uint typedef
   unsigned int       fOuterCollIdx  = 0;
   unsigned int       fInnerCollIdx  = 0;
   bool               fHasNext       = true;

public:
   using ValueType = typename COLLSQUARE::value_type::value_type;
   CollSquareIter(const COLLSQUARE &collSquare) : fCollSquare(collSquare), fOuterCollSize(collSquare.size()){};
   const ValueType &Next()
   {
      if (fOuterCollIdx < fOuterCollSize) {
         const auto InnerCollSize = fCollSquare[fOuterCollIdx].size();
         if (fInnerCollIdx < InnerCollSize) {
            auto valPtr = &(fCollSquare[fOuterCollIdx][fInnerCollIdx]);
            if (fOuterCollIdx == fOuterCollSize - 1 && fInnerCollIdx == InnerCollSize - 1) fHasNext = false;
            ++fInnerCollIdx;
            return *valPtr;
         } else {
            ++fOuterCollIdx;
            fInnerCollIdx = 0;
            if (HasNext()) return Next();
         }
      } else {
         return fCollSquare[fOuterCollSize - 1][fCollSquare[fOuterCollSize - 1].size() - 1];
      }
      // We should never be here.
      throw std::runtime_error("The iterator is not usable, please re-instantiate one.");
   }
   bool HasNext() { return fHasNext; }
};

std::vector<float> genRndmFloatVec()
{
   const auto         size = gRandom->Integer(20);
   std::vector<float> vals;
   vals.reserve(size);
   for (auto i : ROOT::TSeqI(size)) {
      vals.emplace_back(gRandom->Uniform(0., 128.));
   }
   return vals;
}

void prepareFakeInput(const char *fileName)
{
   ROOT::RDataFrame df(32);
   df.Define("alpha", genRndmFloatVec).Define("B_Ks1_pt", genRndmFloatVec).Snapshot("theTree", fileName);
}

void iter_demo()
{
   const auto rootFileName = "theFile.root";
   prepareFakeInput(rootFileName);

   // ---- Here the interesting code -----
   ROOT::RDataFrame df("theTree", rootFileName);
   auto             alpha_Vals    = df.Take<std::vector<float>>("alpha");
   auto             B_Ks1_pt_Vals = df.Take<std::vector<float>>("B_Ks1_pt");

   CollSquareIter alpha_it(alpha_Vals.GetValue());
   CollSquareIter b_ks1_pt_it(B_Ks1_pt_Vals.GetValue());

   while (alpha_it.HasNext() && b_ks1_pt_it.HasNext()) {
      const auto alpha    = alpha_it.Next();
      const auto b_ks1_pt = b_ks1_pt_it.Next();
      std::cout << alpha << " " << b_ks1_pt << std::endl;
   }

   // --- end of the interesting code ---
}
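
If all values are needed at once rather than one at a time, a nested traversal can be written more compactly. This is only a sketch: flatten is a hypothetical helper, and it assumes the input has the shape returned by Take<std::vector<float>>, i.e. one inner vector per entry. Unlike the iterator above, which appears to index out of bounds when the last inner vector is empty, this version simply skips empty inner vectors.

```cpp
#include <vector>

// Hypothetical helper: concatenate all inner vectors into one flat vector,
// preserving order. Empty inner vectors contribute nothing.
std::vector<float> flatten(const std::vector<std::vector<float>> &nested)
{
   std::vector<float> flat;
   for (const auto &inner : nested) {
      flat.insert(flat.end(), inner.begin(), inner.end());
   }
   return flat;
}
```

With the example above, flatten(alpha_Vals.GetValue()) would give every alpha value in entry order. Note that flattening discards the information about which entry each value came from.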