Queries about TMVA Application

Hi,
I ran the TMVA application step after training and testing, but the output histogram doesn’t look right: I get only a single vertical peak at ‘-0.2’ (see the attached figure). Could you check the macro and point out anything that is wrong? I also don’t really understand how the application step works. It would help if you could explain a bit, for example why we feed in only the signal input and not the background. The Training and Application macros, along with the signal and background input files, can be found here.

Canvas_1

Regards,
Saumyen

Hello,

Does the training work correctly? Did you, for example, try TMVAGui.C to check that the models respond correctly? (A broken model can also create responses like this …)

Yes, the training worked and the TMVAGUI also appeared.
Thanks Stephan.

Saumyen

I found the problem:
You are never writing into the variables that you gave to the reader. See the assignments in the loop in the example code here:
https://root.cern.ch/doc/master/classTMVA_1_1Reader.html

I guess what can also work is to attach the variables both to the reader and as branches to the tree, so the tree directly loads them in a place where the reader picks them up.

Thanks again, Stephan. I found the mistakes and corrected them. Could you please check the code again? I am still getting the same kind of result. Here’s the code.
Application.C (3.6 KB)
Canvas_1

Thanks,
Saumyen

I cannot run your example, but can you simply test if the values of zlepM, zdPHI etc change when you load the next event? Just print them.

Yes, I checked and there is something wrong. I am getting the value ‘0’ for all events. That means the values are not getting passed to the reader, right? But how do I fix this?

Saumyen

Can you first check that values are getting loaded from the tree?

Yes. When I added the second print statement (the reader variables), it printed ‘0’ for all events, with no error. When I added the first one, to check the variables from the TTree, I got errors. The event loop was this:

for (Long64_t ievt=0; ievt<theTree->GetEntries(); ievt++) {
  if (ievt%1000 == 0) std::cout << "--- ... Processing event: " << ievt << std::endl;

  theTree->GetEntry(ievt);

  // Print variables from the Tree
  if (ievt>1000 && ievt<1010) { cout << "lepPt = " << lepPt << "\t MissingET = " << zMissingET << endl; }

  zlepPt     = lepPt;
  zETAlep    = ETAlep;
  zPHIlep    = PHIlep;
  zlepM      = lepM;
  zdPHI      = dPHI;
  zdRll      = dRll;
  zaxialMET  = axialMET;
  zfracPT    = fracPT;
  zMissingET = MissingET;
  zETAmiss   = ETAmiss;
  zPHImiss   = PHImiss;

  // Print variables for the Reader
  if (ievt>1000 && ievt<1010) { cout << "zlepPt = " << zlepPt << "\t zMissingET = " << zMissingET << endl; }

  if (Use["BDTF"]) histBdtF->Fill( reader->EvaluateMVA( "BDTF method" ) );
}

The error I got is this:

Saumyen

I guess this is because the variables were saved as Double_t and here I am reading them as Float_t. So I changed that. Then I am getting this:

Should I also change the Reader variables to Float_t?

Saumyen

The reader takes floats, as you can see in the documentation I linked at the beginning.
So yes, make the reader variables floats, and assign from the doubles to the floats, so that the conversion to float happens automatically.

Okay, I see. Then what would be the fix?
By the way, in the input file some of the variables contain NaN values. In the Training macro I applied a cut. But how do I apply the cut in the Application macro?

Regards,
Saumyen

The fix:

  1. Make a double, connect to tree.
  2. Make a float, connect to reader.
  3. After loading an event, assign:
    inputReader = outputTree.
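
To make the three steps concrete, here is a minimal sketch without ROOT. TreeEvent, ReaderInputs, copyToReader and runLoop are hypothetical stand-ins, not part of TMVA: the struct of doubles plays the role of the variables connected to the tree, the struct of floats plays the role of the variables given to the reader, and the callback stands in for reader->EvaluateMVA(...).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the variables connected to the tree (stored as double).
struct TreeEvent {
    double lepPt;
    double MissingET;
};

// Hypothetical stand-in for the variables connected to the reader (must be float).
struct ReaderInputs {
    float zlepPt;
    float zMissingET;
};

// Step 3: after an event is loaded, copy the tree values into the reader's floats.
// The double -> float conversion happens in the assignment.
void copyToReader(const TreeEvent& in, ReaderInputs& out) {
    out.zlepPt     = static_cast<float>(in.lepPt);
    out.zMissingET = static_cast<float>(in.MissingET);
}

// Mimics the event loop. In the real macro, theTree->GetEntry(ievt) fills the
// doubles and reader->EvaluateMVA(...) reads the floats; here a callback plays
// the reader's role so the sketch runs on its own.
std::vector<float> runLoop(const std::vector<TreeEvent>& events,
                           float (*evaluate)(const ReaderInputs&)) {
    assert(evaluate != nullptr);
    std::vector<float> responses;
    ReaderInputs inputs{0.0f, 0.0f};
    for (std::size_t ievt = 0; ievt < events.size(); ++ievt) {
        copyToReader(events[ievt], inputs);  // without this, inputs stay 0 for every event
        responses.push_back(evaluate(inputs));
    }
    return responses;
}
```

If the copy step is left out, the reader sees the same initial values for every event, which matches the single-peak histogram described earlier in the thread.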

How to cut?
Just skip the event.

if (std::isnan(value))
  continue;
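
A small sketch of the same idea applied to all inputs at once, assuming the values of one event are collected in a vector. hasNaN and countCleanEvents are hypothetical helper names for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true if any of the given input values is NaN.
bool hasNaN(const std::vector<double>& values) {
    return std::any_of(values.begin(), values.end(),
                       [](double v) { return std::isnan(v); });
}

// Counts the events that survive the NaN cut; each inner vector holds the
// input variables of one event.
std::size_t countCleanEvents(const std::vector<std::vector<double>>& events) {
    std::size_t kept = 0;
    for (const auto& ev : events) {
        if (hasNaN(ev)) continue;  // same "skip the event" pattern as above
        ++kept;
    }
    assert(kept <= events.size());
    return kept;
}
```

In an event loop this would be used as if (hasNaN({lepPt, MissingET, axialMET})) continue; with every variable that goes into the MVA listed.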

Okay, so the cut never worked. I tried different conditions in the if-statement, like if(std::isnan(axialMET)), if (!std::isnan(axialMET)), if(!TMath::IsNaN(axialMET)) and if(TMath::IsNaN(axialMET)). None worked. So I tried an input that doesn’t have any NaN, and it worked fine (I guess). I got this output histogram.
Canvas_1
This looks fine compared to the test output, right?
mva_BDTF

Thank you so so much, Stephan, for this help.

But what about the other question? I mean, why are we only giving the signal as input and not the background? How do we compare and confirm the separation from the background?

Thanks again,
Saumyen

If you get real NaNs, cutting them out should work. Also note that an ordered comparison with a NaN (<, <=, >, >=) is always false, so even this

if (!(0 < MET && MET < 100)
 || !(0 < PT && PT < 150 ) )
  continue;

etc will cut out NaNs. You have to make sure, though, that you test all variables that go into the MVA.
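
A short sketch of why the form of the cut matters. passesCuts mirrors the snippet above; passesCutsNaive is a hypothetical counter-example showing that the same range written the other way round lets NaNs through.

```cpp
#include <cassert>
#include <cmath>

// Range cut written as in the snippet above: returns true if the event is kept.
// A NaN makes every ordered comparison false, so it fails the range test.
bool passesCuts(double MET, double PT) {
    if (!(0 < MET && MET < 100) || !(0 < PT && PT < 150))
        return false;
    // Sanity check: anything that passed the range test cannot be NaN.
    assert(!std::isnan(MET) && !std::isnan(PT));
    return true;
}

// Pitfall: rejecting on "outside the range" keeps NaNs, because
// NaN <= 0 and NaN >= 100 are both false.
bool passesCutsNaive(double MET) {
    if (MET <= 0 || MET >= 100)
        return false;
    return true;
}
```

So the cut has to be phrased as "keep only if inside the range" for it to remove NaNs as a side effect.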

I don’t see the other question, so I couldn’t answer it. It doesn’t make sense to only give the input of the signal, so I wouldn’t know why one would do that.
How do you confirm separation? You run separately for signal and background, and check that the values are different. That’s what the blue and yellow plot from the TMVAGui.C is.
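
As a rough sketch of "run separately and compare", assuming the classifier responses for each sample have already been collected into vectors: normalizedHist and overlap are hypothetical helpers, and the overlap number is only an illustration, not a quantity that TMVA computes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Bins classifier responses into a unit-normalized histogram over [lo, hi).
std::vector<double> normalizedHist(const std::vector<double>& scores,
                                   std::size_t nBins, double lo, double hi) {
    assert(nBins > 0 && lo < hi);
    std::vector<double> hist(nBins, 0.0);
    std::size_t nFilled = 0;
    for (double s : scores) {
        if (!(s >= lo && s < hi)) continue;  // out of range; also skips NaN
        std::size_t bin = static_cast<std::size_t>((s - lo) / (hi - lo) * nBins);
        if (bin >= nBins) bin = nBins - 1;   // guard against rounding at the upper edge
        hist[bin] += 1.0;
        ++nFilled;
    }
    if (nFilled > 0)
        for (double& h : hist) h /= static_cast<double>(nFilled);
    return hist;
}

// Illustrative overlap of two normalized histograms:
// 0 means disjoint shapes, 1 means identical shapes.
double overlap(const std::vector<double>& sig, const std::vector<double>& bkg) {
    assert(sig.size() == bkg.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < sig.size(); ++i) sum += std::min(sig[i], bkg[i]);
    return sum;
}
```

You would call normalizedHist once with the signal responses and once with the background responses, using the same binning, and then compare the two shapes.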

Thanks Stephan again.
Actually, every application macro I have seen so far only reads the signal tree and not the background one (even if it’s in the same file). The TMVAClassificationApplication.C file in the tutorial directory also reads only the Signal tree. So I was wondering whether you run on the signal first, then on the background separately, and then superimpose the two histograms manually. Is that the case, or am I getting it wrong?

Another question was whether, in the application, I should use the same input files that I trained with, or whether I can use different input files from the same model.
Sorry if these questions sound trivial. I am trying to understand what the purpose of the application step is and how it works.

Regards,
Saumyen

Yes, that is the case.

If you use the same input for training and application, you can just test the model, but you cannot use this for anything serious, since models usually perform better on their training data.
The whole point of machine learning is that you can later apply your trained model to new data, so it can make some predictions.
So yes, you can use the same files, but it’s not advisable unless you want to run some tests.

The purpose is to evaluate the trained model on new data. That can be simulations where you already know the result, in which case you are testing how good the model is. Or it can be data where you don’t know what it is, and there you use the classifier to make a prediction for you.

Thank you so much Stephan for the help and the explanation. Thanks a lot. This will help me…

Regards,
Saumyen

Sorry, how did you get the second picture? The output of TMVAClassificationApplication.C is just TMVApp.root, and this ROOT file does not include the second picture. (My English is poor.)