BDT output from xml file does not correspond to histogram from TMVA.root

Dear all,

I want to retrieve the classifier value of a trained BDT over a signal data set. To do this, I use PyROOT as follows:

import array
import numpy as np
import ROOT

reader = ROOT.TMVA.Reader("!Color:!Silent")
cosTheta = array.array('f', [0])
reader.AddVariable("cosTheta", cosTheta)
… # add other variables

reader.BookMVA("BDT method", "dataset/weights/TMVAClassification_BDT_general.weights.xml")

for i in range(len(events)):
    cosTheta[0] = np.cos(Theta[i])
    … # fill other variables
    bdtOutput = reader.EvaluateMVA("BDT method")

This works fine in the sense that it returns a value for the BDT classifier, but the histogram that I then get looks nothing like what was produced when making the xml file (i.e. in the TMVA GUI). I produced the xml file by running tutorials/tmva/TMVAClassification.C, using half the signal sample to train, the other half to test. I am now trying to get the BDT output for all events in this very same signal sample. I was expecting the result to look exactly the same but it’s completely different, in particular it’s not a smooth distribution: it has several peaks.

Thank you for the help!

ROOT Version: 6.24/00
Built for linuxx8664gcc on May 21 2021, 23:47:00
From heads/latest-stable@v6-24-00-1-ge6a04a86cb

Any idea? Anyone? My problem is that I don’t get the same distribution of the bdt output when I compare what I get from the TMVA GUI when training/testing the BDT, and when I apply it on the same data set using the xml file.


This is strange. Are you using the same data sets when running the Reader as when looking at the output produced during training and examined with the GUI?
If you can add some macros and a data file showing this problem, that would be helpful too.
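
In the meantime, one thing that may be worth checking (it is only a guess without seeing the full script) is that every variable buffer passed to AddVariable is filled in place with `buf[0] = value` inside the loop. The reader reads from the buffer object it was given, so rebinding the Python name to a new object leaves the reader looking at the old, unchanged value. The sketch below does not use TMVA at all: `FakeReader` is a hypothetical stand-in, written only to illustrate the buffer behaviour, and its "classifier" is a plain sum.

```python
import array


class FakeReader:
    """Hypothetical stand-in for a TMVA reader (NOT the real API).

    It keeps a reference to each registered buffer and reads element 0
    of every buffer when evaluate() is called.
    """

    def __init__(self):
        self._buffers = []

    def add_variable(self, name, buffer):
        # Store the buffer object itself, not a copy of its current value.
        self._buffers.append((name, buffer))

    def evaluate(self):
        # Toy "classifier": the sum of the current buffer contents.
        return sum(buf[0] for _, buf in self._buffers)


values = [0.1, 0.5, 0.9]

# Filling in place: the registered buffer changes on every event.
reader = FakeReader()
cosTheta = array.array('f', [0])
reader.add_variable("cosTheta", cosTheta)
in_place = []
for v in values:
    cosTheta[0] = v
    in_place.append(reader.evaluate())

# Rebinding the name: a new array is created on every event, while the
# reader still holds the original buffer, which stays at 0.
reader = FakeReader()
cosTheta = array.array('f', [0])
reader.add_variable("cosTheta", cosTheta)
rebound = []
for v in values:
    cosTheta = array.array('f', [v])
    rebound.append(reader.evaluate())

print(len(set(in_place)), "distinct outputs when filling in place")
print(len(set(rebound)), "distinct output(s) when rebinding the name")
```

If one of the real input variables is stuck like this, the classifier only sees part of the event information changing, which could produce an output distribution with very few distinct values.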



So, for instance, I tried with only two variables, named cosTheta and conf15. The distributions of the BDT output on the train tree and the test tree look very similar; I paste the test tree one here to give you an idea. However, when I add bdtOutput to my data tree by reading the xml file with the reader, I only ever get two values, and I therefore get this other histogram.

The way I retrieve the bdtOutput is as described above, with the recommended method using the reader. These are the same variables that were used to make the xml file, in the same order, etc.

No-one has ever seen such a problem before?

Something has gone wrong, but it is difficult to say what without looking at your code and your input file.
Can you please post them? You could send them to me privately if you cannot share them publicly.