Dear All,
I posted this on the TMVA sourceforge mailing list but there wasn’t a response and it isn’t clear if that is still an active way of talking to experts. I am reposting a slightly edited version of my question (with a bit more context) here with the hope someone might be able to help.
Thanks in advance,
-Chris
Dear All,
I was wondering if anyone can help explain whether I am seeing a bug or whether this is the correct behavior and, if so, what the rationale is.
For TMVA 4.1.2 we have the following bug fix (from tmva.sourceforge.net):
“Requested number of training and testing events was not correct when pre-selection cuts were applied. Now the number of requested events scales with the preselection efficiency and hence does not need to be adjusted with the pre-selection. This also corrects the problems seen in the Category classifier, where pre-selection is used to build the categories.”
This has radically changed the output of an analysis I am working on, so I am trying to understand how to make it do as requested.
I am using a preselection and am asking for a particular number of events in the background sample for training (to match the statistics in the smaller signal MC sample):
factory->PrepareTrainingAndTestTree("tau_selected==1",
"V:nTest_Signal=1500:nTest_Background=1500:nTrain_Background=32559");
In older versions this worked fine: as the manual points out, the requested number of events applies after the pre-selection, and in this case 32559 events were put into the training tree.
However, what happens now is that the efficiency of the pre-selection is also applied to my requested number, so the selection goes like this:
Possible in input tree: 1915192
I request: 32559
After preselection: 311197
Selected: 5290 (not 32559)
5290 is approximately 32559*(311197/1915192), i.e. my request multiplied by the pre-selection efficiency. Is this the expected behavior? If so, could someone explain how we should set nTrain_Background (and the related options) to get the number of events we want in each sample, including when the efficiency changes?
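To make the arithmetic concrete, here is a minimal sketch in plain C++ (no TMVA calls) of the scaling I appear to be seeing, plus the only workaround I can think of: inflating the request by the inverse efficiency. The function names are my own and are not part of TMVA, and the rounding (truncation of the scaled number) is an assumption based on the numbers above, not something I have confirmed in the TMVA source.

```cpp
#include <cstdint>

// Hypothetical helper (not a TMVA function): the number of events that end up
// selected if the requested count is multiplied by the pre-selection
// efficiency nPassed / nInput. Integer division truncates; that rounding is
// an assumption inferred from the numbers in this post.
std::int64_t scaledRequest(std::int64_t requested,
                           std::int64_t nInput,
                           std::int64_t nPassed) {
    return requested * nPassed / nInput;
}

// Hypothetical helper (not a TMVA function): the count one would have to
// request so that, after the scaling above, at least `wanted` events remain.
// This is wanted / efficiency, rounded up.
std::int64_t compensatedRequest(std::int64_t wanted,
                                std::int64_t nInput,
                                std::int64_t nPassed) {
    return (wanted * nInput + nPassed - 1) / nPassed;
}
```

With my numbers, scaledRequest(32559, 1915192, 311197) gives 5290, and compensatedRequest(32559, 1915192, 311197) gives 200378, which scales back to 32559. Having to recompute the request whenever the efficiency changes seems fragile, which is why I am asking whether this is really the intended usage.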
Thanks!
-Chris