Hello everybody!
I have a question that mixes programming and statistics.
I have a program that generates events from two bivariate Gaussians: one is the signal, the other the background. I need to apply an MVA technique to separate the two classes of events, and I have decided to use a Fisher discriminant from the TMVA package.
I think the implementation in my code is correct, but I include it here so it can be checked:
// Output file for the TMVA results
TFile f("fisher.root", "RECREATE");
TMVA::Factory *factory = new TMVA::Factory("TMVAanalysis", &f, "");
TMVA::DataLoader *dataloader = new TMVA::DataLoader("data");
// Input trees for the two classes
dataloader->AddSignalTree(sgl);
dataloader->AddBackgroundTree(bkg);
// Discriminating variables
dataloader->AddVariable("x", 'F');
dataloader->AddVariable("y", 'F');
// Book the Fisher method with default options, then train, test and evaluate
factory->BookMethod(dataloader, TMVA::Types::kFisher, "Fisher", "");
factory->TrainAllMethods();
factory->TestAllMethods();
factory->EvaluateAllMethods();
sgl and bkg are TNtuple objects that hold the generated x and y values for signal and background.
I get the following output:
<HEADER> DataSetInfo : [data] : Added class "Signal"
: Add Tree sgl of type Signal with 50000 events
<HEADER> DataSetInfo : [data] : Added class "Background"
: Add Tree bkg of type Background with 50000 events
<HEADER> Factory : Booking method: Fisher
:
<HEADER> Factory : Train all methods
<HEADER> DataSetFactory : [data] : Number of events in input trees
:
:
: Dataset[data] : Weight renormalisation mode: "EqualNumEvents": renormalises all event classes ...
: Dataset[data] : such that the effective (weighted) number of events in each class is the same
: Dataset[data] : (and equals the number of events (entries) given for class=0 )
: Dataset[data] : ... i.e. such that Sum[i=1..N_j]{w_i} = N_classA, j=classA, classB, ...
: Dataset[data] : ... (note that N_j is the sum of TRAINING events
: Dataset[data] : ..... Testing events are not renormalised nor included in the renormalisation factor!)
: Number of training and testing events
: ---------------------------------------------------------------------------
: Signal -- training events : 25000
: Signal -- testing events : 25000
: Signal -- training and testing events: 50000
: Background -- training events : 25000
: Background -- testing events : 25000
: Background -- training and testing events: 50000
:
<HEADER> DataSetInfo : Correlation matrix (Signal):
: ------------------------
: x y
: x: +1.000 +0.496
: y: +0.496 +1.000
: ------------------------
<HEADER> DataSetInfo : Correlation matrix (Background):
: ------------------------
: x y
: x: +1.000 +0.398
: y: +0.398 +1.000
: ------------------------
<HEADER> DataSetFactory : [data] :
:
<HEADER> Factory : [data] : Create Transformation "I" with events from all classes.
:
<HEADER> : Transformation, Variable selection :
: Input : variable 'x' <---> Output : variable 'x'
: Input : variable 'y' <---> Output : variable 'y'
<HEADER> TFHandler_Factory : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: x: 1.9871 2.1206 [ -0.99690 6.9825 ]
: y: 1.9934 2.1265 [ -0.99390 6.9700 ]
: -----------------------------------------------------------
: Ranking input variables (method unspecific)...
<HEADER> IdTransformation : Ranking result (top variable is best ranked)
: --------------------------
: Rank : Variable : Separation
: --------------------------
: 1 : x : 9.971e-01
: 2 : y : 9.967e-01
: --------------------------
<HEADER> Factory : Train method: Fisher for Classification
:
<HEADER> Fisher : Results for Fisher coefficients:
: -----------------------
: Variable: Coefficient:
: -----------------------
: x: -1.316
: y: -1.326
: (offset): +5.258
: -----------------------
: Elapsed time for training with 50000 events: 0.0171 sec
<HEADER> Fisher : [data] : Evaluation of Fisher on training sample (50000 events)
: Elapsed time for evaluation of 50000 events: 0.00868 sec
: Creating xml weight file: data/weights/TMVAanalysis_Fisher.weights.xml
: Creating standalone class: data/weights/TMVAanalysis_Fisher.class.C
<HEADER> Factory : Training finished
:
: Ranking input variables (method specific)...
<HEADER> Fisher : Ranking result (top variable is best ranked)
: ----------------------------
: Rank : Variable : Discr. power
: ----------------------------
: 1 : y : 7.878e-01
: 2 : x : 7.867e-01
: ----------------------------
<HEADER> Factory : === Destroy and recreate all methods via weight files for testing ===
:
: Reading weight file: data/weights/TMVAanalysis_Fisher.weights.xml
<HEADER> Factory : Test all methods
<HEADER> Factory : Test method: Fisher for Classification performance
:
<HEADER> Fisher : [data] : Evaluation of Fisher on testing sample (50000 events)
: Elapsed time for evaluation of 50000 events: 0.00865 sec
<HEADER> Factory : Evaluate all methods
<HEADER> Factory : Evaluate classifier: Fisher
:
<HEADER> Fisher : [data] : Loop over test events and fill histograms with classifier response...
:
<HEADER> TFHandler_Fisher : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: x: 1.9936 2.1256 [ -0.99962 6.9937 ]
: y: 1.9981 2.1285 [ -0.99477 6.9724 ]
: -----------------------------------------------------------
:
: Evaluation results ranked by best signal efficiency and purity (area)
: -------------------------------------------------------------------------------------------------------------------
: DataSet MVA
: Name: Method: ROC-integ
: data Fisher : 1.000
: -------------------------------------------------------------------------------------------------------------------
:
: Testing efficiency compared to training efficiency (overtraining check)
: -------------------------------------------------------------------------------------------------------------------
: DataSet MVA Signal efficiency: from test sample (from training sample)
: Name: Method: @B=0.01 @B=0.10 @B=0.30
: -------------------------------------------------------------------------------------------------------------------
: data Fisher : 1.000 (1.000) 1.000 (1.000) 1.000 (1.000)
: -------------------------------------------------------------------------------------------------------------------
:
<HEADER> Dataset:data : Created tree 'TestTree' with 50000 events
:
<HEADER> Dataset:data : Created tree 'TrainTree' with 50000 events
:
<HEADER> Factory : Thank you for using TMVA!
: For citation information, please visit: http://tmva.sf.net/citeTMVA.html
Again, I think there are no mistakes.
From this output I get the Fisher coefficients, and from them the plane that defines the axis which maximizes the separation: z = 5.258 - 1.316x - 1.326y.
To obtain the axis in two dimensions I took the intersection with the plane z = 0, which gives y = 3.97 - 0.99x. Is this right?
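To make the arithmetic above easy to check, here is a minimal sketch in plain C++ (no ROOT needed). The names Fisher, Line, zeroContour and gradientSlope are made up for this illustration and are not part of TMVA; only the three coefficients come from the training output. It computes the z = 0 line and, for comparison, the slope of the coefficient vector (wx, wy). As far as I understand, the z = 0 line is a contour of constant discriminant value, so it is perpendicular to the coefficient vector, and it may be the coefficient vector rather than the z = 0 line that gives the direction events are projected onto.

```cpp
#include <cmath>

// Fisher coefficients as printed in the TMVA training output.
// (Hypothetical helper type for this sketch, not a TMVA class.)
struct Fisher {
    double wx;      // coefficient of x
    double wy;      // coefficient of y
    double offset;  // constant term

    // Discriminant value z for one event (x, y).
    double response(double x, double y) const {
        return offset + wx * x + wy * y;
    }
};

// A straight line written as y = intercept + slope * x.
struct Line {
    double intercept;
    double slope;
};

// Line where the discriminant vanishes: offset + wx*x + wy*y = 0.
// Assumes wy != 0.
Line zeroContour(const Fisher& f) {
    return Line{-f.offset / f.wy, -f.wx / f.wy};
}

// Slope of the direction of the coefficient vector (wx, wy).
// Assumes wx != 0. This direction is perpendicular to the z = 0 line.
double gradientSlope(const Fisher& f) {
    return f.wy / f.wx;
}

// Coefficients from the log above.
const Fisher kFisher{-1.316, -1.326, 5.258};
```

With these coefficients, zeroContour(kFisher) gives an intercept of about 3.97 and a slope of about -0.99, matching the line I quoted, while gradientSlope(kFisher) is about 1.01. The product of the two slopes is -1, so the two directions are perpendicular.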
I attach the plot here:
es_7.pdf (636.8 KB)
From this graph, I think it is quite clear that the axis is not the right one.
Where does my reasoning fail? Thanks in advance!