TMultiLayerPerceptron and other NN classes need improvement

The utility of NNs in complex data analyses is (almost) unquestionable.
It’s also (almost) unquestionable that ROOT is a good choice for a
data analysis framework :wink: . Given these two facts, it’s easy to
see what drove the people who wrote the ROOT-to-JETNET and a couple of
ROOT-to-SNNS interfaces (JETNET and SNNS being two rather solid NN
packages). However, it is rather odd to keep your data in ROOT
TTree(s) and then interface it to external NN simulators, and it’s an
even bigger inconvenience to have to learn the language those NN
simulators speak. So, the implementation of some NN functionality in
ROOT is a very welcome step. But why not do it in a professional
manner? I mean, why not design it in such a way that it addresses the
typical problems that arise when one uses NNs? Below are a few
examples/concerns:

  1. The typical situation is that the learning samples for SIG and BKG
    are obtained from different sources (often one comes from some sort of
    Monte Carlo simulation, while the other comes from a different Monte
    Carlo simulation or from real data). For that reason they are often
    kept in different files, or in differently named trees in the same
    file. Why does TMultiLayerPerceptron provide no way to say: look,
    here’s my TTree for SIG (target=1), and here’s my TTree for BKG
    (target=0)? Why is the user burdened with merging the two TTrees
    (which, rather ironically, takes half of the mlpHiggs.C example)?

  2. Why does TMLPAnalyzer->DrawNetwork() draw what it draws? If
    anything, I would expect it to draw the same thing that
    TMultiLayerPerceptron->Draw() does. And why (if it draws anything) does
    the user have no control over the style (or have to jump through a
    dozen hoops to get that control)? What is the point of DrawNetwork()
    for neuron!=0? If there is no point, why have Int_t neuron as an
    argument? If there is a point, where is that documented, and how does
    one make sense of what (s)he sees?

  3. For a simple classification problem (sig/bkg ~ 1/0), a well-trained
    NN should have the property that SIG/(SIG+BKG) is a linear function of
    NN_output. There should be a method in TMLPAnalyzer that checks this.

  4. Why is it that if one executes the example (mlpHiggs.C) a few times
    in a row (whether starting a new ROOT session each time or not), the
    result is different every time? If that’s the desired behavior, why is
    it not announced? How does one switch it off? I thought that an
    example should answer questions, not raise additional ones.

  5. TMLPAnalyzer->DrawDInputs(): is this the best way to illustrate
    which variable/input is relevant and which one is not? I thought that
    correlation to the target is a good quantitative measure of how
    relevant a particular variable/input is. How is one to interpret the
    picture this method draws? Where is that explained/documented? Why
    are the axes not labeled?

  6. I heard :^o that if one is to be completely unbiased, then one
    needs three samples: the standard two (learning and testing, i.e. the
    "stop training" sample) and another "testing sample" on which to verify
    the performance of the NN, e.g., do the test I ask for in 3), do the
    TMLPAnalyzer->DrawNetwork(0, "target==0", "target==1") thing, etc.

  7. I think the TMultiLayerPerceptron->Train() method needs
    better-formatted output. It would also be nice if one could tell it to
    save the best NN to a file, so that if after 20 hours of training the
    user wants to interrupt (or the machine goes down, or whatever), the
    user still has something to work with.

  8. What happens if the user asks to train for 500 epochs and the
    network starts getting over-trained after epoch 300? Will the user get
    an over-trained network, or will the "best" one, from epoch 300, be
    saved? Where is this documented/explained?

I sure have more questions/suggestions/items_to_discuss…
Konstantin.

[quote=“cyberkost”]1) The typical situation is that the learning samples for SIG and BKG
are obtained from different sources (often one comes from some sort of
Monte Carlo simulation, while the other comes from a different Monte
Carlo simulation or from real data). For that reason they are often kept
in different files, or in differently named trees in the same file. Why
does TMultiLayerPerceptron provide no way to say: look, here’s my TTree
for SIG (target=1), and here’s my TTree for BKG (target=0)? Why is the
user burdened with merging the two TTrees (which, rather ironically,
takes half of the mlpHiggs.C example)?
[/quote]
In principle, using a TChain should be possible. At the time the example was written, that was not yet the case. I haven’t checked recently whether it works now; if not, everybody is welcome to fix the problem.
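
For what it’s worth, here is a minimal sketch of what the TChain route could look like (untested; the file names, the tree name "events", and the assumption that both trees carry identical branches plus a common "type" branch are all hypothetical):

```cpp
// Both files are assumed to contain a tree with the SAME name ("events")
// and identical branches, including "type" (1 = signal, 0 = background).
TChain chain("events");
chain.Add("signal.root");        // hypothetical file names
chain.Add("background.root");
// TChain inherits from TTree, so it can go where a TTree* is expected:
TMultiLayerPerceptron mlp("@msumf,@ptsumf,@acolin:5:type", &chain,
                          "Entry$%2==0",   // training events
                          "Entry$%2==1");  // test events
```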

[quote=“cyberkost”]2) Why does TMLPAnalyzer->DrawNetwork() draw what it draws? If
anything, I would expect it to draw the same thing that
TMultiLayerPerceptron->Draw() does. And why (if it draws anything) does
the user have no control over the style (or have to jump through a
dozen hoops to get that control)? What is the point of DrawNetwork()
for neuron!=0? If there is no point, why have Int_t neuron as an
argument? If there is a point, where is that documented, and how does
one make sense of what (s)he sees?[/quote]
If TMLPAnalyzer drew the same thing as TMultiLayerPerceptron::Draw(), it would be completely useless. If you have more than one output neuron, you will be happy to be able to draw the output of each neuron separately.
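
As an illustration, a short sketch of the per-neuron use, assuming a trained network mlp built on a tree with a "type" branch as in mlpHiggs.C:

```cpp
// Draw the distribution of output neuron 0 for signal vs background events.
TMLPAnalyzer ana(mlp);
ana.GatherInformations();                  // must be called before drawing
ana.DrawNetwork(0, "type==1", "type==0");  // neuron index, signal cut, bkg cut
// With several output neurons, DrawNetwork(1, ...), DrawNetwork(2, ...)
// show each of the other outputs separately.
```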

[quote=“cyberkost”]3) For a simple classification problem (sig/bkg ~ 1/0), a well-trained
NN should have the property that SIG/(SIG+BKG) is a linear function of
NN_output. There should be a method in TMLPAnalyzer that checks this.[/quote]
Feel free… :slight_smile:
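
For concreteness, a rough, untested sketch of such a check done by hand. It assumes the merged tree simu from mlpHiggs.C (branches msumf, ptsumf, acolin, type) and a trained network mlp whose inputs are msumf, ptsumf, acolin in that order; adjust to your own layout:

```cpp
// Fill the NN output for signal and background separately, then plot
// purity = S/(S+B) per output bin; it should be roughly linear.
void purityCheck(TTree *simu, TMultiLayerPerceptron *mlp) {
   Float_t msumf, ptsumf, acolin; Int_t type;
   simu->SetBranchAddress("msumf",  &msumf);
   simu->SetBranchAddress("ptsumf", &ptsumf);
   simu->SetBranchAddress("acolin", &acolin);
   simu->SetBranchAddress("type",   &type);
   TH1F *hs = new TH1F("hs", "sig", 20, 0, 1);
   TH1F *hb = new TH1F("hb", "bkg", 20, 0, 1);
   Double_t params[3];
   for (Long64_t i = 0; i < simu->GetEntries(); ++i) {
      simu->GetEntry(i);
      params[0] = msumf; params[1] = ptsumf; params[2] = acolin;
      ((type == 1) ? hs : hb)->Fill(mlp->Evaluate(0, params));
   }
   TH1F *hp   = (TH1F*)hs->Clone("hp");    // numerator: S
   TH1F *hsum = (TH1F*)hs->Clone("hsum");  // denominator: S+B
   hsum->Add(hb);
   hp->Divide(hsum);
   hp->SetTitle("purity S/(S+B) vs NN output");
   hp->Draw("E");
}
```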

[quote=“cyberkost”]4) Why is it that if one executes the example (mlpHiggs.C) a few times
in a row (whether starting a new ROOT session each time or not), the
result is different every time? If that’s the desired behavior, why is
it not announced? How does one switch it off? I thought that an
example should answer questions, not raise additional ones.[/quote]
As explained in the documentation, the training starts by randomizing the weights. This can be avoided by:

  1. calling the Randomize method by hand, or loading custom weights to start from, or
  2. training the NN with the "+" option, so that the predefined weights are used as the starting point.
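
In code, both routes might look like this (a sketch; DumpWeights/LoadWeights write and read a plain text file of weights):

```cpp
// After a satisfactory training, save the weights:
mlp->DumpWeights("weights.txt");
// ... in a later session, on an identically constructed network:
mlp->LoadWeights("weights.txt");     // fixed, known starting point
mlp->Train(100, "text update=10 +"); // "+" keeps the current weights,
                                     // no randomization at the start
```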

[quote=“cyberkost”]5) TMLPAnalyzer->DrawDInputs(): is this the best way to illustrate
which variable/input is relevant and which one is not? I thought that
correlation to the target is a good quantitative measure of how
relevant a particular variable/input is. How is one to interpret the
picture this method draws? Where is that explained/documented? Why
are the axes not labeled?[/quote]
TMLPAnalyzer is a collection of tools developed by users to analyze their networks. It does nothing that cannot be done by hand on the NN. It is very much code “in development”… that’s why there is not much eye candy. Again, if you have a more advanced solution to propose, feel free to write the code. I’m pretty sure that Rene will include it if it is done in a professional way.
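
For the correlation-to-target measure suggested in 5), something along these lines can already be done by hand with plain TTree machinery (a sketch, using the merged tree simu and the msumf branch from mlpHiggs.C):

```cpp
// Compute the linear correlation between one input and the target.
simu->Draw("type:msumf", "", "goff");  // fills V1=type, V2=msumf, no graphics
TGraph g(simu->GetSelectedRows(), simu->GetV2(), simu->GetV1());
printf("corr(msumf, target) = %g\n", g.GetCorrelationFactor());
```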

[quote=“cyberkost”]6) I heard :^o that if one is to be completely unbiased, then one
needs three samples: the standard two (learning and testing, i.e. the
"stop training" sample) and another "testing sample" on which to verify
the performance of the NN, e.g., do the test I ask for in 3), do the
TMLPAnalyzer->DrawNetwork(0, "target==0", "target==1") thing, etc.[/quote]
This is right, but the third sample only enters the game once the NN is ready, and should therefore not be handled in any special way by the TMLP tools.
If all your data sit in the same TTree, you are free to keep one third out of both the training and the test samples.
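
A sketch of such a three-way split using the entry number (the layout string is illustrative, borrowed from the mlpHiggs.C variables):

```cpp
// Train on one third, use a second third to stop the training;
// the last third is never shown to the TMLP machinery.
TMultiLayerPerceptron mlp("@msumf,@ptsumf,@acolin:5:type", simu,
                          "Entry$%3==0",   // learning sample
                          "Entry$%3==1");  // test ("stop training") sample
// Entries with Entry$%3==2 stay untouched for the final, unbiased check.
```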

[quote=“cyberkost”]7) I think the TMultiLayerPerceptron->Train() method needs
better-formatted output. It would also be nice if one could tell it to
save the best NN to a file, so that if after 20 hours of training the
user wants to interrupt (or the machine goes down, or whatever), the
user still has something to work with.[/quote]
It’s up to you to split the NN training into several sub-trainings, and then to save the NN after each of them. It would then be easy to select the best one.
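
A sketch of that approach (Form() builds the file names; the "+" option continues from the current weights):

```cpp
// Split 500 epochs into 10 blocks and snapshot the weights after each,
// so an interruption or a crash never costs the whole training.
for (Int_t block = 0; block < 10; ++block) {
   mlp->Train(50, block == 0 ? "text update=10" : "text update=10 +");
   mlp->DumpWeights(Form("weights_block%d.txt", block));
}
```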

[quote=“cyberkost”]8) What happens if the user asks to train for 500 epochs and the
network starts getting over-trained after epoch 300? Will the user get
an over-trained network, or will the "best" one, from epoch 300, be
saved? Where is this documented/explained?[/quote]
Only the final network is retained. The user has to decide whether or not the network is overtrained. All the documentation is concentrated in a chapter of the user manual; that particular point is not mentioned there, since this feature is not implemented.
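
Until such a feature exists, the check can be scripted by hand; a sketch, building on the block-training idea above:

```cpp
// Keep the weights giving the lowest test error seen so far, then
// reload them at the end: a manual "best epoch" selection.
Double_t best = 1e30;
for (Int_t block = 0; block < 10; ++block) {
   mlp->Train(50, block == 0 ? "text" : "text +");
   Double_t err = mlp->GetError(TMultiLayerPerceptron::kTest);
   if (err < best) { best = err; mlp->DumpWeights("best_so_far.txt"); }
}
mlp->LoadWeights("best_so_far.txt");
```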

[quote=“cyberkost”]I sure have more questions/suggestions/items_to_discuss…
Konstantin.[/quote]
As I already said, feel free to propose code to Rene. I personally don’t have much time to develop new things in this context on a reasonable time scale. I wrote most of it on the beach when I was a diploma student :blush:, so it can certainly be improved in several ways (including the documentation).