Ignoring a variable for some events

Vidya_Sagar_V · June 25, 2017, 10:52am

I am training a neural network in TMVA with 9 variables.
One of those variables is faulty in some events.
I have 320k events in total, and 70k of them are affected.
Is there a way to train a single neural network that uses all 9 variables for the 250k good events and only 8 of them for the remaining 70k? Something like giving each variable an individual, event-by-event weight.
If such an option is not available in TMVA but exists somewhere else, please point me to it. It would be helpful.

eguiraud · June 26, 2017, 7:30am

Hi,
Classic DNN training algorithms require a fixed number of features.

You could set the faulty variable to a meaningful constant value (e.g. -1, ideally a value the variable never takes in good events) for the 70k problematic events. This would not mean “do not consider the variable”, but it does add the information to the dataset that those 70k events have something in common regarding that particular variable. The neural net might or might not pick it up, depending on how discriminative that difference is.
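To make the idea concrete, here is a minimal sketch in plain Python of the preprocessing step, done before the events are handed to any training tool. The names `fill_faulty`, `SENTINEL`, and `is_faulty` are made up for illustration, and using NaN to mark the faulty entries in the toy data is only an assumption; in practice you would flag the 70k events with whatever criterion identifies them.

```python
import math

# Assumed sentinel: pick a value the variable never takes in good events.
SENTINEL = -1.0


def fill_faulty(events, faulty_index, is_faulty, sentinel=SENTINEL):
    """Return a copy of `events` in which the variable at `faulty_index`
    is overwritten with `sentinel` for every event flagged by `is_faulty`.
    The input list is left unmodified."""
    cleaned = []
    for event in events:
        row = list(event)  # copy so the original event is untouched
        if is_faulty(row):
            row[faulty_index] = sentinel
        cleaned.append(row)
    return cleaned


# Toy data: 3 variables per event; variable 2 is unreliable in the second event.
events = [
    [0.5, 1.2, 3.4],
    [0.1, 0.9, float("nan")],
    [0.7, 1.1, 2.8],
]

cleaned = fill_faulty(events, 2, lambda row: math.isnan(row[2]))
```

Every event still has the same number of features, so the network input size does not change; only the flagged events carry the constant in place of the unreliable value.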

A less heuristic, more grounded approach would be to devise a training strategy that can ignore faulty variables on an event-by-event basis, but no such training scheme is implemented in TMVA. My knowledge here is limited, but I don’t know of a mainstream training algorithm with this feature. Someone has certainly tried something in this direction, but I am not aware of a common practice (I would be happy to hear what ML experts have to say on the topic).