Tmva crashes at preparing gaussian transformation

Danny_Lee · July 25, 2017, 6:46am

I’m stuck on figuring what is the exact problem I’m facing as I try to test a simple classification script.

I could run the tmva examples in ROOT. however for some reason I am not able to reproduce the same success on the problem I am working on. Another thing to note is that I am running this in python since I hope to use some of the ML libraries which is mostly in python.

When ever I run my script, my desktop crashes at preparing the Gaussian transformation with the desktop becoming unresponsive that I have to force restart it.

from ROOT import TMVA, TFile, TTree, TCut, gROOT
from os.path import isfile
import numpy as np

from os import environ
environ['KERAS_BACKEND'] = 'tensorflow'

from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.regularizers import l2
from keras import initializers
from keras.optimizers import SGD

# Setup TMVA
TMVA.Tools.Instance()
TMVA.PyMethodBase.PyInitialize()

fout = TFile.Open('output.root','RECREATE')
factory = TMVA.Factory('TMVAClassification', fout,
	'!V:!Silent:Color:DrawProgressBar:Transformations=D,G:AnalysisType=Classification')

# Load training data

trainfile = TFile.Open('signal.root')
trainfile2 = TFile.Open('background.root')
signal = trainfile.Get('tree')
background = trainfile2.Get('tree')

dataloader = TMVA.DataLoader('weights')
for branch in signal.GetListOfBranches():
	if branch.GetName() not in prok:
		print(branch.GetName())
		dataloader.AddVariable(branch.GetName())

dataloader.AddTree(signal,'signal')
dataloader.AddTree(background,'background')

# Generate Model\ Neural Network
num_input=5
num_output=2
num_node_hid_layer=3
num_hid_layer=1
l2_val=1e-2

# Input layer
model = Sequential()
model.add(Dense(num_node_hid_layer, kernel_initializer='random_normal', kernel_regularizer=l2(l2_val), input_dim=num_input))
model.add(Activation('softmax'))

# Hidden layers
for k in range(num_hid_layer-1):
	model.add(Dense(num_output, kernel_initializer='random_normal', kernel_regularizer=l2(l2_val)))
	model.add(Activation('softmax'))

# Output layer
model.add(Dense(num_output, kernel_initializer='random_normal'))
model.add(Activation('softmax'))

# Compile model
model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.01,clipnorm=1.), metrics=['accuracy'])

# Save Model
model.save('model.h5')
model.summary()

# # Visualise model
from keras.utils import plot_model
plot_model(model, to_file='model.png')

# # Book Method 
# factory.BookMethod(dataloader, TMVA.Types.kDNN, 'DNN','!H:V:VarTransform=N:ErrorStrategy=CROSSENTROPY:WeightInitialization=XAVIERUNIFORM:Layout=TANH|100,TANH|80,TANH|50,TANH|20,LINEAR:TrainingStrategy=LearningRate=1e-1,Momentum=0.7,Repetitions=1,ConvergenceSteps=300,BatchSize=20,DropConfig=0.0+0.5+0.5+0.0,WeightDecay=0.001,Regularization=L2,TestRepetitions=15,Multithreading=True')
factory.BookMethod(dataloader, TMVA.Types.kPyKeras, "Keras_h5",
	'!H:!V:VarTransform=D,G:FilenameModel=model.h5:NumEpochs=150:BatchSize=10')

# # Run TMVA
factory.TrainAllMethods()
factory.TestAllMethods()
factory.EvaluateAllMethods()

Is my code was done wrongly? or something I have missed?
Any help are appreciated.

Danny

kialbert · August 2, 2017, 10:15am

Hi,

That should not happen

Does the crash persist if you do not use keras? What happens if you switch to c++ (perhaps saving the h5 file in a separate script)?

Cheers,
Kim

Danny_Lee · August 2, 2017, 10:38am

hi Kim,

I did some various test since I posted the question. At that time, I tried using just only either tmva’s dnn or keras only and both came out the same. After that I found some old tmva mailing list and realised that it is probably some branches I have that are nearly identical so I removed those but still the same.

As the reply from my other post, I found out later that tmva was unable to handle too many branches in my machine which leads me to just test it with just about 6 variables or so at the end which I managed to execute the training script and finishes it. Probably there is some limit to how many tmva::dataloader can handle depending on the computer specs?

Danny

kialbert · August 2, 2017, 12:28pm

Hi,

Ok, thanks for the info. Yeah, it could be that the dataset is too big to handle in memory. How much RAM does you machine have, and how big are the events that you are using? I understand that you are now using other tools, but it would be helpful for us in the long term to be able to reproduce this issue.

Danny_Lee · August 2, 2017, 1:09pm

4 GB, I’m running it on CPU for the tmva so it is an i3 chip.

the numbers of variables/branches are up to 60. For my case, when there is more than 12 variables, the machine just crashes/freeze. I have about 200k event corresponds to the number of variables.

kialbert · August 2, 2017, 4:58pm

Thanks!

It seems weird, it’s not too much data…