Dear Expert:
Training the TMVA DNN takes too long. Does TMVA support multi-core or cluster mode?
Cheers, Gang
Hi,
There is a multi-threaded implementation of the DNN, enabled with the configuration option "Architecture=CPU". If you have a GPU available you can also run the DNN on it, which should be the fastest option (use "Architecture=GPU").
Running the DNN on a cluster is unfortunately not possible currently.
Cheers,
Kim
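For readers who want to see where the option goes: below is a minimal sketch of how the full option string might be assembled before booking the method. The helper name buildDnnOptions is hypothetical, and the leading options (ErrorStrategy, VarTransform, WeightInitialization) are assumed from the standard TMVAClassification tutorial, so adjust them to your own setup.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of TMVA): assembles the colon-separated
// option string for the DNN method, ending with the architecture choice.
std::string buildDnnOptions(const std::string& layout,
                            const std::vector<std::string>& phases,
                            const std::string& architecture) {
    assert(!phases.empty());  // at least one training phase is expected

    // Training phases are separated by '|' inside TrainingStrategy.
    std::string strategy = "TrainingStrategy=";
    for (std::size_t i = 0; i < phases.size(); ++i) {
        if (i > 0) strategy += "|";
        strategy += phases[i];
    }

    // Leading options assumed from the TMVAClassification tutorial.
    std::string options =
        "!H:V:ErrorStrategy=CROSSENTROPY:VarTransform=N:"
        "WeightInitialization=XAVIERUNIFORM";
    options += ":" + layout + ":" + strategy;
    options += ":Architecture=" + architecture;  // "CPU" or "GPU"
    return options;
}

// Inside a ROOT macro the result would then be passed on roughly as:
//   factory->BookMethod(dataloader, TMVA::Types::kDNN, "DNN_CPU",
//                       options.c_str());
```

The point is only that "Architecture=CPU" is one more colon-separated entry in the method's option string, alongside Layout and TrainingStrategy.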
Hi, Kim:
I am using DNN_CPU, which has "Multithreading=True" as the default for each layer, but when I run the training, top shows only one core in use, although %CPU can reach 130%. Does TMVA run its threads on a single core?
Cheers, Gang
Using multithreading should utilise all available cores for the computationally expensive operations.
What is your configuration? If your network or input data is too small, the serial part of the calculation will dominate. You can try increasing the batch size to see whether that improves things.
Cheers,
Kim
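The remark about the serial part dominating can be made concrete with Amdahl's law. The sketch below is a generic illustration and not something TMVA provides; the function name amdahlSpeedup is made up for this example, and the parallel fraction of a given DNN training is an assumption you would have to measure.

```cpp
#include <cassert>

// Amdahl's law: upper bound on the speedup with nThreads when only
// parallelFraction (between 0 and 1) of the single-threaded runtime
// can be spread across threads.
double amdahlSpeedup(double parallelFraction, int nThreads) {
    assert(nThreads >= 1);
    assert(parallelFraction >= 0.0 && parallelFraction <= 1.0);
    return 1.0 / ((1.0 - parallelFraction) + parallelFraction / nThreads);
}
```

Under the idealised assumption that threads do no extra work, the average %CPU reported by top, divided by 100, is roughly this speedup. If only half of the runtime is parallelisable, for instance, 8 threads give a speedup below 2, however many cores are free.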
It still uses only about 200% CPU (the equivalent of two cores), although the machine has 8 cores. Here is the configuration:
Use["DNN_CPU"] = 1; // Multi-core accelerated DNN.
dataloader->AddVariable( "var1 := L_RRC_ConnReq_Att - L_RRC_ConnReq_Succ", 'F' );
dataloader->AddVariable( "var2 := L_RRC_ConnReq_Att + L_RRC_ConnReq_Succ", 'F' );
dataloader->AddVariable( "var3 := L_RRC_ConnReq_Att * L_RRC_ConnReq_Succ", 'F' );
dataloader->AddVariable( "var4 := L_RRC_ConnReq_Att / L_RRC_ConnReq_Succ", 'F' );
dataloader->PrepareTrainingAndTestTree( mycuts, mycutb,
    "nTrain_Signal=10000:nTrain_Background=10000:SplitMode=Random:NormMode=NumEvents:!V" );
if (Use["DNN_CPU"] or Use["DNN_GPU"]) {
    // General layout.
    TString layoutString ("Layout=TANH|128,TANH|128,TANH|128,LINEAR");
    //TString layoutString ("Layout=TANH|128,TANH|128,TANH|128,LINEAR");
    // Training strategies.
    TString training0("LearningRate=1e-1,Momentum=0.9,Repetitions=1,"
                      "ConvergenceSteps=20,BatchSize=2560,TestRepetitions=10,"
                      "WeightDecay=1e-4,Regularization=L2,"
                      "DropConfig=0.0+0.5+0.5+0.5, Multithreading=True");
    TString training1("LearningRate=1e-2,Momentum=0.9,Repetitions=1,"
                      "ConvergenceSteps=20,BatchSize=2560,TestRepetitions=10,"
                      "WeightDecay=1e-4,Regularization=L2,"
                      "DropConfig=0.0+0.0+0.0+0.0, Multithreading=True");
    TString training2("LearningRate=1e-3,Momentum=0.0,Repetitions=1,"
                      "ConvergenceSteps=20,BatchSize=2560,TestRepetitions=10,"
                      "WeightDecay=1e-4,Regularization=L2,"
                      "DropConfig=0.0+0.0+0.0+0.0, Multithreading=True");
    TString trainingStrategyString ("TrainingStrategy=");
    trainingStrategyString += training0 + "|" + training1 + "|" + training2;
Any idea?
Cheers, Gang
Sorry I haven’t been able to get back to you yet. Will do asap.
Cheers,
Kim