Dataloader configuration

Dear experts,

  • when I configure the dataloader with (1), the DNN_GPU response looks OK; you can see the log (2) and the ROOT file (3).
  • when I configure it with (4), the DNN_GPU response is strange: the bins are too big. You can see the log here (5) and the ROOT file here (6).
  • Do you know why, when using the random (half/half) tree splitting for training and testing, the DNN response differs that much? Moreover, I would expect it to be better in the half/half case, since it uses all the events in the tree.

(1)
dataloader->PrepareTrainingAndTestTree( mycut,
    "nTrain_Signal=10000:nTrain_Background=10000:nTest_Signal=10000:nTest_Background=10000:SplitMode=Random:NormMode=NumEvents:!V" );

(2)
http://calpas.web.cern.ch/calpas/tmva/evt/log

(3)
http://calpas.web.cern.ch/calpas/tmva/evt/TMVA.root

(4)
dataloader->PrepareTrainingAndTestTree( mycut, "SplitMode=Random:NormMode=NumEvents:!V" );

(5)
http://calpas.web.cern.ch/calpas/tmva/half/log

(6)
http://calpas.web.cern.ch/calpas/tmva/half/TMVA.root
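
About (4): with no sizes given, TMVA uses all the events; as far as I understand, a requested size of 0 means "use all remaining events", so (4) should be equivalent to the explicit form below (a sketch of my understanding, not a tested configuration):

// Explicit form of the half/half split: a requested size of 0 tells
// TMVA to use all remaining events for that set.
dataloader->PrepareTrainingAndTestTree( mycut,
    "nTrain_Signal=0:nTrain_Background=0:"
    "nTest_Signal=0:nTest_Background=0:"
    "SplitMode=Random:NormMode=NumEvents:!V" );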

Hi,

Looking at the log files, it seems to me that the DNN is not working in either case: I did not see any decrease in the test error in the logs.
I would try, still using all the events, adding some regularisation and/or dropout, a different batch size (how big is your batch size?), a larger learning rate at the beginning, and more convergence steps; see the sketch below.
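
For example, something along these lines, adapted from the TMVA classification tutorial (the numbers are only a starting point to experiment with, not a tuned configuration):

// Network layout: three hidden layers, linear output.
TString layoutString("Layout=TANH|128,TANH|128,TANH|128,LINEAR");

// Two training phases: first a large learning rate with dropout,
// then a smaller rate without dropout.
TString training0("LearningRate=1e-1,Momentum=0.9,Repetitions=1,"
                  "ConvergenceSteps=20,BatchSize=256,TestRepetitions=10,"
                  "WeightDecay=1e-4,Regularization=L2,"
                  "DropConfig=0.0+0.5+0.5+0.5,Multithreading=True");
TString training1("LearningRate=1e-2,Momentum=0.9,Repetitions=1,"
                  "ConvergenceSteps=20,BatchSize=256,TestRepetitions=10,"
                  "WeightDecay=1e-4,Regularization=L2,"
                  "DropConfig=0.0+0.0+0.0+0.0,Multithreading=True");
TString trainingStrategyString("TrainingStrategy=");
trainingStrategyString += training0 + "|" + training1;

TString dnnOptions("!H:!V:ErrorStrategy=CROSSENTROPY:VarTransform=N:"
                   "WeightInitialization=XAVIERUNIFORM:");
dnnOptions.Append(layoutString);
dnnOptions.Append(":");
dnnOptions.Append(trainingStrategyString);

factory->BookMethod(dataloader, TMVA::Types::kDNN, "DNN_GPU",
                    dnnOptions + ":Architecture=GPU");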

If you are still having problems, please share your input data and macro and I can have a look into it.

Best Regards

Lorenzo

Dear Lorenzo,
I fixed the issue. Comparing with BDTG, the BDTG seems to be better by 5% on the ROC curve. Shouldn't the DNN be better? Are there cases where the DNN is better? Is there a good way to train the DNN? When I run over the ROOT test sample, the DNN looks better, but now I'm surprised to see that the BDT is better…
Regards

Dear Lorenzo,
it's not clear to me how I can check the DNN improvement. Should I check that the errors decrease in (1)?
Regards

(1)
Training phase 1 of 1:
:      Epoch |   Train Err.    Test Err.    GFLOP/s   Conv. Steps
: --------------------------------------------------------------
:         10 |     0.502419     0.501768    44.9089             0
:         20 |     0.505239      0.49264    45.1409             0
:         30 |     0.507477     0.551575    45.1674            10
:         40 |     0.508556     0.558475    45.1616            20
:

Hi,

Yes, both the training and the test error should decrease. At some point the test error will stop decreasing, because you are starting to overfit; there the minimiser should stop.
(see attached figure)

In your case something is not working, because there is no decrease at all in either the training or the test/validation error.

You can also see this in the obtained ROC curve: you get something around 0.5, which is just the value obtained from random guessing.

Lorenzo

Dear Lorenzo,
I've changed the parameters… but the DNN still outputs nonsensical results. Could you please give it a try?
You can see the code here (1) and the inputs here (2). The problem seems to appear when I apply a cut (TCut mycut = "…") to select events in the signal region; a sketch of how I apply it is below the links. Please let me know if you need more information.
Regards

(1)
http://calpas.web.cern.ch/calpas/TMVAClassification.C

(2)
http://calpas.web.cern.ch/calpas/sig.root
http://calpas.web.cern.ch/calpas/bkg.root
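
For reference, a minimal sketch of how the cut is applied (the expression below is a placeholder, not my actual selection; the real one is in the macro above):

// Placeholder signal-region cut; the variable names are just examples
// from the input trees, the real expression is in the macro.
TCut mycut = "Mll01 > 20 && lep_Pt_0 > 25";
dataloader->PrepareTrainingAndTestTree( mycut,
    "SplitMode=Random:NormMode=NumEvents:!V" );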

Thank you for the files; I will look at them.
I have downloaded them, but since these are big files, please avoid uploading them next time and just send a link.

Lorenzo

Dear Lorenzo,
did you get a chance to see what was wrong?
Regards

Hi,

I have not found anything really wrong with the DNN. I also tried using Keras and I get similar results. Looking at your data, I could not find by eye any real difference between the two categories, so one probably needs to identify better features.

Lorenzo

Dear Moneta,

  • thank you for your investigation. Do you mean that the signal and background are too similar, so the DNN can't separate them?
    Regards

Yes, this is my impression. Trying other methods as well (e.g. BDT), I get a very poor separation.
I would also check the input data carefully and study the input distributions; I did not have time to look at that in detail. A minimal sketch for a quick check is below.
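
Something along these lines, one variable at a time (the tree name and binning here are assumptions; adjust them to your files):

// Overlay the signal and background distributions of one input variable.
// "tree" is an assumed TTree name; replace it with the actual one.
TFile *fSig = TFile::Open("sig.root");
TFile *fBkg = TFile::Open("bkg.root");
TTree *tSig = (TTree*)fSig->Get("tree");
TTree *tBkg = (TTree*)fBkg->Get("tree");
tSig->SetLineColor(kBlue);
tBkg->SetLineColor(kRed);
tSig->Draw("Mll01 >> hSig(100, 0, 300)");              // signal
tBkg->Draw("Mll01 >> hBkg(100, 0, 300)", "", "SAME");  // background overlaid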

Cheers

Lorenzo

Dear Moneta,

  • ok, thank you.
  • I noticed that the DNN does not report the variable importance numbers; indeed, the importance is always 1.
    Do you know why?
    Regards
    : -------------------------------------------
    : Rank : Variable : Importance
    : -------------------------------------------
    :    1 : lep_Pt_0 : 1.000e+00
    :    2 : lep_Pt_1 : 1.000e+00
    :    3 : Mll01    : 1.000e+00
    :    4 : DEtall01 : 1.000e+00
    :    5 : DRll01   : 1.000e+00
    :    6 : Ptll01   : 1.000e+00
    :    7 : SumPtJet : 1.000e+00
The DNN does not provide this ranking, as is done for example in other methods such as the BDT. It is therefore expected that all the values are 1.

Lorenzo