TMVA TrainAllMethods() Crash due to MethodBase::MethodBaseDir()

Hi,
I am using ROOT v6.08/04 on Fedora 23. I get the following crash in TMVA when calling TrainAllMethods().

_ MGR.TrainAllMethods()
Info in <MMLTrainer::TrainAllMethods>: Factory 1 : Output file /home/chinmay/MaceSIM-PROOF-TestData/OutFiles/TestML/TestML_cat0001.root:/

Factory                  : Train all methods
Error in <TFile::cd>: Unknown directory home
DataSetFactory           : [/home/chinmay/MaceSIM-PROOF-TestData/OutFiles/TestML/TestML_cat0001] : Number of events in input trees
                         : Dataset[/home/chinmay/MaceSIM-PROOF-TestData/OutFiles/TestML/TestML_cat0001] :     Regression requirement: "(( status_4CCNN == 1 ) )&&(HillasPar.Size > 100.0)"
                         : Dataset[/home/chinmay/MaceSIM-PROOF-TestData/OutFiles/TestML/TestML_cat0001] :     Regression      -- number of events passed: 5234   / sum of weights: 5234 
                         : Dataset[/home/chinmay/MaceSIM-PROOF-TestData/OutFiles/TestML/TestML_cat0001] :     Regression      -- efficiency             : 0.418955
                         : Dataset[/home/chinmay/MaceSIM-PROOF-TestData/OutFiles/TestML/TestML_cat0001] :  you have opted for interpreting the requested number of training/testing events
                         :  to be the number of events AFTER your preselection cuts
                         : 
                         : Number of training and testing events
                         : ---------------------------------------------------------------------------
                         : Regression -- training events            : 1000
                         : Regression -- testing events             : 4234
                         : Regression -- training and testing events: 5234
                         : Dataset[/home/chinmay/MaceSIM-PROOF-TestData/OutFiles/TestML/TestML_cat0001] : Regression -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.418955
                         : 
DataSetInfo              : Correlation matrix (Regression):
                         : ------------------------------------------------------------------------------------------------
                         :                        HillasPar.Length HillasPar.Width log10(HillasPar.Size) HillasPar.Leakage1
                         :      HillasPar.Length:           +1.000          +0.400                +0.307             +0.185
                         :       HillasPar.Width:           +0.400          +1.000                +0.299             +0.124
                         : log10(HillasPar.Size):           +0.307          +0.299                +1.000             +0.197
                         :    HillasPar.Leakage1:           +0.185          +0.124                +0.197             +1.000
                         : ------------------------------------------------------------------------------------------------
DataSetFactory           : [/home/chinmay/MaceSIM-PROOF-TestData/OutFiles/TestML/TestML_cat0001] :  
                         : 
Factory                  : [/home/chinmay/MaceSIM-PROOF-TestData/OutFiles/TestML/TestML_cat0001] : Create Transformation "I" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'HillasPar.Length' <---> Output : variable 'HillasPar.Length'
                         : Input : variable 'HillasPar.Width' <---> Output : variable 'HillasPar.Width'
                         : Input : variable 'LogSize' <---> Output : variable 'LogSize'
                         : Input : variable 'HillasPar.Leakage1' <---> Output : variable 'HillasPar.Leakage1'
TFHandler_Factory        :           Variable                  Mean                  RMS          [        Min                  Max ]
                         : -------------------------------------------------------------------------------------------------------------
                         :   HillasPar.Length:             0.34148            0.085230   [             0.12915             0.72778 ]
                         :    HillasPar.Width:             0.18010            0.050873   [            0.068749             0.42269 ]
                         :            LogSize:              2.5330             0.35034   [              2.0019              4.2891 ]
                         : HillasPar.Leakage1:            0.021876            0.035504   [              0.0000             0.27600 ]
                         :     HillasPar.Dist:             0.86931             0.34324   [            0.049042              1.9034 ]
                         : -------------------------------------------------------------------------------------------------------------
                         : Ranking input variables (method unspecific)...
Factory                  : Train method: MLP for Regression
                         : 

 *** Break *** segmentation violation



===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007f17e941549a in waitpid () from /lib64/libc.so.6
#1  0x00007f17e93909ab in do_system () from /lib64/libc.so.6
#2  0x00007f17ea490e01 in TUnixSystem::Exec (shellcmd=<optimized out>, this=0x1fc64f0) at /home/chinmay/ROOT/root-6.08.04/core/unix/src/TUnixSystem.cxx:2118
#3  TUnixSystem::StackTrace (this=0x1fc64f0) at /home/chinmay/ROOT/root-6.08.04/core/unix/src/TUnixSystem.cxx:2405
#4  0x00007f17ea49351c in TUnixSystem::DispatchSignals (this=0x1fc64f0, sig=kSigSegmentationViolation) at /home/chinmay/ROOT/root-6.08.04/core/unix/src/TUnixSystem.cxx:3663
#5  <signal handler called>
#6  0x00007f17d2c968af in TMVA::MethodBase::MethodBaseDir (this=this@entry=0x3e153b0) at /home/chinmay/ROOT/root-6.08.04/tmva/tmva/src/MethodBase.cxx:1981
#7  0x00007f17d2c97326 in TMVA::MethodBase::BaseDir (this=this@entry=0x3e153b0) at /home/chinmay/ROOT/root-6.08.04/tmva/tmva/src/MethodBase.cxx:1939
#8  0x00007f17d2ca2c59 in TMVA::MethodBase::TrainMethod (this=this@entry=0x3e153b0) at /home/chinmay/ROOT/root-6.08.04/tmva/tmva/src/MethodBase.cxx:653
#9  0x00007f17d2c501ac in TMVA::Factory::TrainAllMethods (this=this@entry=0x3401960) at /home/chinmay/ROOT/root-6.08.04/tmva/tmva/src/Factory.cxx:879
#10 0x00007f17d5d4d584 in MMLTrainer::TrainAllMethods (this=0x33c4910) at /home/chinmay/MaceSIM-PROOF-DEV/MaceSIM/src/SimManager/MSimTMVAInterface.cxx:749
#11 0x00007f17eaa00070 in ?? ()
#12 0x00000001eaa00000 in ?? ()
#13 0x00000000020785d0 in ?? ()
#14 0x00007f17e62f1220 in ?? () from /home/chinmay/ROOT/root-6.08.04/INSTALLATION/lib/libCling.so
#15 0x00007ffecb0fa680 in ?? ()
#16 0x00007f17eaa00000 in ?? ()
#17 0x00007f17e6291c46 in cling::Interpreter::RunFunction(clang::FunctionDecl const*, cling::Value*) () from /home/chinmay/ROOT/root-6.08.04/INSTALLATION/lib/libCling.so
#18 0x00007f17e62934bd in cling::Interpreter::EvaluateInternal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::CompilationOptions, cling::Value*, cling::Transaction**, unsigned long) () from /home/chinmay/ROOT/root-6.08.04/INSTALLATION/lib/libCling.so
#19 0x00007f17e6293738 in cling::Interpreter::process(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::Value*, cling::Transaction**) () from /home/chinmay/ROOT/root-6.08.04/INSTALLATION/lib/libCling.so
#20 0x00007f17e6316c77 in cling::MetaProcessor::process(char const*, cling::Interpreter::CompilationResult&, cling::Value*) () from /home/chinmay/ROOT/root-6.08.04/INSTALLATION/lib/libCling.so
#21 0x00007f17e6205a66 in HandleInterpreterException (metaProcessor=<optimized out>, input_line=<optimized out>, compRes=@0x7ffecb0fa55c: cling::Interpreter::kSuccess, result=result@entry=0x7ffecb0fa680) at /home/chinmay/ROOT/root-6.08.04/core/meta/src/TCling.cxx:1882
#22 0x00007f17e6215f0a in TCling::ProcessLine (this=0x201eaa0, line=<optimized out>, error=0x7ffecb0faa6c) at /home/chinmay/ROOT/root-6.08.04/core/meta/src/TCling.cxx:2048
#23 0x00007f17ea387f2e in TApplication::ProcessLine (this=this@entry=0x20107d0, line=<optimized out>, sync=sync@entry=false, err=err@entry=0x7ffecb0faa6c) at /home/chinmay/ROOT/root-6.08.04/core/base/src/TApplication.cxx:1005
#24 0x00007f17ea7ae74e in TRint::ProcessLineNr (this=this@entry=0x20107d0, filestem=filestem@entry=0x7f17ea7bcbe8 "ROOT_prompt_", line=0x3f06990 "MGR.TrainAllMethods()", error=0x7ffecb0faa6c, error@entry=0x0) at /home/chinmay/ROOT/root-6.08.04/core/rint/src/TRint.cxx:749
#25 0x00007f17ea7aeab5 in TRint::HandleTermInput (this=0x20107d0) at /home/chinmay/ROOT/root-6.08.04/core/rint/src/TRint.cxx:610
#26 0x00007f17ea492b2c in TUnixSystem::CheckDescriptors (this=this@entry=0x1fc64f0) at /home/chinmay/ROOT/root-6.08.04/core/unix/src/TUnixSystem.cxx:1321
#27 0x00007f17ea493dca in TUnixSystem::DispatchOneEvent (this=0x1fc64f0, pendingOnly=<optimized out>) at /home/chinmay/ROOT/root-6.08.04/core/unix/src/TUnixSystem.cxx:1076
#28 0x00007f17ea3e3ba4 in TSystem::InnerLoop (this=0x1fc64f0) at /home/chinmay/ROOT/root-6.08.04/core/base/src/TSystem.cxx:408
#29 0x00007f17ea3e27af in TSystem::Run (this=0x1fc64f0) at /home/chinmay/ROOT/root-6.08.04/core/base/src/TSystem.cxx:358
#30 0x00007f17ea3855bf in TApplication::Run (this=this@entry=0x20107d0, retrn=retrn@entry=false) at /home/chinmay/ROOT/root-6.08.04/core/base/src/TApplication.cxx:1153
#31 0x00007f17ea7afff7 in TRint::Run (this=this@entry=0x20107d0, retrn=retrn@entry=false) at /home/chinmay/ROOT/root-6.08.04/core/rint/src/TRint.cxx:463
#32 0x000000000040113c in main (argc=1, argv=0x7ffecb0fceb8) at /home/chinmay/ROOT/root-6.08.04/main/src/rmain.cxx:30
===========================================================


The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum http://root.cern.ch/forum.
Only if you are really convinced it is a bug in ROOT then please submit a
report at http://root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#6  0x00007f17d2c968af in TMVA::MethodBase::MethodBaseDir (this=this@entry=0x3e153b0) at /home/chinmay/ROOT/root-6.08.04/tmva/tmva/src/MethodBase.cxx:1981
#7  0x00007f17d2c97326 in TMVA::MethodBase::BaseDir (this=this@entry=0x3e153b0) at /home/chinmay/ROOT/root-6.08.04/tmva/tmva/src/MethodBase.cxx:1939
#8  0x00007f17d2ca2c59 in TMVA::MethodBase::TrainMethod (this=this@entry=0x3e153b0) at /home/chinmay/ROOT/root-6.08.04/tmva/tmva/src/MethodBase.cxx:653
#9  0x00007f17d2c501ac in TMVA::Factory::TrainAllMethods (this=this@entry=0x3401960) at /home/chinmay/ROOT/root-6.08.04/tmva/tmva/src/Factory.cxx:879
#10 0x00007f17d5d4d584 in MMLTrainer::TrainAllMethods (this=0x33c4910) at /home/chinmay/MaceSIM-PROOF-DEV/MaceSIM/src/SimManager/MSimTMVAInterface.cxx:749
#11 0x00007f17eaa00070 in ?? ()
#12 0x00000001eaa00000 in ?? ()
#13 0x00000000020785d0 in ?? ()
#14 0x00007f17e62f1220 in ?? () from /home/chinmay/ROOT/root-6.08.04/INSTALLATION/lib/libCling.so
#15 0x00007ffecb0fa680 in ?? ()
#16 0x00007f17eaa00000 in ?? ()
#17 0x00007f17e6291c46 in cling::Interpreter::RunFunction(clang::FunctionDecl const*, cling::Value*) () from /home/chinmay/ROOT/root-6.08.04/INSTALLATION/lib/libCling.so
#18 0x00007f17e62934bd in cling::Interpreter::EvaluateInternal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::CompilationOptions, cling::Value*, cling::Transaction**, unsigned long) () from /home/chinmay/ROOT/root-6.08.04/INSTALLATION/lib/libCling.so
#19 0x00007f17e6293738 in cling::Interpreter::process(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::Value*, cling::Transaction**) () from /home/chinmay/ROOT/root-6.08.04/INSTALLATION/lib/libCling.so
#20 0x00007f17e6316c77 in cling::MetaProcessor::process(char const*, cling::Interpreter::CompilationResult&, cling::Value*) () from /home/chinmay/ROOT/root-6.08.04/INSTALLATION/lib/libCling.so
#21 0x00007f17e6205a66 in HandleInterpreterException (metaProcessor=<optimized out>, input_line=<optimized out>, compRes=@0x7ffecb0fa55c: cling::Interpreter::kSuccess, result=result@entry=0x7ffecb0fa680) at /home/chinmay/ROOT/root-6.08.04/core/meta/src/TCling.cxx:1882
===========================================================


Root >  _

It seems the Factory is not able to create a proper method subdirectory inside the provided output ROOT file for some reason, or it is unable to change to the output base directory.
Please help.

MSimTMVAInterface.h (8.2 KB)
MSimTMVAInterface.cxx (22.2 KB)
tmva_reg_example.root (110.5 KB)
A short reproducer class for the above error is attached, together with an input .root file. Follow these steps to reproduce the error:

[chinmay@localhost rootforum]$ root -l
root [1] .L MSimTMVAInterface.cxx+ 
root [2] MMLTrainer mlt
(MMLTrainer &) Name:  Title: 
root [3] TFile *fIn = new TFile("tmva_reg_example.root","READ")
(TFile *) 0x3789a10
root [4] TTree *regTree = (TTree*)fIn->Get("TreeR")
(TTree *) 0x441b5e0
root [5] mlt.InitFactory("TestML","/home/chinmay/ROOT/root-6.08.04/INSTALLATION/tutorials/tmva/rootforum/MML.root","!V:!Silent:Color:DrawProgressBar:AnalysisType=Regression")
(Int_t) 0
root [6] mlt.AddRegressionTree(regTree,1.0)
DataSetInfo              : [/home/chinmay/ROOT/root-6.08.04/INSTALLATION/tutorials/tmva/rootforum/TestML_cat0001] : Added class "Regression"
                         : Add Tree TreeR of type Regression with 10000 events
root [7] mlt.AddTrainingVariable("var1","Variable 1","units",'F')
root [8] mlt.AddTrainingVariable("var2","Variable 2","units",'F')
root [9] mlt.AddTarget("fvalue")
root [10] mlt.PrepareTrainingAndTestTreeReg("","nTrain_Regression=1000:nTest_Regression=0:SplitMode=Random:NormMode=NumEvents:!V")
Info in <MMLTrainer::PrepareTrainingAndTestTreeReg>: dataloader 0 cut = 

                         : Dataset[/home/chinmay/ROOT/root-6.08.04/INSTALLATION/tutorials/tmva/rootforum/TestML_cat0001] : Class index : 0  name : Regression
root [11] mlt.BookMethod(TMVA::Types::kMLP, "MLP", "!H:!V:VarTransform=Norm:NeuronType=tanh:NCycles=20000:HiddenLayers=N+20:TestRate=6:TrainingMethod=BFGS:Sampling=0.3:SamplingEpoch=0.8:ConvergenceImprove=1e-6:ConvergenceTests=15:!UseRegulator")
Factory                  : Booking method: MLP
                         : 
MLP                      : [/home/chinmay/ROOT/root-6.08.04/INSTALLATION/tutorials/tmva/rootforum/TestML_cat0001] : Create Transformation "Norm" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'var1' <---> Output : variable 'var1'
                         : Input : variable 'var2' <---> Output : variable 'var2'
                         : Input : target 'fvalue' <---> Output : target 'fvalue'
MLP                      : Building Network. 
                         : Initializing weights
root [12] mlt.TrainAllMethods()

IMHO the directory handling in TMVA is a mess, especially since the change to DataLoader was made. My experience is that you will run into problems if you don't work in the current directory.

I recommend:

  1. Save the current working directory.
  2. chdir to the desired output location.
  3. If you are dealing with relative file names, canonicalize them before chdir'ing.
  4. Then just use simple names (containing only alphanumeric characters) for your DataSet, i.e. don't use a filename.
  5. Run the training.
  6. chdir back to the original working dir.
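
The steps above could be sketched roughly as follows. This is only an illustration using the C++17 standard library, not TMVA code: the names ScopedWorkingDir, isSimpleDatasetName, canonicalizeInput and runInOutputDir are made up for this example, and the actual Factory/DataLoader calls are assumed to live inside the callback you pass in.

```cpp
#include <cassert>
#include <cctype>
#include <filesystem>
#include <functional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical RAII helper: saves the current working directory, switches to
// `target`, and switches back when it goes out of scope (also on exceptions).
class ScopedWorkingDir {
public:
    explicit ScopedWorkingDir(const fs::path& target)
        : saved_(fs::current_path()) {
        fs::create_directories(target);  // no-op if it already exists
        fs::current_path(target);        // the "chdir" step
    }
    ~ScopedWorkingDir() {
        std::error_code ec;              // never throw from a destructor
        fs::current_path(saved_, ec);    // chdir back to the original dir
    }
    ScopedWorkingDir(const ScopedWorkingDir&) = delete;
    ScopedWorkingDir& operator=(const ScopedWorkingDir&) = delete;

private:
    fs::path saved_;
};

// Hypothetical check for the "simple names only" advice: non-empty and
// purely alphanumeric, so no slashes, dots or underscores.
bool isSimpleDatasetName(const std::string& name) {
    if (name.empty()) return false;
    for (unsigned char c : name) {
        if (!std::isalnum(c)) return false;
    }
    return true;
}

// Turn a possibly relative input file name into an absolute, normalized
// path. Call this BEFORE changing directory, or it resolves wrongly.
fs::path canonicalizeInput(const fs::path& p) {
    return fs::weakly_canonical(fs::absolute(p));
}

// Run `train` (your own code that books and trains the methods) with the
// working directory set to `outputDir`, then restore the previous one.
void runInOutputDir(const fs::path& outputDir,
                    const std::function<void()>& train) {
    assert(train && "a training callback must be provided");
    ScopedWorkingDir guard(outputDir);
    train();
}
```

You would canonicalize the input file names first, then call something like runInOutputDir("/path/to/output", [&] { /* factory.TrainAllMethods() etc. */ }) with a dataset name that passes isSimpleDatasetName, e.g. "TestML" rather than a full path.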

Thanks for the reply.
Yes, as you said, I can see the tutorials not working properly when a 'filename' is used instead of a 'dataset' name.
I can take the approach of changing the working directory to the desired output location.
However, I have one more question. Can I create multiple Factory objects simultaneously without running into problems?
I want to implement category classification and regression using TMVA in my application. For that purpose I am
simply creating multiple Factory/DataLoader/Reader objects, each one handling the data of one category. Would that work? After facing the above problem, I have started to get the feeling that TMVA is built for "all training through a single Factory object only". For example, it seems I cannot set different weight directories for different Factory objects, and the switch for setting the weight directory is global.

The idea is to use one factory, but you can have multiple DataLoaders.
The factory is using global state (variable fgTargetFile), so don't use more than one factory.

The reader is a bit independent, and you can have multiple readers as well.
