TMVA crashes after rectangular cut optimization with cross-validation

TMVA crashes with a segmentation violation after running a rectangular cut optimization (RCO) with cross-validation in ROOT 6.22/06. No such crash occurs for an RCO without cross-validation or for a BDT with cross-validation. The crash appears to happen during post-processing and to be related to the ROC integral: the stack trace below ends in TMVA::Factory::GetROC, called from TMVA::Factory::GetROCIntegral inside TMVA::CrossValidation::ProcessFold.

A workaround would be very much appreciated. Reproducer:


Setup:

import ROOT as r

r.RDataFrame(1000).Define("v", "gRandom->Gaus(5, 5)").Define("e", "rdfentry_").Snapshot(
    "atree", "sig.root"
)
r.RDataFrame(1000).Define("v", "gRandom->Gaus(1, 3)").Define("e", "rdfentry_").Snapshot(
    "atree", "bkg.root"
)
fsig = r.TFile.Open("sig.root")
tsig = fsig.atree
fbkg = r.TFile.Open("bkg.root")
tbkg = fbkg.atree
fout = r.TFile.Open("out.root", "recreate")

dl = r.TMVA.DataLoader("dataset")
dl.AddVariable("v", "some distribution", "", "F")
dl.AddSpectator("e", "entry number", "")
dl.AddSignalTree(tsig)
dl.AddBackgroundTree(tbkg)

Doing cross-validation fails:

dl.PrepareTrainingAndTestTree(
    "",
    "",
    (
        r"nTest_Signal=1:nTest_Background=1:NormMode=NumEvents:!V:SplitSeed=100:"
        r"SplitMode=Random"
    ),
)
cv = r.TMVA.CrossValidation(
    "TMVACrossValidation",
    dl,
    fout,
    r"!V:!Silent:AnalysisType=Classification"
    r":FoldFileOutput=True:SplitType=Deterministic:NumFolds=10"
    r":SplitExpr=int([e])%int([NumFolds])",
)
cv.BookMethod(
    r.TMVA.Types.kCuts,
    "GeneticAlgorithm",
    r"!H:!V:FitMethod=GA:EffMethod=EffSel:VarProp=NotEnforced",
)
cv.Evaluate()
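
As an aside on the options above: SplitType=Deterministic together with SplitExpr=int([e])%int([NumFolds]) is meant to assign each event to a fold based on its spectator value e. The arithmetic can be sketched in plain Python, assuming e runs over 0..999 as rdfentry_ does in a single-threaded run. The helper fold_of is hypothetical and not part of TMVA, and the sketch says nothing about how TMVA labels the resulting folds internally.

```python
from collections import Counter

NUM_FOLDS = 10  # matches NumFolds=10 in the CrossValidation options


def fold_of(entry, num_folds=NUM_FOLDS):
    """Mirror the arithmetic of SplitExpr=int([e])%int([NumFolds])."""
    return int(entry) % int(num_folds)


# Count how many of 1000 consecutive entry numbers fall into each fold.
fold_sizes = Counter(fold_of(e) for e in range(1000))
for fold, size in sorted(fold_sizes.items()):
    print(fold, size)
```

With 1000 consecutive entry numbers and 10 folds, every fold receives the same number of entries, so the split is balanced and reproducible from run to run.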

Output:

Factory                  : You are running ROOT Version: 6.22/06, Nov 27, 2020
                         : 
                         : _/_/_/_/_/ _|      _|  _|      _|    _|_|   
                         :    _/      _|_|  _|_|  _|      _|  _|    _| 
                         :   _/       _|  _|  _|  _|      _|  _|_|_|_| 
                         :  _/        _|      _|    _|  _|    _|    _| 
                         : _/         _|      _|      _|      _|    _| 
                         : 
                         : ___________TMVA Version 4.2.1, Feb 5, 2015
                         : 
                         : Building event vectors for type 2 Signal
                         : Dataset[dataset] :  create input formulas for tree atree
                         : Building event vectors for type 2 Background
                         : Dataset[dataset] :  create input formulas for tree atree
DataSetFactory           : [dataset] : Number of events in input trees
                         : 
                         : 
                         : Number of training and testing events
                         : ---------------------------------------------------------------------------
                         : Signal     -- training events            : 999
                         : Signal     -- testing events             : 1
                         : Signal     -- training and testing events: 1000
                         : Background -- training events            : 999
                         : Background -- testing events             : 1
                         : Background -- training and testing events: 1000
                         : 
DataSetInfo              : Correlation matrix (Signal):
                         : ----------------
                         :                v
                         :       v:  +1.000
                         : ----------------
DataSetInfo              : Correlation matrix (Background):
                         : ----------------
                         :                v
                         :       v:  +1.000
                         : ----------------
DataSetFactory           : [dataset] :  
                         : 
                         : 
                         : 
                         : ========================================
                         : Processing folds for method GeneticAlgorithm
                         : ========================================
                         : 
                         : Creating fold output at:./GeneticAlgorithm_fold1.root
Factory                  : Booking method: GeneticAlgorithm_fold1
                         : 
                         : Use optimization method: "Genetic Algorithm"
                         : Use efficiency computation method: "Event Selection"
FitterBase               : <GeneticFitter> Optimisation, please be patient ... (inaccurate progress timing for GA)
                         : Elapsed time: 1.19 sec                            
                         : -------------------------------------
GeneticAlgorithm_fold1   : Cut values for requested signal efficiency: 0.1
                         : Corresponding background efficiency       : 0
                         : Transformation applied to input variables : None
                         : -------------------------------------
                         : Cut[ 0]:    11.8051 < v <=    30.4157
                         : -------------------------------------
                         : -------------------------------------
GeneticAlgorithm_fold1   : Cut values for requested signal efficiency: 0.2
                         : Corresponding background efficiency       : 0.00333704
                         : Transformation applied to input variables : None
                         : -------------------------------------
                         : Cut[ 0]:    9.49354 < v <=    19.4138
                         : -------------------------------------
                         : -------------------------------------
GeneticAlgorithm_fold1   : Cut values for requested signal efficiency: 0.3
                         : Corresponding background efficiency       : 0.0155729
                         : Transformation applied to input variables : None
                         : -------------------------------------
                         : Cut[ 0]:     7.8027 < v <=    21.9749
                         : -------------------------------------
                         : -------------------------------------
GeneticAlgorithm_fold1   : Cut values for requested signal efficiency: 0.4
                         : Corresponding background efficiency       : 0.0355951
                         : Transformation applied to input variables : None
                         : -------------------------------------
                         : Cut[ 0]:    6.28893 < v <=    31.2436
                         : -------------------------------------
                         : -------------------------------------
GeneticAlgorithm_fold1   : Cut values for requested signal efficiency: 0.5
                         : Corresponding background efficiency       : 0.0945495
                         : Transformation applied to input variables : None
                         : -------------------------------------
                         : Cut[ 0]:    4.90878 < v <=     36.037
                         : -------------------------------------
                         : -------------------------------------
GeneticAlgorithm_fold1   : Cut values for requested signal efficiency: 0.6
                         : Corresponding background efficiency       : 0.185762
                         : Transformation applied to input variables : None
                         : -------------------------------------
                         : Cut[ 0]:    3.73397 < v <=    33.9246
                         : -------------------------------------
                         : -------------------------------------
GeneticAlgorithm_fold1   : Cut values for requested signal efficiency: 0.7
                         : Corresponding background efficiency       : 0.310345
                         : Transformation applied to input variables : None
                         : -------------------------------------
                         : Cut[ 0]:    2.60902 < v <=    37.8259
                         : -------------------------------------
                         : -------------------------------------
GeneticAlgorithm_fold1   : Cut values for requested signal efficiency: 0.8
                         : Corresponding background efficiency       : 0.546162
                         : Transformation applied to input variables : None
                         : -------------------------------------
                         : Cut[ 0]:   0.805316 < v <=    34.4058
                         : -------------------------------------
                         : -------------------------------------
GeneticAlgorithm_fold1   : Cut values for requested signal efficiency: 0.9
                         : Corresponding background efficiency       : 0.835373
                         : Transformation applied to input variables : None
                         : -------------------------------------
                         : Cut[ 0]:   -1.77597 < v <=    30.5041
                         : -------------------------------------
                         : Elapsed time for training with 1798 events: 1.2 sec         
GeneticAlgorithm_fold1   : [dataset] : Evaluation of GeneticAlgorithm_fold1 on training sample (1798 events)
                         : Elapsed time for evaluation of 1798 events: 0.000247 sec       
                         : Creating xml weight file: dataset/weights/TMVACrossValidation_GeneticAlgorithm_fold1.weights.xml
                         : Creating standalone class: dataset/weights/TMVACrossValidation_GeneticAlgorithm_fold1.class.C
                         : ./GeneticAlgorithm_fold1.root:/dataset/Method_Cuts/GeneticAlgorithm_fold1
Factory                  : Test all methods
Factory                  : Test method: GeneticAlgorithm_fold1 for Classification performance
                         : 
GeneticAlgorithm_fold1   : [dataset] : Evaluation of GeneticAlgorithm_fold1 on testing sample (200 events)
                         : Elapsed time for evaluation of 200 events: 2.29e-05 sec       
Factory                  : Evaluate all methods
Factory                  : Evaluate classifier: GeneticAlgorithm_fold1
                         : 
<WARNING>                : You have asked for histogram MVA_EFF_BvsS which does not seem to exist in *Results* .. better don't use it 
<WARNING>                : You have asked for histogram EFF_BVSS_TR which does not seem to exist in *Results* .. better don't use it 
TFHandler_GeneticAlgor...: Variable        Mean        RMS   [        Min        Max ]
                         : -----------------------------------------------------------
                         :        v:     2.9859     4.3313   [    -6.5176     15.305 ]
                         : -----------------------------------------------------------
                         : 
                         : Evaluation results ranked by best signal efficiency and purity (area)
                         : -------------------------------------------------------------------------------------------------------------------
                         : DataSet       MVA                       
                         : Name:         Method:          ROC-integ
                         : dataset       GeneticAlgorithm_fold1: 0.789
                         : -------------------------------------------------------------------------------------------------------------------
                         : 
                         : Testing efficiency compared to training efficiency (overtraining check)
                         : -------------------------------------------------------------------------------------------------------------------
                         : DataSet              MVA              Signal efficiency: from test sample (from training sample) 
                         : Name:                Method:          @B=0.01             @B=0.10            @B=0.30   
                         : -------------------------------------------------------------------------------------------------------------------
                         : dataset              GeneticAlgorithm_fold1: 1.000 (0.290)       0.570 (0.512)      0.745 (0.696)
                         : -------------------------------------------------------------------------------------------------------------------
                         : 
Dataset:dataset          : Created tree 'TestTree' with 200 events
                         : 
Dataset:dataset          : Created tree 'TrainTree' with 1798 events
                         : 
Factory                  : Thank you for using TMVA!
                         : For citation information, please visit: http://tmva.sf.net/citeTMVA.html
 *** Break *** segmentation violation
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/lib/libCore.6.22.06.so] TUnixSystem::DispatchSignals(ESignals) (no debug info)
[/usr/lib/system/libsystem_platform.dylib] _sigtramp (no debug info)
[<unknown binary>] (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/lib/libTMVA.6.22.06.so] TMVA::Factory::GetROC(TString, TString, unsigned int, TMVA::Types::ETreeType) (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/lib/libTMVA.6.22.06.so] TMVA::Factory::GetROCIntegral(TString, TString, unsigned int) (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/lib/libTMVA.6.22.06.so] TMVA::CrossValidation::ProcessFold(unsigned int, TMVA::OptionMap const&) (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/lib/libTMVA.6.22.06.so] TMVA::CrossValidation::Evaluate() (no debug info)
[<unknown binary>] (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/lib/libcppyy_backend3_9.6.22.06.so] WrapperCall(long, unsigned long, void*, void*, void*) (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/lib/libcppyy3_9.6.22.06.so] CPyCppyy::(anonymous namespace)::VoidExecutor::Execute(long, void*, CPyCppyy::CallContext*) (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/lib/libcppyy3_9.6.22.06.so] CPyCppyy::CPPMethod::ExecuteFast(void*, long, CPyCppyy::CallContext*) (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/lib/libcppyy3_9.6.22.06.so] CPyCppyy::CPPMethod::ExecuteProtected(void*, long, CPyCppyy::CallContext*) (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/lib/libcppyy3_9.6.22.06.so] CPyCppyy::CPPMethod::Call(CPyCppyy::CPPInstance*&, _object*, _object*, CPyCppyy::CallContext*) (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/lib/libcppyy3_9.6.22.06.so] CPyCppyy::(anonymous namespace)::mp_call(CPyCppyy::CPPOverload*, _object*, _object*) (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyObject_MakeTpCall (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] call_function (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyEval_EvalFrameDefault (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyEval_EvalCode (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] builtin_exec (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] cfunction_vectorcall_FASTCALL (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] call_function (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyEval_EvalFrameDefault (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] gen_send_ex (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyEval_EvalFrameDefault (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] gen_send_ex (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyEval_EvalFrameDefault (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] gen_send_ex (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] method_vectorcall_O (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] call_function (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyEval_EvalFrameDefault (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyFunction_Vectorcall (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] call_function (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyEval_EvalFrameDefault (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyFunction_Vectorcall (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] call_function (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyEval_EvalFrameDefault (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyEval_EvalCode (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyFunction_Vectorcall (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] method_vectorcall (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] call_function (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyEval_EvalFrameDefault (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyEval_EvalCode (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyFunction_Vectorcall (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] call_function (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyEval_EvalFrameDefault (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyEval_EvalCode (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyFunction_Vectorcall (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] call_function (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyEval_EvalFrameDefault (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyFunction_Vectorcall (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] call_function (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyEval_EvalFrameDefault (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyEval_EvalCode (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyFunction_Vectorcall (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] method_vectorcall (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] PyVectorcall_Call (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyEval_EvalFrameDefault (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyEval_EvalCode (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyFunction_Vectorcall (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] call_function (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyEval_EvalFrameDefault (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] _PyEval_EvalCode (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] PyRun_FileExFlags (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] PyRun_SimpleFileExFlags (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] pymain_run_file (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] pymain_run_python (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] Py_RunMain (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] pymain_main (no debug info)
[/Users/michael/opt/anaconda3/envs/aslsRun2_ana/bin/python] main (no debug info)
[/usr/lib/system/libdyld.dylib] start (no debug info)
---------------------------------------------------------------------------
SegmentationViolation                     Traceback (most recent call last)
<ipython-input-2-861830e08ce3> in <module>
     20     r"!H:!V:FitMethod=GA:EffMethod=EffSel:VarProp=NotEnforced",
     21 )
---> 22 cv.Evaluate()

SegmentationViolation: void TMVA::CrossValidation::Evaluate() =>
    SegmentationViolation: segfault in C++; program state was reset

Using a factory works just fine:

dl.PrepareTrainingAndTestTree(
    "",
    "",
    (
        r"nTest_Signal=0:nTest_Background=0:NormMode=NumEvents:!V:SplitSeed=100:"
        r"SplitMode=Random"
    ),
)
fact = r.TMVA.Factory(
    "TMVAClassification",
    fout,
    r"!V:!Silent:AnalysisType=Classification",
)
fact.BookMethod(
    dl,
    r.TMVA.Types.kCuts,
    "GeneticAlgorithm",
    r"!H:!V:FitMethod=GA:EffMethod=EffSel:VarProp=NotEnforced",
)
fact.TrainAllMethods()
fact.TestAllMethods()
fact.EvaluateAllMethods()

@swunsch @moneta can you please take a look? Thanks!

@moneta ping

Hi,
Sorry for the late reply. I cannot reproduce the crash on master (which should be the same as 6.24), but I can reproduce it in 6.22.
If you cannot upgrade to the new ROOT version, which will be released in the next few days, please let me know and I will see whether I can backport the fix to the 6.22 branch.

Cheers

Lorenzo

I can try again after 6.24 is released.

(6.24 is released :tada: )

Sorry for the long wait. I can confirm that the problem disappears for 6.24/00. Thanks!
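
For anyone whose scripts must run on several ROOT installations, a small version guard may help skip the cross-validated cuts method on affected versions. This is only a sketch: the helper names are hypothetical, and treating everything below 6.24/00 as affected is an assumption, since this thread only confirms the crash in 6.22 and its absence in 6.24/00. The version string can be obtained from ROOT.gROOT.GetVersion().

```python
import re

# Assumption: first release reported in this thread as free of the crash.
FIRST_GOOD_VERSION = (6, 24, 0)


def parse_root_version(version):
    """Turn a ROOT version string such as '6.22/06' into (6, 22, 6)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def cuts_cv_is_safe(version):
    """Return True if the version is at or above the first good release."""
    return parse_root_version(version) >= FIRST_GOOD_VERSION


# In a real script the string would come from ROOT.gROOT.GetVersion().
for version in ("6.22/06", "6.24/00"):
    print(version, cuts_cv_is_safe(version))
```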
