Segmentation violation error when calling the TMVA::MethodBase::GetMaximumSignificance method

p73 · November 23, 2017, 6:31pm

I am using the TMVA Rectangular Cuts method.
After successful training I want to get the cuts at maximum significance, so I use:

    reader->BookMVA( "Cuts", "./dataset/weights/EtotPtotClf_Cuts.weights.xml" );
    TMVA::MethodCuts* methodCuts = reader->FindCutsMVA( "Cuts" );
    //Get Maximum Significance cuts
    Double_t cuts;
    methodCuts->TMVA::MethodBase::GetMaximumSignificance( 1000, 1000, cuts );

After the last line I get a segmentation violation. Without it everything is OK. Maybe this will be helpful:

===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007f602399895c in __libc_waitpid (pid=27752, stat_loc=stat_loc@entry=0x7fff37dc2c40, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:31
#1  0x00007f602391a232 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:148
#2  0x00007f60248a6d83 in TUnixSystem::StackTrace() () from /home/wolfgang/root-6.08.02/lib/libCore.so
#3  0x00007f60248a978c in TUnixSystem::DispatchSignals(ESignals) () from /home/wolfgang/root-6.08.02/lib/libCore.so
#4  <signal handler called>
#5  0x00007f600e3e8d34 in TMVA::Results::GetObject(TString const&) const () from /home/wolfgang/root-6.08.02/lib/libTMVA.so
#6  0x00007f600e3e90c3 in TMVA::Results::GetHist(TString const&) const () from /home/wolfgang/root-6.08.02/lib/libTMVA.so
#7  0x00007f600e2f83ae in TMVA::MethodBase::GetMaximumSignificance(double, double, double&) const () from /home/wolfgang/root-6.08.02/lib/libTMVA.so
#8  0x00007f6024dd17da in ?? ()
#9  0x0000000000000000 in ?? ()
===========================================================

UPDATE: I noticed these lines on the training output:

: You have asked for histogram MVA_EFF_BvsS which does not seem to exist in Results … better don’t use it
: You have asked for histogram EFF_BVSS_TR which does not seem to exist in Results … better don’t use it

kialbert · November 24, 2017, 1:11pm

It seems the method does not work in the application phase because it requires a full results set, which is not created during application.

If possible, you could try using your application data as the test set during training.

Otherwise, the only recourse I see is to do the calculation manually. You can get the ROC plot after training, extract the efficiencies from it, and then plug these into the formula

significance = sqrt(num_sig)*( effS )/sqrt( effS + (num_bkg / num_sig) * effB  );

.
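
As a sanity check of the formula above, here is a minimal sketch in plain C++ (standard library only, no ROOT calls). The name `Significance` and its parameter names are hypothetical, chosen for illustration. The expression is algebraically the same as S/sqrt(S+B) with S = num_sig*effS and B = num_bkg*effB.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of TMVA): evaluates the formula quoted above.
// numSig, numBkg: expected signal/background event counts before the cut.
// effS, effB:     signal and background efficiencies of the cut, in [0, 1].
double Significance(double numSig, double numBkg, double effS, double effB)
{
    assert(numSig > 0.0 && numBkg >= 0.0);
    const double denom = std::sqrt(effS + (numBkg / numSig) * effB);
    if (denom == 0.0) return 0.0;  // nothing passes the cut
    return std::sqrt(numSig) * effS / denom;
}
```

For example, with 1000 signal and 1000 background events and both efficiencies at 0.5, this evaluates 500/sqrt(1000).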

Cheers,
Kim

p73 · November 24, 2017, 2:05pm

Thank you for the reply.
Two questions here:

How do I create the results set you mentioned?
How do I get the ROC parameters after training?

kialbert · November 24, 2017, 4:34pm

Ah, sorry for my brevity. The results set is created during the training and testing phase of TMVA so when you run:

{
    // Here the results set for training data is created
    factory->TrainAllMethods();

    // Here for the test data set
    factory->TestAllMethods();
}

The ROC datapoints can be retrieved with

TGraph* g = factory->GetROCCurve(dataloader, methodName, false, iClass);

Double_t *x = g->GetX(); // Signal efficiency
Double_t *y = g->GetY(); // Background rejection
Int_t     n = g->GetN(); // Length of arrays

after training and testing.
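
Once the three values above are available, scanning for the best working point could look like the sketch below. It is plain C++ with no ROOT dependency. `RocBest`, `BestRocPoint`, `numSig` and `numBkg` are hypothetical names, and the code assumes that "background rejection" means 1 - effB. The figure of merit is S/sqrt(S+B), which is equivalent to the significance formula given earlier in this thread.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type: index of the best ROC point and its significance.
struct RocBest {
    int index;
    double significance;
};

// Scans n ROC points (x = signal efficiency, y = background rejection,
// assumed here to mean 1 - effB) and returns the point that maximises
// S / sqrt(S + B), with S = numSig * effS and B = numBkg * effB.
RocBest BestRocPoint(const double* x, const double* y, int n,
                     double numSig, double numBkg)
{
    assert(x != nullptr && y != nullptr && n > 0);
    RocBest best{-1, -1.0};
    for (int i = 0; i < n; ++i) {
        const double s = numSig * x[i];
        const double b = numBkg * (1.0 - y[i]);
        // Guard against an empty selection (s + b == 0).
        const double sig = (s + b > 0.0) ? s / std::sqrt(s + b) : 0.0;
        if (sig > best.significance) best = RocBest{i, sig};
    }
    return best;
}
```

In a ROOT macro, `x`, `y` and `n` would be the values returned by `g->GetX()`, `g->GetY()` and `g->GetN()`; the event counts have to come from your own analysis.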

Cheers,
Kim

p73 · November 24, 2017, 4:51pm

I get a segmentation violation when I try to call the GetROCCurve member function.

    factory->BookMethod( dataloader, TMVA::Types::kCuts, "Cuts", "FitMethod=MC:EffMethod=EffSel:CutRangeMin=0:CutRangeMax=2500" );
    //Train methods
    factory->TrainAllMethods();
    //Test methods
    factory->TestAllMethods();
    //Evaluate methods
    factory->EvaluateAllMethods();
    //Get ROC curve
    TGraph* roc = factory->GetROCCurve( dataloader, "Cuts", false );

p73 · November 24, 2017, 6:56pm

How can I make all of this work? I mean, is it possible to call all those methods normally (such as GetMaximumSignificance, GetROCCurve and so on)?
Here is the picture. A user can train and test great classification and regression methods (thank you so much, TMVA team). The user can see great graphs containing some very important dependencies (like significance versus cut). Good job. The user sees the maximum significance point and thinks “oh, great, I could use this point as the figure of merit in my analysis.” But the user does not know that getting things that seem available is not an easy task.

kialbert · November 28, 2017, 9:56pm

I know, I am struggling with the same issues. We on the dev team are always working to improve the workflow, so your feedback is definitely appreciated.

Cheers,
Kim

kialbert · November 29, 2017, 4:54pm

I have now looked deeper into the problem and there is a bug at the root of it. The “Cuts” method behaves differently from other methods when it comes to how the ROC is calculated, and the generalised ROC curve calculations can’t handle it for the time being.

What I have gathered since last time is that the conventional approach is to use TMVAGui on your test data to calculate the optimal cut and then use this cut for further analysis/application.

This should also work for

methodCuts->TMVA::MethodBase::GetMaximumSignificance( 1000, 1000, cuts );

if you want to do it programmatically.

Cheers,
Kim
