
GetSeparation error in evaluation stage of training

Hello,
I’m training a BDT on some large nTuples and I have a stubborn error in the EvaluateAllMethods stage. It reads:
<GetSeparation> signal and background histograms have different or invalid dimensions
I suspect it has something to do with the RMS of one variable coming out as -nan in the variable table (line 433 of the attached log, nohup.txt). The RMS of that same variable looks reasonable in the other tables that TFHandler_BDT generates.
The separation on this variable isn’t great, but I would like to include it in the analysis if possible. Any idea what might be causing this error?
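A side note on the -nan RMS: it does not necessarily mean that some input value is NaN. One generic way it can happen is cancellation in the one-pass formula sqrt(<x²> - <x>²). This is an assumption here, since I haven't checked how TFHandler_BDT actually computes its RMS. When a variable's spread is small compared with its mean, the difference under the square root can come out slightly negative. A single infinite value gives inf - inf, which is NaN. In C++, std::sqrt of a negative number returns NaN, which often prints as -nan. The hypothetical helpers below contrast that formula with Welford's one-pass algorithm, which accumulates squared deviations from a running mean and so avoids the cancellation for finite inputs (it does not help with infinite ones).

```python
import math


def naive_rms(values):
    """RMS about the mean via the one-pass formula sqrt(<x^2> - <x>^2).

    Cancellation can make the difference slightly negative; this returns
    NaN in that case (as C++ std::sqrt would) instead of raising.
    """
    if not values:
        return float("nan")
    n = len(values)
    mean = sum(values) / n
    mean_sq = sum(v * v for v in values) / n
    var = mean_sq - mean * mean
    # `var >= 0` is also False when var is NaN (e.g. inf - inf).
    return math.sqrt(var) if var >= 0 else float("nan")


def welford_rms(values):
    """RMS about the mean via Welford's online algorithm (population variance)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n == 0:
        return float("nan")
    # Clamp tiny negative rounding to zero; a NaN m2 is passed through unchanged.
    return math.sqrt(max(m2 / n, 0.0))
```

For example, `welford_rms([1e9 + 0.5, 1e9 + 1.5])` gives exactly 0.5. The one-pass formula, by contrast, has to subtract two numbers near 1e18 and can lose every significant digit.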

Thanks,
Gannon

Files:
Command log: nohup.txt (56.6 KB)
Macro: performAnalysis.py (4.9 KB)
(Let me know if you want to see the dependencies for the macro or the ntuples themselves. They’re fairly large, so I’m not sure how to share them here.)

@jonas Is this something you can help with?

Hi, I’m not the TMVA expert here, just RooFit 🙂
I think @moneta is the main expert here.

Hi,
I think this is caused by the BDT output being NaN for some events, so a check in Tools.cxx:152 fails and ends in a FATAL error.
The issue should be solved once we understand why you are getting NaN values. Are you sure all the inputs are OK and none of them are NaN?
I would need access to your input files to understand more about the problem.
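On checking the inputs: a test that only uses isnan will not catch ±inf. An infinite input can still become NaN further along, for example through inf - inf or 0 * inf inside an input-variable transformation or an RMS calculation, if any are involved. The sketch below assumes each input variable has already been read from the tree into a sequence of floats. The `columns` dict and the `find_nonfinite` name are hypothetical and not part of TMVA.

```python
import math


def find_nonfinite(columns):
    """Return {variable name: [event indices]} for every NaN or +/-inf value.

    `columns` maps each input-variable name to a sequence of per-event floats.
    Variables with no bad values are left out of the result.
    """
    bad = {}
    for name, values in columns.items():
        # math.isfinite is False for NaN, +inf and -inf alike.
        indices = [i for i, v in enumerate(values) if not math.isfinite(v)]
        if indices:
            bad[name] = indices
    return bad


if __name__ == "__main__":
    inputs = {"var_a": [1.0, 2.0, float("inf")], "var_b": [0.5, float("nan"), 1.5]}
    print(find_nonfinite(inputs))  # {'var_a': [2], 'var_b': [1]}
    print(math.isnan(float("inf")))  # False: a NaN-only check misses this
```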

Lorenzo

Hi Lorenzo,
I’ve checked all the events and none of the inputs are NaN. I’m going to try running the BDT on the input tree in segments to see if I can find which event(s) lead to a NaN BDT output. If you want to take a look at the input files, I can share them. What’s the easiest way to transfer them?
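The segment-by-segment search described above could look something like the following sketch. It assumes you can wrap the per-event BDT evaluation in a Python callable, for instance around TMVA's Reader::EvaluateMVA after filling the variables registered with the reader. `scan_in_segments` and `evaluate` are hypothetical names. The check uses math.isfinite so that infinite responses are flagged as well as NaN ones.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple


def scan_in_segments(
    events: Sequence[Any],
    evaluate: Callable[[Any], float],
    segment_size: int,
) -> List[Tuple[int, int, List[int]]]:
    """Evaluate events segment by segment and report non-finite outputs.

    Returns (start, stop, bad_indices) for each segment [start, stop) that
    contains at least one NaN or infinite response.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    report = []
    for start in range(0, len(events), segment_size):
        stop = min(start + segment_size, len(events))
        bad = [i for i in range(start, stop) if not math.isfinite(evaluate(events[i]))]
        if bad:
            report.append((start, stop, bad))
    return report


if __name__ == "__main__":
    # Toy stand-in for a BDT response: NaN for negative inputs.
    def toy_response(x):
        return math.sqrt(x) if x >= 0 else float("nan")

    print(scan_in_segments([4.0, -1.0, 9.0, 16.0, -2.0], toy_response, 2))
    # [(0, 2, [1]), (4, 5, [4])]
```

Every event is still evaluated, so the segment size only controls how the report is grouped. The listed indices point directly at the events worth inspecting.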
Gannon

Hi,
You can share them via CERNBox, for example. Please also share your C++ code running the BDT.

Cheers

Lorenzo

Hi,
Here’s a link to the files I was analyzing (WW as signal, ttbar as background). I’m using a set of Python files instead of C++ to run the BDT, and I’ve included them. The performAnalysis.py file in the run directory is what I actually run, while the src directory contains middleware used to load the data files, declare variables, etc.
Thanks,
Gannon