NaN in TTree

Hi,
I have a TTree which I suspect has a nan as one of its values somewhere. I have looked in a TBrowser but nothing looks wrong, and I have tried to do a scan of the file, but it will not accept nan as a numerical value. How do I find out?

Cheers,
Toby Davies

Hi Toby,

this might help:

double d = log(-1.);
if (!(d > 0. || d <= 0.)) printf("nan!\n");

or, less of a hack:

if (TMath::IsNaN(d)) printf("nan!\n");

Cheers, Axel.

Hi,
Thanks for the reply - I know that the test
(x != x)
will tell you whether or not x is a nan, but I was looking for a way to query the ROOT file without having to go through every event and every variable, a bit like a scan. Is there any good way to do this? Do TTrees even store nans? (I know they get messed up if you pass them a double when they expected a float, for instance - how would they cope with infinity?)

Cheers,
Toby

this was answered in another post.

Hi,

nan should be stored; it’s just a peculiar double value, but it is a value. I don’t know of a way to find a certain value in a whole tree without going through all values and all entries, and I doubt it exists.
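
If you want to check by brute force, a loop along these lines should do it (an untested sketch; "mytree" and "x" are placeholders for your own tree and leaf names):

TFile *f = TFile::Open("myfile.root");
TTree *tree = (TTree*)f->Get("mytree");
Double_t x;
tree->SetBranchAddress("x", &x);                 // attach the leaf to a local variable
for (Long64_t i = 0; i < tree->GetEntries(); ++i) {
   tree->GetEntry(i);                            // load entry i into x
   if (TMath::IsNaN(x)) printf("nan in x, entry %lld\n", i);
}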

I assume that you are trying to debug your TMultiLayerPerceptron problem. What you can do is attach a debugger and use TSystem::SetFPEMask(kAllMask), so you get an exception (which is then caught by the debugger) when TMLP sees a nan.
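
From the ROOT prompt that is simply:

gSystem->SetFPEMask(kAllMask); // trap every floating point exception; the attached debugger stops at the first one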

Cheers, Axel.

How do I attach the debugger? I have valgrind, for example - should I just run that in the normal way on root?

Hi Toby,

valgrind is not a debugger, gdb is. See e.g. muenster.de/~naumana/rootgdb.html

Cheers, Axel.

I have gdb running now, but when I attach it to ROOT, ROOT will not receive any input from the keyboard, so I can't recreate my results. How do I fix this, or is there any other way to find out what is wrong with TMLP?

Cheers,
Toby

Hi,

see the web page I posted above. You probably forgot to type “cont” at gdb’s prompt, to tell it to let root continue.
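
A minimal attach session looks roughly like this (12345 stands for the pid of your running ROOT process; how you find it depends on your setup):

$ gdb root.exe 12345
(gdb) cont

Press Ctrl-C in the gdb window whenever you want the gdb prompt back.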

Cheers, Axel.

Yes, I remember reading that… how stupid… anyway, when I run root with the command:

gSystem->SetFPEMask(kAllMask)

it crashes after

TTree * tr = (TTree*)gDirectory->Get("TDDescVar")

with this floating point exception:

*** Break *** floating point exception
Generating stack trace…
0xb7035219 in TKey::ReadFile() at base/src/TKey.cxx:877 from /home/cdfsoft/products/root/v4_00_08gGCC_3_4_3/Linux+2.4/lib/libCore.so
0xb7034287 in TKey::ReadObj() at base/src/TKey.cxx:583 from /home/cdfsoft/products/root/v4_00_08gGCC_3_4_3/Linux+2.4/lib/libCore.so

(which doesn’t happen if I do not do gSystem->SetFPEMask(kAllMask)). gdb says at this point:

Program received signal SIGFPE, Arithmetic exception.
[Switching to Thread -1241382240 (LWP 6760)]
TFile::ReadBuffer (this=0x8e5b6d8, buf=0xb520e008 “”, len=2729678)
at base/src/TFile.cxx:1128
1128 base/src/TFile.cxx: No such file or directory.
in base/src/TFile.cxx

which I do not know how to interpret (the root file is definitely there, at least). Is the gSystem command correct? Are there some files missing from the release?

The NN bails out with no gdb error messages if the gSystem command is not called.

Cheers,
Toby

Hi,

yes, gdb will run happily without SetFPEMask - that’s the point of calling SetFPEMask :wink: The exception you see is either because of some problem with TFile::ReadBuffer, or because it reads a value which is nan. If you type “bt” or “backtrace” at the gdb prompt you should see something like TTree::GetEntry - its argument is the entry (or event) number that ROOT is currently trying to load. That would at least tell you which entry the nan is in.

Gdb says that it cannot find base/src/TFile.cxx because you installed a binary distribution - if you had built from sources it could show you the source line where things go wrong. That's not that important. The backtrace might already tell you in which entry==event the nan occurs, and you might even be able to spot the branch and leaf name that contains it.
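
If you want to inspect a given frame, you can jump to it and look at its arguments, e.g. (use whatever frame number your backtrace shows for TTree::GetEntry):

(gdb) frame 3
(gdb) info args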

If you don’t see any TTree related functions in the backtrace you can just continue - maybe this is just the first floating point exception, and the nan one is still to come.

Cheers, Axel.

Well, it seems as though any command which follows SetFPEMask causes a floating point exception - here you can see it after a TBrowser is called. So this error cannot be related to the TMLP one.

[tdavies@nglas16 results]$ root -l
root [0] gSystem->SetFPEMask(kAllMask)
(Int_t)0
root [1] TBrowser T

*** Break *** floating point exception
Generating stack trace…
0xb700b83d in TBrowser::TBrowser[in-charge](char const*, char const*) at base/src/TBrowser.cxx:94 from /home/cdfsoft/products/root/v4_00_08gGCC_3_4_3/Linux+2.4/lib/libCore.so

So the SetFPEMask call causes a floating point exception that stops any code run after it.

without using SetFPEMask, I have set a breakpoint at TMultiLayerPerceptron::GetError(), as this call happens just before the error message, but I have no clue how to read the following:

#0 TMultiLayerPerceptron::GetError (this=0x9049e30, set=TMultiLayerPerceptron::kTest)
at mlp/src/TMultiLayerPerceptron.cxx:926
#1 0xb591b006 in TMultiLayerPerceptron::Train (this=0x9049e30, nEpoch=10,
option=0x95afdfc “text, graph, update=1”) at mlp/src/TMultiLayerPerceptron.cxx:846
#2 0xb59291d5 in G__G__MLP_146_3_1 ()
from /home/cdfsoft/products/root/v4_00_08gGCC_3_4_3/Linux+2.4/lib/libMLP.so
#3 0xb6ccd9b5 in G__call_cppfunc (result7=0xbffe9840, libp=0xbffde000, ifunc=0x9555f28, ifn=3)
at cint/src/newlink.c:523
#4 0xb6cb7e1b in G__interpret_func (result7=0xbffe9840, funcname=0xbffe8e70 “Train”,
libp=0xbffde000, hash=510, p_ifunc=0x9555f28, funcmatch=1, memfunc_flag=1)
at cint/src/ifunc.c:6600
#5 0xb6c8e25f in G__getfunction (item=0xbffea445 “Train(10,“text, graph, update=1”)”,
known3=0xbffed54c, memfunc_flag=1) at cint/src/func.c:3194
#6 0xb6d53036 in G__getstructmem (store_var_type=112, varname=0xbffed180 “\200q(¶\020¿.\b¤Ñþ¿”,
membername=0xbffea445 “Train(10,“text, graph, update=1”)”, tagname=0xbffea440 “myNN”,
known2=0xbffed54c, varglobal=0xb6e82fc0, objptr=1) at cint/src/var.c:4899
#7 0xb6d488bf in G__getvariable (item=0xbffedc90 “myNN.Train(10,“text, graph, update=1”)”,
known2=0xbffed54c, varglobal=0xb6e82fc0, varlocal=0x0) at cint/src/var.c:3552
#8 0xb6c81d0e in G__getitem (item=0xbffedc90 “myNN.Train(10,“text, graph, update=1”)”)
at cint/src/expr.c:2342
#9 0xb6c7fe6a in G__getexpr (expression=0xbffef170 “myNN.Train(10,“text, graph, update=1”)”)
at cint/src/expr.c:1711
#10 0xb6ce9f85 in G__exec_function (
statement=0xbffef170 “myNN.Train(10,“text, graph, update=1”)”, pc=0xbffef16c,
piout=0xbffef160, plargestep=0xbffef150, presult=0xbffefb40) at cint/src/parse.c:521
#11 0xb6cf4ae2 in G__exec_statement () at cint/src/parse.c:4869
#12 0xb6c5f851 in G__exec_tempfile_core (file=0x0, fp=0x95afc88) at cint/src/debug.c:390
#13 0xb6c5fb0e in G__exec_tempfile_fp (fp=0x95afc88) at cint/src/debug.c:476
#14 0xb6cffefa in G__process_cmd (line=0x95afc54 “myNN.Train(10,“text, graph, update=1”);”,
prompt=0x844ea44 “”, more=0x844ea3c, err=0xbfff9c1c, rslt=0xbfff9c20) at cint/src/pause.c:4110
#15 0xb70b8afa in TCint::ProcessLine (this=0x844ea20,
line=0x95afc54 “myNN.Train(10,“text, graph, update=1”);”, error=0x0) at meta/src/TCint.cxx:310
#16 0xb7002d3d in TApplication::ProcessLine (this=0x8799f30,
line=0x95afc54 “myNN.Train(10,“text, graph, update=1”);”, sync=false, err=0x0)
at base/src/TApplication.cxx:691
#17 0xb62dc572 in TRint::HandleTermInput (this=0x8799f30) at rint/src/TRint.cxx:397
#18 0xb62dad8f in TTermInputHandler::Notify (this=0x8e397e8) at rint/src/TRint.cxx:102
#19 0xb62dcf6f in TTermInputHandler::ReadNotify (this=0x8e397e8) at rint/src/TRint.cxx:96
#20 0xb7136e46 in TUnixSystem::CheckDescriptors (this=0x8446778) at unix/src/TUnixSystem.cxx:890
#21 0xb7136681 in TUnixSystem::DispatchOneEvent (this=0x8446778, pendingOnly=false)
at unix/src/TUnixSystem.cxx:693
#22 0xb7077e6b in TSystem::InnerLoop (this=0x8446778) at base/src/TSystem.cxx:316
#23 0xb7077df6 in TSystem::Run (this=0x8446778) at base/src/TSystem.cxx:284
#24 0xb700361b in TApplication::Run (this=0x8799f30, retrn=false) at base/src/TApplication.cxx:805
#25 0xb62dbf13 in TRint::Run (this=0x8799f30, retrn=false) at rint/src/TRint.cxx:263
#26 0x08048d3b in main (argc=1, argv=0xbfffa314) at main/src/rmain.cxx:29
(gdb)

however, since this error message is generated when it finds a nan, presumably the problem has happened somewhere down the line and has not produced any error messages until now. I don't know enough about the program to choose a function to set as a breakpoint to find the problem at its root. Which function should I choose for a breakpoint? When it breaks, what do I look for in the output?

Cheers,
Toby Davies

Hi,
setting the FPE mask to “all” is too much, you’re right - it even throws an exception if the value is “inexact”. Try setting it to kDefaultMask. It’s still the best way to figure out where and why you get the nan.
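
i.e. something like:

gSystem->SetFPEMask(kDefaultMask); // if I remember right this traps invalid operations, division by zero and overflow, but not "inexact"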
Cheers, Axel.

Right,
this works - kinda.
root throws something, and gdb sees:

(gdb) bt full
#0 0xb5923a8c in TNeuron::GetValue (this=0x9565e80) at mlp/src/TNeuron.cxx:859
branch = -1
nentries = 0
#1 0xb59243d1 in TSynapse::GetValue (this=0x95667e0) at mlp/src/TSynapse.cxx:72
No locals.
#2 0xb5923ad5 in TNeuron::GetValue (this=0x9566600) at mlp/src/TNeuron.cxx:864
preSynapse = (struct TSynapse *) 0x95667e0
i = 4
input = 0.76636425180433199
value = 1.0966530835157594e-314
nentries = 10
#3 0xb59243d1 in TSynapse::GetValue (this=0x95685b0) at mlp/src/TSynapse.cxx:72
No locals.
#4 0xb5923ad5 in TNeuron::GetValue (this=0x9568470) at mlp/src/TNeuron.cxx:864
preSynapse = (struct TSynapse *) 0x95685b0
i = 0
input = -0.024220650317147374
value = -1.3748538263275518e-50
nentries = 10
#5 0xb5923cfd in TNeuron::GetError (this=0x9568470) at mlp/src/TNeuron.cxx:939
No locals.
#6 0xb5923d83 in TNeuron::GetDeDw (this=0x9568470) at mlp/src/TNeuron.cxx:950
nentries = 1
#7 0xb5923de8 in TNeuron::GetDeDw (this=0x9566600) at mlp/src/TNeuron.cxx:954
postSynapse = (struct TSynapse *) 0x95685b0
i = 0
nentries = 1
#8 0xb5923de8 in TNeuron::GetDeDw (this=0x9565b30) at mlp/src/TNeuron.cxx:954
postSynapse = (struct TSynapse *) 0x9566740
i = 0
nentries = 10
#9 0xb591fd32 in TMultiLayerPerceptron::MLP_Stochastic (this=0x904d348, buffer=0x962cda8)
at mlp/src/TMultiLayerPerceptron.cxx:1563
cnt = 439
nEvents = 46302
i = 0
nentries = 10
synapse = (struct TSynapse *) 0xb590eb01
index = (Int_t *) 0xb4781008
j = 0

looking in TNeuron::GetValue(), it seems to me as if the program has reached the following lines of code:

Int_t nentries = fpre.GetEntriesFast();
if (!nentries) {
   Double_t branch = GetBranch();
   // fNorm[] holds the normalisation constants set in UseBranch
   return (((TNeuron*)this)->fValue = (branch - fNorm[1]) / fNorm[0]);
}

I do not know why there would be no entries in fpre - is this normal, or could it be a cause of the problem?
GetBranch() calls fFormula->EvalInstance() and has obviously got a -1 back. Looking at the EvalInstance code is not too explanatory as to what this means - what does it mean, and is it a potential cause of the problem?
The only other thing is the division by fNorm[0], which would certainly mess things up if it were 0. Using 'bt full' only gives the local variables of the functions in the stack - how do I get gdb to show me this information?

Cheers,
Toby Davies

I set a breakpoint at SetNormalisation, and it set fNorm[0] to 1; there were no more breaks before the program crashed, so I do not think fNorm[0] is the problem (is there any way to know for sure?)

Cheers,
Toby

I spoke too soon - there is a function, TNeuron::UseBranch(TTree* input, const char* formula), which does:

TH1D tmp("tmpb", "tmpb", 1, -FLT_MAX, FLT_MAX);
input->Draw(Form("%s>>tmpb", formula), "", "goff");
fNorm[0] = tmp.GetRMS();

and GetRMS() does:

rms2 = TMath::Abs(stats[axm+1]/stats[0] - x*x); // presumably |<x^2> - <x>^2|, i.e. the variance
return TMath::Sqrt(rms2);

putting a break at GetRMS() gives:

(gdb) bt full
#0 TH1::GetRMS (this=0xbffd4890, axis=1) at hist/src/TH1.cxx:4755
rms2 = -5.5737980661532215e-42
stats = {-1.1744961477445355e-45, -5.6760270760144201e-42, 7.6972214557273979e-266, -5.574408397319857e-42,
2.1956861168202626e-314, 2.1219957909652728e-308, -4.267776856812761e-42, 3.236397096371005e-265, -1.3717257616706172e-50,
-1.1218762251854436e-50, -1.3706776356061717e-50}
x = 1.1666764724510648e-263
ax = {24, 0, 75145097}
axm = 14603089
#1 0xb5922934 in TNeuron::UseBranch (this=0x904df78, input=0x8f63678, formula=0x913c38c “LepAEn”) at mlp/src/TNeuron.cxx:831
tmp = {…

and so GetRMS is trying to take sqrt(-5.5737980661532215e-42)=nan as its output.

I removed the branch from the neural net and ran again, but it seems that the first call to GetRMS always has rms2 negative. This shows the same thing to be true for the dilmass branch:

Breakpoint 1, TH1::GetRMS (this=0xbffd4010, axis=1) at hist/src/TH1.cxx:4755
4755 hist/src/TH1.cxx: No such file or directory.
in hist/src/TH1.cxx
(gdb) bt full
#0 TH1::GetRMS (this=0xbffd4010, axis=1) at hist/src/TH1.cxx:4755
rms2 = -5.5737980661518349e-42
stats = {-1.1744961477445355e-45, -5.6760270760144201e-42, 8.9270278642914836e-266, -5.5744083973184703e-42,
2.1957888350442956e-314, 2.1219957909652728e-308, -4.267776856812761e-42, 3.2335766999614399e-265, -1.3759019565301362e-50,
-1.1260524200449626e-50, -1.3748538304656908e-50}
x = 1.1646715302501235e-263
ax = {24, 0, 75145097}
axm = 14603089
#1 0xb5923934 in TNeuron::UseBranch (this=0x904cb98, input=0x8f62e98, formula=0x92e126c “dilmass”) at mlp/src/TNeuron.cxx:831
tmp = {…

So I presume this is some uninitialised variable…? Is this a bug, or is it something wrong with my setup?

Cheers,
Toby

Hi Toby,

first of all: congratulations, it’s unbelievable how fast you learned to read debugger output. Now let’s solve your problem…

Looking at your values of ax I assume that you set the break point when entering GetRMS - where GetRMS’s variables are not yet initialized. I don’t see how TMath::Abs could return a negative value anyway, so my assumption is that your problem is somewhere else.
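
To see rms2 after it has actually been computed, break on a source line after the assignment rather than on the function itself, e.g. (the line number is just an example - pick the line with the return statement in your TH1.cxx):

(gdb) break TH1.cxx:4756
(gdb) cont
(gdb) print rms2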

[quote]the division by fNorm[0], which would certainly messthings up if it were 0. using ‘bt full’ will only give the local variables to the functions in the stack - how do I get gdb to show me this information?[/quote]You can try “print fNorm[0]” or even “print *this”, but I suppose you have an optimized build of ROOT, so this info is most likely gone.

The branch==-1 is a bit suspicious. What is the expression for your 5th input neuron? And what does the tree look like? Can you TTree::Scan the expression of the 5th input for the first event?

Cheers, Axel.

Hi,
Is the value of rms2 from before or after the function executes, then - when exactly does the debugger stop the program?
No, print fNorm did not work (for information, why would this be - where does the program store the value, and why would it be inaccessible in an optimised version?).
the first event looks like:

[tdavies@nglas16 FRZ21M]$ root -l results/NNtest.root
root [0]
Attaching file results/NNtest.root as _file0…
root [1] TTree * tr = (TTree*)gDirectory->Get("TDDescVar")
root [2] tr->Show(1)
======> EVENT:1
LepAEn = 77.0432
LepBEn = 53.7879
dilmass = 99.5447
metmag = 24.1303
ChargeProduct = -1
addEt = 113.62
dPhiMetLJ = 0.913697
ntightjets = 0
jet1_Et = 0
jet2_Et = 0
weight = 0.855943
rand = 0.168345
datasource = 0

so the fifth event is ChargeProduct, with a value of -1, which seems OK to me; it is negative, but that shouldn't matter, as far as my understanding of neural nets goes. Why do you think it is the fifth entry which is faulty? In the backtrace on the previous page, it seems to have got the "branch = -1" message after the third call to TNeuron::GetValue. But then again, the value stored in dilmass seems OK too.

Cheers,
Toby

Hi Toby,

can you attach the code you use to build the ANN? Can you make the data file available somewhere, so I can check your code with your data?

I was talking about the 5th ANN input you define - that’s the one where you get the floating point exception (i==4). It’s not necessarily the 5th branch; it depends on how you define your ANN. And to me it seemed as if it already happens in the first event, i.e. entry number 0.

Locals and members can be unknown to the compiler in optimized mode because the name information is not kept, because their values are undefined at a given position in the program due to instruction re-ordering, or because a variable is simply removed (as happens e.g. for constants).

Cheers, Axel.

Hi,
I had forgotten that the tree events have 0 as their first entry; the first entry is:

[tdavies@nglas08 results]$ root -l NNtest.root
root [0]
Attaching file NNtest.root as _file0…
root [1] TTree * tr = (TTree*)gDirectory->Get("TDDescVar")
root [2] tr->Show(0)
======> EVENT:0
LepAEn = 31.2799
LepBEn = 21.8725
dilmass = 27.3753
metmag = 33.4445
ChargeProduct = -1
addEt = 80.5515
dPhiMetLJ = 1.59649
ntightjets = 1
jet1_Et = 17.0165
jet2_Et = 0
weight = 0.887469
rand = 0.0392964
datasource = 0

 the root file is at:

www-cdf.fnal.gov/~tdavies/NNtest.root

the code I use to create the Neural Net is:

[tdavies@nglas08 results]$ root -l
root [0] gSystem->Load("libMLP");
root [1] gSystem->SetFPEMask(kDefaultMask)
(Int_t)0
root [2] TFile *_file0 = TFile::Open("NNtest.root")
root [3] TTree *tr = (TTree*)gDirectory->Get("TDDescVar")
root [4] TMultiLayerPerceptron myNN("dilmass,metmag,ChargeProduct,addEt,dPhiMetLJ,ntightjets,jet1_Et,jet2_Et:10:datasource","weight+((datasource==0)*20)",tr,"Entry$%2","(Entry$+1)%2");
root [5] myNN.SetLearningMethod(TMultiLayerPerceptron::kStochastic)
root [6] myNN.Train(1,"text, graph, update=1");
Training the Neural Network

*** Break *** floating point exception
Generating stack trace…
0xb59243d1 in TSynapse::GetValue() const at mlp/src/TSynapse.cxx:72 from /home/cdfsoft/products/root/v4_00_08gGCC_3_4_3/Linux+2.4/lib/libMLP.so
0xb5923ad5 in TNeuron::GetValue() const at mlp/src/TNeuron.cxx:864 from /home/cdfsoft/products/root/v4_00_08gGCC_3_4_3/Linux+2.4/lib/libMLP.so
0xb59243d1 in TSynapse::GetValue() const at mlp/src/TSynapse.cxx:72 from /home/cdfsoft/products/root/v4_00_08gGCC_3_4_3/Linux+2.4/lib/libMLP.so
0xb5923ad5 in TNeuron::GetValue() const at mlp/src/TNeuron.cxx:864 from /home/cdfsoft/products/root/v4_00_08gGCC_3_4_3/Linux+2.4/lib/libMLP.so
0xb5923cfd in TNeuron::GetError() const at mlp/src/TNeuron.cxx:939 from /home/cdfsoft/products/root/v4_00_08gGCC_3_4_3/Linux+2.4/lib/libMLP.so
0xb5923d83 in TNeuron::GetDeDw() const at mlp/src/TNeuron.cxx:950 from /home/cdfsoft/products/root/v4_00_08gGCC_3_4_3/Linux+2.4/lib/libMLP.so
0xb5923de8 in TNeuron::GetDeDw() const at mlp/src/TNeuron.cxx:954 from /home/cdfsoft/products/root/v4_00_08gGCC_3_4_3/Linux+2.4/lib/libMLP.so

where some of the settings have been put in as suggested by delaere.

unfortunately, he said he got it to work fine :'( but I can assure you I'm not composing the backtraces by hand :wink: Do you need the code I used to make the root file as well? It's a bit involved…

Cheers,
Toby