Inf/NaN in Histograms and TTrees

jfcaron · July 25, 2013, 5:58pm

I frequently run into a problem when using TTrees as the output of analysis code: sometimes a branch has no meaningful value for a given entry. Typically I fill it with a "nonsense value" like -999 for integers, but I have to be careful that the value I choose really is nonsense for the quantity in question! It would be useful to have a standard non-value to put into TTrees for this purpose.

I considered using the existing floating-point inf/nan values for this, but ROOT seems to behave strangely when these are put into TTrees or TH*s. Observe:

* ROOT v5.34/09 *
root [0] TH1D h("h","h",10,0,10)
root [1] Double_t f = 0.0/0.0;
root [2] f
(Double_t)nan
root [3] h.Fill(f) // This unexpectedly goes into the underflow bin
(Int_t)(-1)
root [4] h.Draw()
root [5] f = 1.0/0.0
(const double)inf
root [6] h.Fill(f) // This wrongly goes into the underflow bin
(Int_t)(-1)
root [7] f = -1.0/0.0
(const double)(-inf)
root [8] h.Fill(f) // This goes into the underflow bin, as expected
(Int_t)(-1)

In all these cases, the TH1 treats nan and both kinds of inf as an underflow. It would make much more sense if positive infinity were counted as an OVERflow, and if nan were counted in neither. Treating nan as neither might cause some problems: for example, the sum of the contents of all bins would no longer equal the sum of the weights used when filling. But that is the only behavior that makes sense.
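As a sketch of the rule proposed above, here is a standalone bin lookup for a fixed-width axis. `ProposedFindBin` is a hypothetical name, not ROOT's `TAxis::FindBin`; it only borrows ROOT's bin numbering (0 = underflow, 1..nbins = regular bins, nbins+1 = overflow) and returns -1 to mean "fill nothing" for a NaN.

```cpp
#include <cmath>

// Hypothetical helper (not part of ROOT): bin index for a fixed-width axis
// with nbins bins on [xmin, xmax), using ROOT's numbering convention
// (0 = underflow, 1..nbins = regular bins, nbins+1 = overflow).
// Returns -1 for NaN, meaning "fill no bin at all".
int ProposedFindBin(double x, int nbins, double xmin, double xmax)
{
   if (std::isnan(x)) return -1;        // not a number: belongs in no bin
   if (x < xmin) return 0;              // includes -inf: underflow
   if (x >= xmax) return nbins + 1;     // includes +inf: overflow
   int bin = 1 + static_cast<int>(nbins * (x - xmin) / (xmax - xmin));
   if (bin > nbins) bin = nbins;        // guard against rounding just below xmax
   return bin;
}
```

For `TH1D h("h","h",10,0,10)` this rule would send -inf to bin 0, +inf to bin 11, and skip nan entirely. Checking `std::isnan` before any ordered comparison is what keeps NaN from falling through into either flow bin.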

I also looked at what happens when these kinds of values are put into a TTree:

root [9] TTree t("t","t")
root [10] t.Branch("f",&f,"f/D")
(class TBranch*)0x7fd74b65fb90
root [11] f
(Double_t)(-inf)
root [12] t.Fill()
(Int_t)8
root [13] f = 1.0/0.0
(const double)inf
root [14] t.Fill()
(Int_t)8
root [15] f = 0.0/0.0
(const double)nan
root [16] t.Fill()
(Int_t)8
root [17] t.Scan("f")
************************
*    Row   *         f *
************************
*        0 *      -inf *
*        1 *       inf *
*        2 *       nan *
************************
(Long64_t)3

So at least the TTree stores the values correctly, but when I then do t.Draw("f"), the resulting histogram has 3 entries: 1 underflow and 2 overflows! Trying each fill one at a time, I found that -inf was the underflow, and +inf and nan were the overflows.

So there are several problems: 1) filling a histogram directly and drawing from a TTree send +inf and nan to different bins (underflow when filling directly, overflow when drawing from the TTree); 2) nan is treated as an inf-like value rather than as a non-number.

The only behavior that makes sense to me is that filling a histogram with a +/-inf value should go to the corresponding over/underflow bin (as in the TTree drawing case), but that filling with a nan value should not increase the count in ANY of the bins. It is not clear whether filling with a nan should increase the total number of entries reported by GetEntries() and the stats box. Probably it should not increase that number at all; or maybe the histogram could keep track of the number of nan values internally?
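Until histograms handle NaN this way, one possible user-side workaround is to filter NaN before calling `Fill` and count it separately, which mirrors the "keep track of the number of nan values" option above. `FillSkippingNaN` is a hypothetical helper, not part of ROOT; it assumes only that the histogram type has a `Fill(double)` member, as `TH1D` does. Because `Fill` is never called for a NaN, the histogram's entry count is not incremented for it either. It does nothing about which bin +/-inf end up in.

```cpp
#include <cmath>

// Hypothetical user-side workaround (not a ROOT API): skip NaN values before
// they reach the histogram, and count them separately so they are not lost.
// H can be any class with a Fill(double) member, e.g. ROOT's TH1D.
template <typename H>
bool FillSkippingNaN(H& hist, double x, long& nanCount)
{
   if (std::isnan(x)) {
      ++nanCount;      // record the non-number instead of binning it
      return false;    // nothing was filled
   }
   hist.Fill(x);       // +/-inf are passed through unchanged
   return true;
}
```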

A more ROOT-ish way to do this would be to define a special ROOT constant that histograms and trees understand as "missing data". My suggestion above simply uses the existing floating-point NaN for this purpose, but no such NaN exists for integers. A special ROOT constant could be implemented for all numerical types.
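One way such a constant could look for all numeric types is sketched below using plain C++11 `std::numeric_limits`; `MissingValue` and `IsMissing` are hypothetical names, not existing ROOT constants or functions. Floating-point types use a quiet NaN, while integer types (which have no NaN) fall back to an extreme value, so for integers this only moves the "-999 problem" to a less likely value rather than eliminating it.

```cpp
#include <cmath>
#include <limits>
#include <type_traits>

// Hypothetical "missing data" marker (not an existing ROOT constant).
// Floating-point types use a quiet NaN; integer types, which have no NaN,
// fall back to an extreme value: lowest() if signed, max() if unsigned.
template <typename T>
constexpr T MissingValue()
{
   static_assert(std::is_arithmetic<T>::value, "numeric types only");
   return std::numeric_limits<T>::has_quiet_NaN ? std::numeric_limits<T>::quiet_NaN()
        : std::is_signed<T>::value               ? std::numeric_limits<T>::lowest()
                                                 : std::numeric_limits<T>::max();
}

// Checks for the marker. NaN compares unequal to everything, itself included,
// so floating-point values must be tested with std::isnan rather than ==.
template <typename T>
bool IsMissing(T x)
{
   if (std::numeric_limits<T>::has_quiet_NaN) return std::isnan(x);
   return x == MissingValue<T>();
}
```

Since the TTree session above shows that nan is stored and read back intact, the floating-point half of this would already work with existing trees; only the histogramming side would need to treat the marker as "fill nothing".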

Jean-François

jfcaron · August 6, 2013, 11:02pm

Has anyone tried reproducing the inconsistent treatment of inf/NaN values that I described above?