Request: Make Draw commands ignore +/-inf and NaN for axes

jfcaron · September 13, 2013, 9:55pm

I often encounter data that contains a mix of valid values and +/-inf or nan values. For example I have a ROOT::Math::Interpolator with data points taken from a file covering a range, and if I construct a TF1 from this object, I need to carefully set the range, because the Interpolator returns NaN outside of the data range.

If the range is not set properly I get the common “Inf/NaN propagated to pad” warning and a blank canvas.

Unfortunately for me, the Interpolator provides no way to access the data that was used to construct it. I do not wish to store that range since the Interpolator is created and returned by a function. Anyways I see this as a symptom of a broader problem: the Draw commands consider Inf/NaN to be a problem when creating automatic axis ranges, even if there is otherwise valid data in the dataset.

My suggestion:

keep the current behavior if ALL of the data are Inf/NaN.
if any of the data are valid, use those for determining the automatic axes and ignore any Inf/NaN values. Print a warning message about the Inf/NaN values.
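
To make the suggested policy concrete, here is a minimal sketch in plain C++ with no ROOT dependency. The names `FiniteRange` and `finiteRange` are hypothetical and are not part of ROOT; the sketch only shows the proposed rule (use the finite values for the range, warn about the rest, and report "no range" when nothing is finite), not how ROOT computes axis ranges today.

```cpp
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical helper type, not part of ROOT.
struct FiniteRange {
    bool valid;           // false when no finite value was found
    double min;           // smallest finite value (meaningful only if valid)
    double max;           // largest finite value (meaningful only if valid)
    std::size_t skipped;  // number of Inf/NaN entries that were ignored
};

// Computes the range over the finite values only. If every value is
// Inf/NaN (or the input is empty), valid stays false so the caller can
// keep the current behavior.
FiniteRange finiteRange(const std::vector<double>& values) {
    FiniteRange r = {false, 0.0, 0.0, 0};
    for (double v : values) {
        if (!std::isfinite(v)) {  // skips +inf, -inf and NaN
            ++r.skipped;
            continue;
        }
        if (!r.valid) {
            r.min = r.max = v;
            r.valid = true;
        } else {
            if (v < r.min) r.min = v;
            if (v > r.max) r.max = v;
        }
    }
    if (r.valid && r.skipped > 0) {
        std::cerr << "Warning: " << r.skipped
                  << " Inf/NaN value(s) ignored when computing the axis range\n";
    }
    return r;
}
```

A caller would check `valid` first and fall back to whatever it does today when it is false.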

This is similar to how gnuplot handles infinities, and it is very handy when you really want to visualize data and don’t wish to clean it up “manually”. Quite clearly Inf/NaN values should not be plotted, so this is quite intuitive, with the error message making it clear that it isn’t the whole picture.

An alternative:

provide alternate GetMaximum/GetMinimum/GetMaximumX… methods that return the relevant value other than Inf/NaN. Something like “GetMaximumFinite()” or “GetMaximumXFinite()” or something.
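
As a rough illustration of what such a method could do for a function object, the sketch below samples a callable on an equally spaced grid and keeps the largest finite result. `maximumFinite` is a hypothetical free function standing in for the proposed "GetMaximumFinite()"; it is written in plain C++ and makes no claim about how TF1 finds its maximum.

```cpp
#include <cmath>
#include <functional>
#include <limits>

// Hypothetical stand-in for the proposed "GetMaximumFinite()": evaluates f at
// npoints equally spaced x values in [xmin, xmax] and returns the largest
// finite result, or NaN if no sample is finite.
double maximumFinite(const std::function<double(double)>& f,
                     double xmin, double xmax, int npoints = 100) {
    double best = std::numeric_limits<double>::quiet_NaN();
    if (npoints < 2) return best;
    const double step = (xmax - xmin) / (npoints - 1);
    for (int i = 0; i < npoints; ++i) {
        const double y = f(xmin + i * step);
        if (!std::isfinite(y)) continue;  // ignore Inf/NaN samples
        if (std::isnan(best) || y > best) best = y;
    }
    return best;
}
```

Returning NaN when nothing is finite keeps the "no value" meaning without having to pick an arbitrary finite placeholder.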

Note that because I am using a ROOT::Math::Interpolator object, I cannot just wrap it in a TF1 that checks TMath::IsNaN(), as that would require creating and compiling an entirely new functor object. Even if I did that, it would require me to choose a non-NaN and non-Inf value to return, which kind of defeats the whole point of NaN and Inf in the first place.

I explored Inf/NaN and had other suggestions a few months ago here, but it didn’t draw much attention:

Jean-François

pamputt · September 17, 2013, 8:28am

I support the proposal. I also think it would be a great improvement.

pamputt · January 17, 2014, 1:09pm

A bug report has been opened to support this wish and keep track of it.

couet · January 17, 2014, 1:20pm

I am not sure whether that should be treated at the graphics level or before.

It seems your data are in a TTree, right?

For instance, is the following case one of those you have in mind?

Double_t X[N], Y[N];
TGraph g(N, X, Y);

X and Y contain NaN and Inf…
but also some valid numbers…
Do you want TGraph to produce a plot with the valid numbers?

pamputt · January 17, 2014, 1:31pm

I cannot speak for Jean-François, but I would also like it to work with TGraph, TH1 or whatever.

couet · January 17, 2014, 1:36pm

We should define what you mean by that.
Do you want to silently ignore all the invalid data and make a new valid data set from the remaining values?
Should the user be warned that his data is buggy?

pamputt · January 17, 2014, 1:44pm

Indeed, there are two choices:
Either we add an option to the Draw method to explicitly ask it to draw while ignoring the Inf/NaN values,
or we do it automatically and display a warning message to alert the user that his data are buggy.
I prefer the first solution because it requires an action from the user (he knows what he is doing). If you choose the second one, I think a warning message must be displayed, at the very least, to inform the user. IMHO, it should not be totally silent for the user.

couet · January 17, 2014, 1:57pm

The problem with such an approach is that we will penalize the 99% of users who have good data sets.
Graphics comes at the end of the data visualization process, and trying to trap such bad data
will be a killer in many cases. Just take the case of a histogram with several thousand bins: we would need to test
all the X and Y values to see if there is a NaN or an Inf. The performance penalty would be huge for something
that will do nothing most of the time, because there will be no buggy values in X and Y.

It seems to me it is up to the application program to make sure the data are OK before sending them to graphics.

pamputt · January 17, 2014, 2:12pm

Right, it must not penalise the “correct” data. That would hold with the first solution (an option specified explicitly in the Draw method).
For example, if I type

f->Draw()

the current behaviour is used (“Inf/NaN propagated to pad” warning and a blank canvas)
but if I type

f->Draw("wo_Inf_NaN")

then ROOT checks before plotting whether there are Inf/NaN values and removes them if so. It would be slower, but that would be the user’s choice.
Sometimes it would be faster to do something like

TGraph *g = new TGraph("foo.dat"); g->Draw("wo_Inf_NaN");
where foo.dat contains Inf/NaN values, than to write a piece of code to generate a correct set of data. The same is true with a TF1 in the example given by Jean-François.
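
A small sketch of the opt-in idea, in plain C++ without ROOT. The option string "wo_Inf_NaN" is the keyword proposed in this thread and does not exist in ROOT; `Points` and `pointsToDraw` are hypothetical names. The point it illustrates is that the per-point test is only paid for when the user asks for it.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for the points of a graph.
struct Points {
    std::vector<double> x;
    std::vector<double> y;
};

// Without the (hypothetical) "wo_Inf_NaN" keyword in the option string the
// points are passed through with no per-point test. With it, only the points
// whose x and y are both finite are kept.
Points pointsToDraw(const Points& in, const std::string& option) {
    if (option.find("wo_Inf_NaN") == std::string::npos) return in;
    Points out;
    const std::size_t n = in.x.size() < in.y.size() ? in.x.size() : in.y.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (std::isfinite(in.x[i]) && std::isfinite(in.y[i])) {
            out.x.push_back(in.x[i]);
            out.y.push_back(in.y[i]);
        }
    }
    return out;
}
```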

Do you mean that it could be checked by ROOT before drawing (for example when the TF1/TGraph/TH1 is defined/filled), or at another level?

couet · January 17, 2014, 2:44pm

Seems to me the user is already warned clearly:

root [0] TH1D h("h","h",100,0,100)
root [1] h.Fill(50,TMath::Infinity());
root [2] h.Draw()
Info in <TCanvas::MakeDefCanvas>:  created default TCanvas with name c1
Warning in <TCanvas::ResizePad>: Inf/NaN propagated to the pad. Check drawn objects.
Warning in <TCanvas::ResizePad>: c1 height changed from 0 to 10

root [3] Warning in <TCanvas::ResizePad>: Inf/NaN propagated to the pad. Check drawn objects.
Warning in <TCanvas::ResizePad>: c1 height changed from 0 to 10

Warning in <TCanvas::ResizePad>: Inf/NaN propagated to the pad. Check drawn objects.
Warning in <TCanvas::ResizePad>: c1 height changed from 0 to 10

pamputt · January 17, 2014, 3:03pm

Indeed, the user is warned, but I wish the Draw method could ignore the Inf/NaN values and display the valid values instead of a blank canvas.
One way to display the “good” data while ignoring the Inf/NaN values would be to give a keyword in the option parameter of the Draw method.

couet · January 17, 2014, 3:09pm

It seems to me that this issue was already discussed in the past, and it was decided that if a histogram or a graph has bad data, it is up to the user to check it before building the histogram or the graph.
That’s the sense of the message we display:

Check drawn objects

ROOT provides all the tools to test Inf and NaN in the math library.

pamputt · January 17, 2014, 3:37pm

You are right that the user should check all the data he uses (it is safer). However, sometimes it could be convenient to remove the “wrong” data automatically in order to plot the rest and get a quick preview of the data.
As I said before, this is for example the case when one wants to look at the content of a text file which contains Inf/NaN values by doing

TGraph *g = new TGraph("foo.dat"); g->Draw("wo_Inf_NaN");
And also with a TF1 which uses a ROOT::Math::Interpolator object as in the example by Jean-François.

Anyway, if you think it is not a good idea, I understand

couet · January 17, 2014, 3:53pm

Seems to me the right way to proceed is:

TGraph *g = new TGraph("foo.dat");
g->Draw();

you get the message mentioned before
you modify the file foo.dat (remove the faulty lines).

At least that’s how I proceed. When I have data I feel responsible for their correctness.
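
For readers who would rather script the "remove the faulty lines" step than edit the file by hand, here is a minimal sketch in plain C++. It assumes the data is text with whitespace-separated numeric columns, which is an assumption about foo.dat and not something stated above. `lineIsFinite` and `dropFaultyLines` are hypothetical helper names.

```cpp
#include <cmath>
#include <cstdlib>
#include <sstream>
#include <string>

// Returns true if the line has at least one token and every
// whitespace-separated token parses completely as a finite number.
// std::strtod is used because it recognizes "inf" and "nan" spellings.
bool lineIsFinite(const std::string& line) {
    std::istringstream tokens(line);
    std::string tok;
    bool any = false;
    while (tokens >> tok) {
        char* end = nullptr;
        const double v = std::strtod(tok.c_str(), &end);
        if (end == tok.c_str() || *end != '\0' || !std::isfinite(v)) return false;
        any = true;
    }
    return any;
}

// Copies the text line by line, dropping every line that fails lineIsFinite.
// Note that blank lines and non-numeric lines are dropped as well.
std::string dropFaultyLines(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    std::string line;
    while (std::getline(in, line)) {
        if (lineIsFinite(line)) out << line << '\n';
    }
    return out.str();
}
```

The cleaned text could then be written to a new file and read as usual.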

jfcaron · January 17, 2014, 6:42pm

I’m not proposing that the TGraphPainter class actually do any data filtering. I am proposing that in the calculation of the automatic axis ranges (which DOES have to check every value anyways), that infs and NaNs be ignored. An error message saying something like “Some data points lie outside the automatic axis range.” might be useful.

The +/-inf and NaN values in the floating-point standard are there for a reason and they are sometimes reasonable values. I think it is worse for ROOT developers to decide for users that they should never get NaNs or infs than for automatic axis ranges to sometimes be drawn without all the points visible. Inf and NaN in data doesn’t mean it is “faulty” or “bad data”. For example they are often useful “no value” values, better than -99999 or whatever I often see in ROOTish code.
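
To illustrate the "no value" point, here is a small plain C++ sketch (`kNoValue` and `meanOfPresent` are hypothetical names). With NaN as the marker, a calculation that forgets to skip missing entries yields NaN, which is easy to notice, whereas a magic number such as -99999 would quietly shift the result.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical "no value" marker based on NaN instead of a magic number.
const double kNoValue = std::numeric_limits<double>::quiet_NaN();

// Mean over the entries that hold a value; returns kNoValue if there are none.
double meanOfPresent(const std::vector<double>& values) {
    double sum = 0.0;
    int n = 0;
    for (double v : values) {
        if (std::isnan(v)) continue;  // skip the "no value" entries
        sum += v;
        ++n;
    }
    return n > 0 ? sum / n : kNoValue;
}
```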

The current behavior with Twhatever::Draw is a waste when there are inf/NaN values. It doesn’t draw anything useful and then prints an error message. Why couldn’t it instead draw something useful and still print an error message?

As an example of perhaps going a bit too far: gnuplot silently ignores NaNs and infs whenever drawing, so a trick there to filter data is to use an operation that divides by zero sometimes. gnuplot does this entirely silently, and you have to be very aware of it when using gnuplot. My proposal would be for some intermediate thing, where ROOT would still draw a useful thing, but inform the user with a warning.

If the solution is to add a new draw option that ignores inf and NaN values as pamputt proposes, I’d be satisfied.

Jean-François

tc3t · January 17, 2014, 9:41pm

Just to add another point of view here, when evaluating the sufficiency of the warning messages printed to console, it’s worth noting that ROOT can also be used as a library and in such context (e.g. GUI application) there might be no console to see error messages from. In these cases figuring out the reason for (silently) blank canvas can take a bit more effort. Having the error message or some other form of indication right there on the canvas could be a nice feature (it’s another question how feasible it is to implement).

couet · January 20, 2014, 8:31am

Hello,

Right now the Inf and NaN are trapped. An error message is displayed. But I agree with you that nothing is drawn.

You suggest:

So you want to draw only the empty frame? That will not be much more than drawing nothing.
And if you want ROOT to really draw something, then the X and Y filtering is required.

jfcaron · January 20, 2014, 5:22pm

Why would the frame be empty? I’m saying that it would be useful if drawing a TGraph or other thing with some data that includes infs and NaNs still drew something useful. True, if ALL of the data are inf/NaN, then the frame would be empty.

I guess I should get specific to illustrate:

TCanvas c1;
Double_t x[] = {0,1,2,3,4,5,6,7,8,9};
Double_t y[10];
for(Int_t i = 0;i<10;i++){y[i] = 5.0/x[i];}
TGraph g(10,x,y);
g.Draw("ALP");

I want this code to draw the TGraph with axis ranges that allow me to see all of the non-inf and non-NaN values. The code should also generate some kind of warning message.

Somewhere in the automatic axis size calculation code there MUST be a loop over all the values that finds the maximum and minimum x and y values, right? I’m suggesting that in that loop, a simple if statement be added that skips the inf/NaN values and allows for a finite axis size to be calculated.
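
The "simple if statement" could look roughly like this. This is a sketch in plain C++ of a generic min/max scan, not the actual ROOT axis code, and `finiteMinMax` is a hypothetical name.

```cpp
#include <cmath>

// Min/max scan with the extra "if": non-finite entries are skipped.
// Returns the number of skipped entries. lo and hi are left untouched
// when no entry is finite.
int finiteMinMax(int n, const double* a, double& lo, double& hi) {
    int skipped = 0;
    bool first = true;
    for (int i = 0; i < n; ++i) {
        if (!std::isfinite(a[i])) {  // the added test: ignore Inf/NaN
            ++skipped;
            continue;
        }
        if (first) {
            lo = hi = a[i];
            first = false;
        } else {
            if (a[i] < lo) lo = a[i];
            if (a[i] > hi) hi = a[i];
        }
    }
    return skipped;
}
```

With the arrays from the example above, y[0] = 5.0/0 is +inf and is the only skipped entry, so the scan over y would give a range from 5.0/9 to 5.0, and a non-zero return value is the cue to print a warning.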

Jean-François

couet · January 21, 2014, 8:55am

OK, so we need data filtering inside the TGraphPainter class …
fine…

Note that the message is already displayed:

root [5] g.Draw("ALP");
Warning in <TCanvas::ResizePad>: Inf/NaN propagated to the pad. Check drawn objects.
Warning in <TCanvas::ResizePad>: c1 height changed from 0 to 10