Hello,
I just started linking my project against 5.34.07 binaries (previously 5.34.01) on win32 w/ VS2010. I notice that the behavior of TAxis has changed:
First, if I am viewing a 1D histogram and I zoom on the x axis beyond the limits of this histogram, the underflow and/or overflow bins are displayed. This presents a misleading picture of the data, especially if the under/overflow bins have many entries.
Furthermore, if I call TH1::GetXaxis()->SetRangeUser() with limits outside of the TH1 x-limits (or unzoom as discussed above) and then call TH1::GetMean() or GetRMS(), the computed values evidently treat the under/overflow bins as actual histogram bins located adjacent to the min and max bins. This causes all sorts of problems… I have not checked what happens w/ other statistics or with 2D histos - have these changed as well?
Is there a way to disable this new feature? If not, is there a way that I can somehow clip the axis limits to the actual min/max values when the user unzooms via interacting with Canvas axes?
TAxis::SetRange has been fixed in one of the latest 5.34 patches. If you call axis->SetRange(0, NBINS+1), the underflow and overflow will be included. So you should not use values outside 1..NBINS if you don’t want to include underflow/overflow.
To just unzoom (i.e. reset the range), call axis->SetRange() or axis->SetRange(1,0) (i.e. with min > max) or call TAxis::Unzoom().
I think it is correct that the statistics will include underflow/overflow if you set a range outside the axis limits.
If something does not work correctly, please let me know, possibly showing it with some code reproducing the problem.
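For readers who want to keep a range inside the regular bins, here is a minimal sketch of the clipping idea. It assumes the convention described above: bins 1..NBINS are in range, bin 0 is the underflow and bin NBINS+1 is the overflow. The helper name ClampToInRangeBins is hypothetical and not part of ROOT.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Hypothetical helper (not part of ROOT): clamp a requested bin range to the
// in-range bins 1..nbins, so that the underflow bin (0) and the overflow bin
// (nbins + 1) are never selected.
std::pair<int, int> ClampToInRangeBins(int first, int last, int nbins)
{
    assert(nbins >= 1);
    first = std::min(std::max(first, 1), nbins);
    last  = std::min(std::max(last, 1), nbins);
    return {first, last};
}
```

With a real axis, one could presumably feed it axis->GetFirst(), axis->GetLast() and axis->GetNbins(), and pass the result back to axis->SetRange(). Whether this can be hooked into interactive zooming on the canvas is not covered here.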
The problem is that under/overflow bins are treated as if they were adjacent to the lowest/highest valid bin, and that they have nonsensical widths. With that, values for e.g. the RMS or the mean cannot make sense anymore. They would be useful if the underflow bin had a low edge at something like -DBL_MAX and the overflow bin an upper edge at DBL_MAX, but that is not the case. The nonsensical width of overflow bins also forces users to treat them manually when e.g. taking out bin widths by dividing by their real width (which makes sense for all other bins).
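To make the effect concrete, here is a toy model of a mean computed from bin centres. It is a sketch and not ROOT's implementation: it assumes equal-width bins and gives the under/overflow bins a centre as if they were regular bins sitting just outside the axis limits. The function name MeanFromBinCentres is an assumption made for illustration.

```cpp
#include <cassert>
#include <vector>

// Toy model, not ROOT's implementation. contents[0] is the underflow bin,
// contents[1..nbins] are the in-range bins, contents[nbins + 1] is the overflow.
// Every bin, including under/overflow, gets a centre as if it were a regular
// bin of the same width adjacent to the axis limits.
double MeanFromBinCentres(const std::vector<double>& contents,
                          double xmin, double xmax, int first, int last)
{
    const int nbins = static_cast<int>(contents.size()) - 2;
    assert(nbins >= 1 && first >= 0 && last <= nbins + 1);
    const double width = (xmax - xmin) / nbins;
    double sumw = 0.0, sumwx = 0.0;
    for (int bin = first; bin <= last; ++bin) {
        const double centre = xmin + (bin - 0.5) * width;  // bin 0 sits at xmin - width/2
        sumw  += contents[bin];
        sumwx += contents[bin] * centre;
    }
    return sumw > 0.0 ? sumwx / sumw : 0.0;
}
```

For example, with 4 bins on [0, 4], 10 entries in each regular bin and 100 entries in the overflow, the mean over bins 1..4 is 2.0, while the mean over bins 0..5 is about 3.79, because all overflow entries are placed at x = 4.5 regardless of their true values.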
I know about this. A possible solution would be to store the average value of underflow and overflow, and to have -inf and +inf as lower/upper edges, but I am not sure if this is really worth it. Having this would require quite some changes.
A similar problem is present when the statistics are reset and one has set the flag TH1::StatOverflows(true).
In this case, too, the mean and RMS will be computed using underflow/overflow assuming a wrong x position.
I think I will change the code to never use underflow/overflow in the statistics if the StatOverflow flag is false (as it is in the default case). If the user sets the flag to true, they should expect that including underflow/overflow in the range could give a different result.
I am not sure it is always obvious to the user what goes on here.
TH2D h("h","", 20,0,5, 20,0,5);
for (int i=0; i<1000; ++i)
h.Fill(gRandom->Gaus(), gRandom->Gaus());
std::cout << h.ProjectionY()->GetMean() << "\n";
// e.g. 0.01760513443864514
h2.GetXaxis()->SetRange(0, 21);
std::cout << h.ProjectionY()->GetMean() << "\n";
// e.g. 0.2925
Either the defaults are broken or this is still too confusing. These two
histograms look identical and have the same number of entries.
My feeling would be that under/overflow bins should have zero contribution to
a mean or rms and should be dropped in fits. After all they usually have a
maximal uncertainty on their x value which should kill them in anything
resembling a weighted average. Requiring the user to remember that histograms are
implemented using an array where under/overflow bins are treated like normal
bins isn’t really optimal and she shouldn’t have to know about the detailed underlying
implementation to get correct results.
I am not sure what your example wants to prove. If I run your code I get identical histograms and get equal mean.
I agree with this, and currently they are not used at all in fitting. Probably I should forget about the StatOverflow flag when computing statistics from bin centres, and always exclude underflow/overflow from the statistics. The flag would then be used only when the statistics are computed at filling time. That makes more sense to me.
[quote="moneta"]I am not sure what your example wants to prove. If I run your code I get identical histograms and get equal mean.
[/quote]
You are right, this works, I accidentally compared different histograms.
Is there any update on this? It looks like this behavior is still in 5.34.21. I.e., if a user drags the x axis zoom limits on a drawn TH1 and they happen to exceed the last bin, then the mean statistic suddenly jumps to the overflow-included value. This happens even with the StatOverflow flag having the default kFALSE value.
I’d like to prevent this.
Hi,
in 5.34.21 I still see this behaviour, which is not preferable! Is there a way to tell TH1 not to consider the overflow/underflow bins in the RMS calculation? In filling the histogram, I do not set any TH1::StatOverflows, so it would be the default value (kFALSE), but :
By default, the histogram statistics (Mean and Standard Deviation) are computed at filling time, and include only the entries in the histogram range if TH1::StatOverflows() is false.
Now if you set an axis range on the histogram axis, as in your example, the statistics are computed using the bin center values (it is not possible to do otherwise, since the information on the original entries is lost).
For this reason you will get different values of Mean and StdDev.
See also the reference documentation of TH1::GetStats root.cern.ch/doc/master/classTH … 69f1e21992
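The difference between the two ways of computing the mean can be sketched with a toy histogram. This is a simplified model under stated assumptions, not ROOT's TH1: bins have equal width, only in-range entries are accumulated at fill time (as with StatOverflows false), and the names ToyHist, MeanAtFill and MeanFromCentres are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy 1D histogram with equal-width bins; a simplified model, not ROOT's TH1.
struct ToyHist {
    int nbins;
    double xmin, xmax;
    std::vector<double> contents;  // [0] underflow, [1..nbins] in range, [nbins+1] overflow
    double sumw = 0.0;             // number of in-range entries, accumulated at fill time
    double sumwx = 0.0;            // sum of the exact x of in-range entries

    ToyHist(int n, double lo, double hi)
        : nbins(n), xmin(lo), xmax(hi), contents(n + 2, 0.0)
    {
        assert(n >= 1 && hi > lo);
    }

    void Fill(double x)
    {
        if (x < xmin)  { contents[0] += 1.0; return; }          // underflow
        if (x >= xmax) { contents[nbins + 1] += 1.0; return; }  // overflow
        const double width = (xmax - xmin) / nbins;
        const int bin = std::min(nbins, 1 + static_cast<int>((x - xmin) / width));
        contents[bin] += 1.0;
        sumw  += 1.0;
        sumwx += x;
    }

    // Mean from the exact x values seen while filling (in-range entries only).
    double MeanAtFill() const { return sumw > 0.0 ? sumwx / sumw : 0.0; }

    // Mean recomputed from bin contents and bin centres over bins first..last.
    double MeanFromCentres(int first, int last) const
    {
        const double width = (xmax - xmin) / nbins;
        double w = 0.0, wx = 0.0;
        for (int bin = first; bin <= last; ++bin) {
            w  += contents[bin];
            wx += contents[bin] * (xmin + (bin - 0.5) * width);
        }
        return w > 0.0 ? wx / w : 0.0;
    }
};
```

Filling a 2-bin histogram on [0, 2) with 0.1, 0.2, 1.9 and 5.0 gives a fill-time mean of about 0.733, a bin-centre mean over bins 1..2 of about 0.833, and 1.25 once the overflow bin is included in the range.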
Thanks Lorenzo, indeed yesterday I checked more deeply in the code and found that somewhere at the beginning we have TH1::StatOverflows(kTRUE);, so all our histograms have this flag set. So the difference I’ve shown is due to this.
Cheers
francesca