TH1::GetMean() after SetRange

hreisin · July 14, 2011, 11:36pm

Hi,

After calling SetRange (or SetRangeUser), the histogram's stats are recomputed from the bin contents within the new axis limits. In this case, GetMean() and GetMeanError() return values that differ from the arithmetic results calculated from the original data. This seems to be a consequence of how stats[2] and stats[3], which hold Sum(w*x) and Sum(w*x*x), are calculated.
For example, a histogram with 5 bins from 0 to 5 filled with:

Fill(0.1, w1); Fill(0.2, w2);

has after SetRange(1,2):

stats[2] = (w1+w2)*0.5;     // == BinContent * BinCenter
stats[3] = (w1+w2)*0.5*0.5; // == BinContent * BinCenter * BinCenter

To get the arithmetic mean, those elements should instead hold:

stats[2] = w1*0.1 + w2*0.2;
stats[3] = w1*0.1*0.1 + w2*0.2*0.2;
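To make the difference concrete, here is a minimal standalone sketch (plain C++, no ROOT) that accumulates both sets of sums for the same entries. The names Entry, Stats, exactStats and binCenterStats are hypothetical helpers invented for this illustration, and equal-width bins are assumed; this is not how TH1 is implemented internally.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One weighted entry, as it would be passed to Fill(x, w).
struct Entry {
    double x;
    double w;
};

// The four sums discussed above: Sum(w), Sum(w^2), Sum(w*x), Sum(w*x*x).
struct Stats {
    double sumw = 0.0, sumw2 = 0.0, sumwx = 0.0, sumwx2 = 0.0;

    double mean() const {
        assert(sumw != 0.0);  // mean is undefined for an empty sample
        return sumwx / sumw;
    }
};

// Exact sums, accumulated from the original x values.
Stats exactStats(const std::vector<Entry>& entries) {
    Stats s;
    for (const Entry& e : entries) {
        s.sumw   += e.w;
        s.sumw2  += e.w * e.w;
        s.sumwx  += e.w * e.x;
        s.sumwx2 += e.w * e.x * e.x;
    }
    return s;
}

// Sums as they can be rebuilt from bin contents alone: each x is replaced
// by the center of its bin. Assumes equal-width bins starting at xlow.
Stats binCenterStats(const std::vector<Entry>& entries, double xlow, double width) {
    Stats s;
    for (const Entry& e : entries) {
        const double center =
            xlow + (std::floor((e.x - xlow) / width) + 0.5) * width;
        s.sumw   += e.w;
        s.sumw2  += e.w * e.w;
        s.sumwx  += e.w * center;
        s.sumwx2 += e.w * center * center;
    }
    return s;
}
```

With the entries (x=0.1, w=2) and (x=0.2, w=4) and unit-width bins starting at 0, exactStats gives a mean of 1/6 (about 0.167), while binCenterStats gives 0.5, the center of the first bin.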

There are several ways to work around this problem while still using the histogram methods, such as filling additional histograms with the needed values. Still, it would be great if the TH1 class had arrays fSumwx and fSumwx2 holding this information. Could these arrays be included in future versions?
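As a rough illustration of what such per-bin arrays would provide, here is a standalone sketch in plain C++. The class MomentHist and its members are hypothetical and are not part of ROOT; the sketch assumes equal-width bins with nbins > 0 and xup > xlow, and it simply drops under- and overflow entries.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical 1D histogram that keeps Sum(w*x) and Sum(w*x*x) per bin,
// in addition to the usual Sum(w). Not a ROOT class.
class MomentHist {
public:
    MomentHist(int nbins, double xlow, double xup)
        : nbins_(nbins), xlow_(xlow), width_((xup - xlow) / nbins),
          sumw_(nbins, 0.0), sumwx_(nbins, 0.0), sumwx2_(nbins, 0.0) {}

    void Fill(double x, double w = 1.0) {
        const double pos = std::floor((x - xlow_) / width_);
        if (pos < 0.0 || pos >= nbins_) return;  // under/overflow ignored here
        const int bin = static_cast<int>(pos);
        sumw_[bin]   += w;
        sumwx_[bin]  += w * x;
        sumwx2_[bin] += w * x * x;
    }

    // Exact weighted mean over bins first..last (1-based, inclusive),
    // using the same bin numbering convention as SetRange(first, last).
    double Mean(int first, int last) const {
        assert(1 <= first && first <= last && last <= nbins_);
        double sw = 0.0, swx = 0.0;
        for (int b = first - 1; b < last; ++b) {
            sw  += sumw_[b];
            swx += sumwx_[b];
        }
        return sw != 0.0 ? swx / sw : 0.0;
    }

    // Exact weighted standard deviation over the same bin range.
    double StdDev(int first, int last) const {
        assert(1 <= first && first <= last && last <= nbins_);
        double sw = 0.0, swx = 0.0, swx2 = 0.0;
        for (int b = first - 1; b < last; ++b) {
            sw   += sumw_[b];
            swx  += sumwx_[b];
            swx2 += sumwx2_[b];
        }
        if (sw == 0.0) return 0.0;
        const double mean = swx / sw;
        const double var = swx2 / sw - mean * mean;
        return var > 0.0 ? std::sqrt(var) : 0.0;  // guard against rounding
    }

private:
    int nbins_;
    double xlow_;
    double width_;
    std::vector<double> sumw_, sumwx_, sumwx2_;
};
```

With 10 bins from 0 to 10 and the fills (0.1, 2.) and (0.2, 4.), Mean(1, 10) and Mean(1, 5) both return 1/6, so restricting the range no longer changes the result as long as the selected bins hold the same entries. The price is two extra doubles per bin.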

Below is a minimal example showing the awkward situation where changing the range results in a different mean and mean error, although the histogram has the same contents:

{
    TH1D h ("h", "",10,0,10);
    h.Sumw2();

    h.Fill(0.1, 2.);
    h.Fill(0.2, 4.);

    cout << "Original" << endl;
    cout << h.GetMean() << '\t' << h.GetMeanError() << endl;
    double stats1[4];
    h.GetStats(stats1);
    cout << stats1[0] << '\t'
         << stats1[1] << '\t' 
         << stats1[2] << '\t' 
         << stats1[3] << endl;

    cout << "After SetRange" << endl;
    h.GetXaxis()->SetRange(1,5);

    cout << h.GetMean() << '\t' << h.GetMeanError() << endl;
    double stats2[4];
    h.GetStats(stats2);
    cout << stats2[0] << '\t'
         << stats2[1] << '\t' 
         << stats2[2] << '\t' 
         << stats2[3] << endl;
}

Thanks,

Hernan.

moneta · August 11, 2011, 8:19am

Hi,

These extra arrays could in principle be added to the histogram, but they would cost extra memory, since this information has to be stored for each bin. This could be a problem for applications with many histograms, in particular multi-dimensional ones.
The TProfile class contains this extra information, but for the y variable. In principle you could use a TProfile filled with y=x and look at GetBinEntries instead of GetBinContent.
Otherwise, in the case of low statistics, you could use the histogram buffer, where all the original entries are kept.

Best Regards

Lorenzo