Display of x-positions and error bars of unbinned data

canigia · July 26, 2021, 9:08am

Hi,

this is perhaps a somewhat academic question, but why is it that, in the binned representation of unbinned data (RooDataSet), the x-positions of the markers are “bin-centered” (= equidistant) and the error bars for the x-coordinate span the full “bin width”, i.e. each side extends half of the distance between two x-positions?

What I would expect is that the positions match the mean value within the bin (which can be computed for unbinned data, in contrast to originally binned data), and that the error bars reflect the standard deviation within the bin. E.g. for a uniform distribution this should be bin-width/sqrt(12), and only the extreme case of having all entries on the two bin edges would give an error bar as large as the current one.
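To make the two numbers above concrete, here is a minimal sketch in plain Python (no ROOT involved). The helper `bin_summary` and the toy data are purely illustrative assumptions; it only computes the per-bin mean and standard deviation that such markers would show.

```python
import math
import statistics

def bin_summary(values, lo, hi):
    """Return (mean, population std) of the entries with lo <= v <= hi."""
    inside = [v for v in values if lo <= v <= hi]
    return statistics.fmean(inside), statistics.pstdev(inside)

lo, hi = 0.0, 1.0
width = hi - lo

# Entries spread evenly across the bin (approximates a uniform distribution).
n = 10000
uniform_like = [lo + (i + 0.5) * width / n for i in range(n)]
mean_u, std_u = bin_summary(uniform_like, lo, hi)

# Extreme case: half of the entries sit on each bin edge.
edges = [lo] * 50 + [hi] * 50
mean_e, std_e = bin_summary(edges, lo, hi)

print(std_u, width / math.sqrt(12))  # nearly identical
print(std_e, width / 2)              # std equals half the bin width
```

In both toy cases the mean coincides with the bin center; the two would only differ for a distribution that is asymmetric within the bin.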

I.e. fitting the current representation by the chi^2 method should give incorrect results, or am I wrong?

Best, Klaus

jonas · July 26, 2021, 10:04am

Hi @canigia,

thanks for asking about this on the ROOT forum!

For a chi2 fit of a histogram, X-axis uncertainties play no role. Yes, there is an uncertainty on the counts in each bin that represents a variance for the Y-values, but there is no uncertainty on the X-values, because they are just bin boundaries.
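As a rough illustration of why the x-uncertainty never shows up, here is a sketch of a binned chi2 in plain Python. It is not RooFit's implementation: the function `binned_chi2`, the choice of evaluating the model at the bin center, and the use of sqrt(n) errors are all simplifying assumptions.

```python
def binned_chi2(counts, edges, model):
    """Chi^2 of bin counts against the model evaluated at each bin center.

    The error on each count is taken as sqrt(n), so the variance is n.
    """
    chi2 = 0.0
    for i, n in enumerate(counts):
        if n == 0:
            continue  # skip empty bins, whose sqrt(n) error would be zero
        center = 0.5 * (edges[i] + edges[i + 1])
        expected = model(center)
        chi2 += (n - expected) ** 2 / n
    return chi2

edges = [0.0, 1.0, 2.0, 3.0]
counts = [12, 20, 31]

def linear(x):
    # Hypothetical model for the expected count in a bin centered at x.
    return 10.0 * x + 5.0

chi2 = binned_chi2(counts, edges, linear)
print(chi2)
```

The bin edges only decide where the model is evaluated; no x-error term appears anywhere in the sum.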

What you describe is only interesting for visualization. Right now, the RooFit plots “abuse” X-axis uncertainties to indicate the bin width, and a marker is placed in the bin center. What you describe is a nice way to visualize histograms, presenting some information on the distribution within a bin, but I don’t see that this would be the expected behaviour. If you have histograms where the distribution within each bin is very skewed, maybe you should increase the number of bins?

I hope this answers your question! If not, feel free to make further suggestions and comments.

Cheers,
Jonas

canigia · July 26, 2021, 10:15am

Hi Jonas,

thanks for the quick reply!

Intuitively, I always thought that at least for TGraphErrors the uncertainties in x would be taken into account somehow. I could imagine a method where you use the point of closest approach of the fit function to the (x,y) position of the marker and use dx^2+dy^2 as chi^2.

Is this the case, or are uncertainties in x always just decoration?

Best,
Klaus

canigia · July 26, 2021, 10:16am

I just realized that there might be an issue if x and y live on very different scales, though…

jonas · July 26, 2021, 10:47am

Hi Klaus,

absolutely, for the TGraphErrors the x-uncertainty goes into the chi^2! But not directly, otherwise you would indeed get a problem if the scales are different. Instead, the function is projected along the y-direction by calculating the function at the points x-exlow and x+exhigh (see the documentation of TGraph::Fit()).
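A minimal sketch of that idea in plain Python, assuming the x-uncertainty is turned into a y-uncertainty by taking half the change of the function between x-exlow and x+exhigh and adding it in quadrature to ey. This is my reading of the description above, not ROOT's actual code, and the name `effective_variance_chi2` is made up for the example.

```python
def effective_variance_chi2(points, f):
    """points: iterable of (x, y, exl, exh, ey) tuples; f: fit function."""
    chi2 = 0.0
    for x, y, exl, exh, ey in points:
        # Spread in y caused by the x-uncertainty: half the change of f
        # between x - exl and x + exh. It carries the units of y.
        ey_from_x = 0.5 * (f(x + exh) - f(x - exl))
        chi2 += (y - f(x)) ** 2 / (ey ** 2 + ey_from_x ** 2)
    return chi2

def line(x):
    # Hypothetical fit function.
    return 2.0 * x

points = [(1.0, 2.5, 0.1, 0.1, 0.3), (2.0, 3.8, 0.2, 0.2, 0.3)]
chi2_with_ex = effective_variance_chi2(points, line)

# Same points with the x-errors set to zero, for comparison.
no_ex = [(x, y, 0.0, 0.0, ey) for x, y, _, _, ey in points]
chi2_without_ex = effective_variance_chi2(no_ex, line)

print(chi2_with_ex, chi2_without_ex)
```

Because the x-error is converted into units of y before it enters the denominator, the result does not depend on the relative scales of the two axes, which is the problem a plain dx^2+dy^2 distance would have.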

Still, in RooFit we are fitting histograms, not graphs. For histograms, there is no uncertainty on the x-axis, and if there is one in the plot it’s for decoration.

I hope this clarifies things!

Cheers,
Jonas

canigia · July 26, 2021, 11:31am

Yes, thanks a lot!
