Question about the confidence interval of a linear regression

aardwolf · August 16, 2014, 6:07pm

I want to draw a confidence-interval band for a linear regression, and I found the tutorial file /tutorials/fit/ConfidenceIntervals.C. Its leftmost pad shows the kind of plot I want.

However, once I change the parameters in the tutorial file, the confidence interval I draw with this method is far from what I expect. So I am wondering whether the confidence-interval routine GetConfidenceIntervals works correctly.

I changed line 29 of the tutorial file from

to

The interval then becomes incredibly large compared with the scatter of the data points.

Similarly, if I change the Gaus sigma to 0.01, the interval becomes incredibly small.

In addition, I tried using TGraphErrors instead of TGraph and set the error of each data point myself. The confidence interval then depends strongly on the size of the errors I set. According to simple linear regression theory this should not happen: the confidence interval should depend only on the scatter of the data points, not on the error of each data point.
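For reference, the textbook band for an unweighted straight-line fit uses only the residual scatter: the half-width at x0 is t·s·sqrt(1/n + (x0 - x̄)²/Sxx), with s² = Σ r_i² / (n - 2). The sketch below implements that formula in plain C++ (standard library only, not ROOT). The names LineFit, fitLineOLS and confidenceHalfWidth are hypothetical, and the Student-t quantile is passed in by the caller rather than computed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Result of an unweighted (ordinary least-squares) straight-line fit y = a + b*x.
struct LineFit {
    double a = 0.0;     // intercept
    double b = 0.0;     // slope
    double xbar = 0.0;  // mean of x
    double sxx = 0.0;   // sum of (x_i - xbar)^2
    double s = 0.0;     // residual standard deviation, sqrt(SSR / (n - 2))
    std::size_t n = 0;  // number of points
};

// Ordinary least squares: no per-point errors enter the fit at all.
LineFit fitLineOLS(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() > 2);
    LineFit f;
    f.n = x.size();
    double ybar = 0.0;
    for (std::size_t i = 0; i < f.n; ++i) { f.xbar += x[i]; ybar += y[i]; }
    f.xbar /= f.n;
    ybar /= f.n;
    double sxy = 0.0;
    for (std::size_t i = 0; i < f.n; ++i) {
        f.sxx += (x[i] - f.xbar) * (x[i] - f.xbar);
        sxy += (x[i] - f.xbar) * (y[i] - ybar);
    }
    assert(f.sxx > 0.0);  // x values must not all be equal
    f.b = sxy / f.sxx;
    f.a = ybar - f.b * f.xbar;
    double ssr = 0.0;  // sum of squared residuals
    for (std::size_t i = 0; i < f.n; ++i) {
        double r = y[i] - (f.a + f.b * x[i]);
        ssr += r * r;
    }
    f.s = std::sqrt(ssr / (f.n - 2));
    return f;
}

// Half-width of the confidence band for the mean response at x0:
//   tcrit * s * sqrt(1/n + (x0 - xbar)^2 / Sxx)
// tcrit is the Student-t quantile for n - 2 degrees of freedom, supplied by the
// caller (about 2.306 for a two-sided 95% interval with 8 degrees of freedom).
double confidenceHalfWidth(const LineFit& f, double x0, double tcrit) {
    return tcrit * f.s * std::sqrt(1.0 / f.n + (x0 - f.xbar) * (x0 - f.xbar) / f.sxx);
}
```

Multiplying every y value by 10 multiplies s, and therefore the half-width, by 10; no per-point error appears anywhere in the formula.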

So I suspect that something is wrong with the GetConfidenceIntervals routine. Or maybe I do not understand correctly how it works?

Thanks.

Eddy_Offermann · August 18, 2014, 11:23pm

You state

The confidence interval should depend only on the scatter of the data points, not on the error of each data point.

Shouldn't the error on each data point reflect the scatter among the data points?

aardwolf · August 19, 2014, 4:21pm

[quote=“Eddy Offermann”]
Shouldn't the error on each data point reflect the scatter among the data points?[/quote]

In theory, yes. If the point errors set by the user are correct, the result looks good. But if the user chooses a TGraph instead of a TGraphErrors, as the tutorial does, the result looks awful.

So my workaround is to (1) fit the TGraph first and compute the standard deviation of the points about the fit myself, (2) assign that standard deviation as the error of each point, and (3) fit the resulting TGraphErrors again and get the confidence interval from that fit.
This method feels dirty, so I would like to ask whether there is a better way. I also suggest that the tutorial be modified accordingly.
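The arithmetic behind this three-step workaround can be sketched in standalone C++ (standard library only, not ROOT; fitLineWeighted, bandHalfWidth and refitWithResidualScatter are hypothetical names). In this sketch the weighted fit takes its parameter covariance from the per-point errors at face value, so the band follows whatever errors are assigned. Setting every σ_i to the residual standard deviation s of the unweighted fit makes the band coincide with the textbook unweighted band and makes χ²/ndf exactly 1. It assumes s > 0, i.e. the points do not lie exactly on a line.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Weighted straight-line fit y = a + b*x with per-point errors sigma_i.
// The parameter (co)variances come from the errors alone, with no chi^2 rescaling.
struct WeightedLineFit {
    double a, b;               // intercept and slope
    double varA, varB, covAB;  // parameter covariance matrix
    double chi2;               // sum of ((y_i - a - b*x_i) / sigma_i)^2
};

WeightedLineFit fitLineWeighted(const std::vector<double>& x,
                                const std::vector<double>& y,
                                const std::vector<double>& sigma) {
    assert(x.size() == y.size() && y.size() == sigma.size() && x.size() > 2);
    double S = 0, Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        double w = 1.0 / (sigma[i] * sigma[i]);  // weight = 1 / sigma^2
        S += w; Sx += w * x[i]; Sy += w * y[i];
        Sxx += w * x[i] * x[i]; Sxy += w * x[i] * y[i];
    }
    double delta = S * Sxx - Sx * Sx;
    WeightedLineFit f;
    f.a = (Sxx * Sy - Sx * Sxy) / delta;
    f.b = (S * Sxy - Sx * Sy) / delta;
    f.varA = Sxx / delta;
    f.varB = S / delta;
    f.covAB = -Sx / delta;
    f.chi2 = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        double r = (y[i] - f.a - f.b * x[i]) / sigma[i];
        f.chi2 += r * r;
    }
    return f;
}

// Band half-width at x0 from the parameter covariance; tcrit is a
// caller-supplied quantile (e.g. Student t with n - 2 degrees of freedom).
double bandHalfWidth(const WeightedLineFit& f, double x0, double tcrit) {
    return tcrit * std::sqrt(f.varA + 2.0 * x0 * f.covAB + x0 * x0 * f.varB);
}

// The workaround: (1) unweighted fit, (2) residual standard deviation s
// assigned to every point, (3) weighted refit with sigma_i = s.
WeightedLineFit refitWithResidualScatter(const std::vector<double>& x,
                                         const std::vector<double>& y) {
    std::vector<double> ones(x.size(), 1.0);
    WeightedLineFit first = fitLineWeighted(x, y, ones);   // step 1: chi2 = SSR
    double s = std::sqrt(first.chi2 / (x.size() - 2));     // residual std dev
    std::vector<double> sigma(x.size(), s);                // step 2
    return fitLineWeighted(x, y, sigma);                   // step 3
}
```

The slope and intercept of the refit are the same as in step 1, because scaling all weights by one constant does not move the minimum; only the covariance, and hence the band width, changes.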

Eddy_Offermann · August 25, 2014, 7:55pm

Hopefully your data set (x_i, y_i, dy_i), i = 0, …, has errors that are off by some common factor f.
Just perform a fit with TGraphErrors, take the Chi^2 value and divide it by the number of degrees of freedom. If the errors were right, the expectation value of that ratio would be 1 (assuming normally distributed y's); if the true errors are f times your dy_i, it comes out around f^2.

Take the square root of that number and multiply your dy_i by it.
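Written out for a straight-line fit with n points (so ndf = n - 2), and assuming the true errors are σ_i = f·dy_i with normally distributed y's, the reasoning behind this recipe is the following; V denotes the 2×2 covariance matrix of the fitted parameters (a, b), a symbol introduced here for illustration.

```latex
\chi^2 = \sum_{i} \left( \frac{y_i - a - b\,x_i}{dy_i} \right)^2
       = f^2 \sum_{i} \left( \frac{y_i - a - b\,x_i}{\sigma_i} \right)^2
\quad\Rightarrow\quad
\mathrm{E}\!\left[ \frac{\chi^2}{n-2} \right] = f^2

dy_i' = dy_i \sqrt{\frac{\chi^2}{n-2}}
\quad\Rightarrow\quad
\frac{\chi'^2}{n-2} = 1, \qquad
(a', b') = (a, b), \qquad
V' = \frac{\chi^2}{n-2}\, V
```

Because every weight changes by the same factor, the best-fit line is unchanged; only the covariance matrix, and therefore the confidence band, is rescaled, with the band half-width growing by sqrt(χ²/ndf).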

aardwolf · August 29, 2014, 3:17pm

[quote=“Eddy Offermann”]Hopefully your data set (x_i, y_i, dy_i), i = 0, …, has errors that are off by some common factor f.
Just perform a fit with TGraphErrors, take the Chi^2 value and divide it by the number of degrees of freedom. If the errors were right, the expectation value of that ratio would be 1 (assuming normally distributed y's); if the true errors are f times your dy_i, it comes out around f^2.

Take the square root of that number and multiply your dy_i by it.[/quote]

I agree with you. I suggest that the developers modify the tutorial file to include these steps; I think the current tutorial is misleading.