TGraphErrors Fit, how to determine if a fit is good?

smeriano · February 14, 2023, 11:40am

grapherror.root (141.2 KB)
grapherrorfit.C (457 Bytes)

Greetings everyone,

I opened this topic in the RooFit and RooStats section, but I think this is a newbie question, so I’m reopening it in the Newbie section.
I attach a .root file that contains a TGraphErrors graph and a simple script that fits the data.
People suggested to me that the fit is underperforming because the fit values for Radius ~ 100 cm should be higher and the slope should be sharper there.
I’m trying to understand whether this is really the case or whether it is just the data that lead to such fit results.
How should I determine whether I set the correct parameters and whether the fit is the best it can be?

In general, what ROOT tools would you suggest for such studies?


ROOT Version: Not Provided
Platform: Not Provided
Compiler: Not Provided

couet · February 14, 2023, 12:36pm

I think @moneta can help

smeriano · February 15, 2023, 9:40pm

Thank you ! Any thought is helpful!

moneta · February 16, 2023, 3:43pm

Hi,
Your data look very noisy, with lots of outliers. I think the obtained fit looks like the best you can have. The chi-square is large, 99000 for 7740 n.d.f., so the fit probability is zero.
This is due to the many outliers you have. If you need to estimate some parameters from your data, I would try to clean the data first.
Cheers

Lorenzo
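
To make the numbers quoted above concrete, here is a minimal sketch in plain C++ (no ROOT needed) of two quick goodness-of-fit indicators. The function names `reducedChi2` and `chi2Pull` are hypothetical helpers, not ROOT API, and the second one relies on the large-n.d.f. Gaussian approximation of the chi-square distribution, so it is only a rough guide.

```cpp
#include <cassert>
#include <cmath>

// Reduced chi-square: values close to 1 suggest that the model and the
// quoted point errors are compatible with each other.
double reducedChi2(double chi2, int ndf) {
    assert(ndf > 0);
    return chi2 / ndf;
}

// For large ndf the chi-square distribution is approximately Gaussian with
// mean ndf and variance 2*ndf. This returns how many standard deviations
// the observed chi2 lies above the expected value.
double chi2Pull(double chi2, int ndf) {
    assert(ndf > 0);
    return (chi2 - ndf) / std::sqrt(2.0 * ndf);
}
```

With chi2 = 99000 and ndf = 7740, `reducedChi2` gives about 12.8 and `chi2Pull` gives several hundred standard deviations, which is why the fit probability is effectively zero. Inside ROOT, the exact probability should be available from `TMath::Prob(chi2, ndf)` or from the fit result obtained with the "S" option.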

smeriano · February 16, 2023, 4:43pm

Hi Lorenzo, yes, what you say makes sense.
I have one last question. Let’s say I want to fit the hit rate data, as I do in the script, and then calculate what the hit rate would be at r=100cm.

I would then simply use:

p[0]+p[1]*exp(-p[2]*100)

But in order to propagate the error to that point, I use the covariance matrix:

    auto fitResult = g->Fit(expfunc, "S");
    auto covMatrix = fitResult->GetCovarianceMatrix();

    // Partial derivatives of p[0]+p[1]*exp(-p[2]*r) with respect to the parameters, evaluated at r = 100.
    double df_dp0 = 1;
    double df_dp1 = exp(-p[2] * 100);
    double df_dp2 = -100 * p[1] * exp(-p[2] * 100);

    // Standard propagation formula for correlated variables.
    // sigma_exp is the standard deviation of the fitted function at r = 100.
    double sigma_exp = sqrt(pow(df_dp0, 2) * covMatrix(0, 0) + pow(df_dp1, 2) * covMatrix(1, 1) + pow(df_dp2, 2) * covMatrix(2, 2)
                            + 2 * df_dp0 * df_dp1 * covMatrix(0, 1) + 2 * df_dp0 * df_dp2 * covMatrix(0, 2) + 2 * df_dp1 * df_dp2 * covMatrix(1, 2));

Would this be a correct error propagation calculation? I’m not sure as I do not consider the errors of the data points in the first place, but I guess the fit procedure does that already.

I’m reattaching the simple script with the above lines of code.

grapherrorfit.C (1.4 KB)
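
For reference, the propagation described in this post can be written compactly as sigma^2 = sum over i,j of g_i * C_ij * g_j, where g is the gradient of the fit function with respect to the parameters and C is the covariance matrix. The sketch below is plain C++ with hypothetical names (`gradient`, `propagatedSigma`, `Vec3`, `Mat3`) and assumes the model p[0]+p[1]*exp(-p[2]*r). In a ROOT script the entries of `cov` would be filled from `covMatrix(i,j)`. Note that the derivative with respect to p[2] is -r*p[1]*exp(-p[2]*r), and that the cross term between p[1] and p[2] uses element (1,2).

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Gradient of f(r) = p0 + p1*exp(-p2*r) with respect to (p0, p1, p2).
Vec3 gradient(const Vec3 &p, double r) {
    const double e = std::exp(-p[2] * r);
    return {1.0, e, -r * p[1] * e};
}

// Linear error propagation: sigma_f = sqrt(g^T C g).
// Looping over all (i, j) pairs counts each off-diagonal term twice,
// which reproduces the usual "2 * g_i * g_j * C_ij" cross terms.
double propagatedSigma(const Vec3 &g, const Mat3 &cov) {
    double var = 0.0;
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            var += g[i] * cov[i][j] * g[j];
        }
    }
    assert(var >= 0.0);  // holds for a valid (positive semi-definite) covariance
    return std::sqrt(var);
}
```

Calling `propagatedSigma(gradient(p, 100.0), cov)` then gives the uncertainty of the fitted curve at r = 100. Writing the sum as a loop avoids index slips in the hand-expanded cross terms.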

moneta · February 16, 2023, 5:37pm

The procedure using error propagation is correct in the approximation that the errors are small; otherwise the approximation is no longer valid. In that case it would be better to re-parametrize your fit function and fit directly the parameter you are interested in (e.g. the hit rate).
In your case, however, the errors resulting from the fit (and the covariance matrix) do not really reflect the actual parameter uncertainty, given the large chi2 obtained. It is possible that your data point errors are underestimated, in which case the resulting covariance matrix does not reflect your correct uncertainties either.

Lorenzo

smeriano · February 16, 2023, 5:54pm

First of all everything you’re saying is really helpful. So thank you for that!!

About the re-parametrization, do you mean that I should create a new TF1,
with the now known parameters p[0],p[1],p[2] that were calculated and fit that new TF1?

I didn’t get that part.

moneta · February 16, 2023, 6:20pm

I mean re-defining the parameters in a way that is optimal for getting the error. In your case, actually, I don’t think you can do much.

Lorenzo
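
As an illustration of what re-defining the parameters could look like in general (this is an assumed form, not something spelled out in the thread), the constant term can be traded for the value of the function at a reference radius r0 = 100, so that the hit rate at r0 is itself a fit parameter and its error comes straight from the fit. The names `original`, `reparametrized` and `rate0` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Original parametrization: f(r) = p0 + p1*exp(-p2*r).
double original(double r, double p0, double p1, double p2) {
    return p0 + p1 * std::exp(-p2 * r);
}

// Same family of curves, but with rate0 = f(r0) as an explicit parameter:
//   f(r) = rate0 + p1*(exp(-p2*r) - exp(-p2*r0))
// which corresponds to p0 = rate0 - p1*exp(-p2*r0).
double reparametrized(double r, double rate0, double p1, double p2,
                      double r0 = 100.0) {
    assert(r >= 0.0 && r0 >= 0.0);  // radii are non-negative distances
    return rate0 + p1 * (std::exp(-p2 * r) - std::exp(-p2 * r0));
}
```

In a TF1 this would correspond to a formula along the lines of "[0] + [1]*(exp(-[2]*x) - exp(-[2]*100))", where parameter 0 is the hit rate at r = 100. As noted above, this does not cure a large chi2; it only changes which quantity the fit reports an error for.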

smeriano · February 16, 2023, 10:52pm

OK, overall I can improve the fit by cleaning some data, as you said. There is also the possibility of grouping the data somehow, so instead of 8000+ data points I could have far fewer. That could help the fitting procedure, as the chi2 per ndof can decrease significantly. Anyway, just posting some thoughts.

Your answers were very helpful, thank you!

Spyros

moneta · February 17, 2023, 10:21am

Hi,

Yes, it is maybe not a bad idea to group your points if they are at similar x. You can use the TProfile class for this; it will automatically compute the error from the spread of your data points, and then you can fit it directly.

Best,

Lorenzo
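
To show what the grouping suggested above amounts to, here is a minimal plain C++ sketch of a profile: points are binned in x, and each bin gets the mean of y and the error on that mean computed from the spread. It is meant to mimic the idea behind TProfile under simple assumptions (equal-width bins, unweighted points, the original point errors ignored); `ProfileBin` and `profile` are hypothetical names, not ROOT API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct ProfileBin {
    int n = 0;           // number of points in the bin
    double mean = 0.0;   // mean of y in the bin
    double error = 0.0;  // spread of y divided by sqrt(n)
};

std::vector<ProfileBin> profile(const std::vector<double> &x,
                                const std::vector<double> &y,
                                int nbins, double xmin, double xmax) {
    assert(x.size() == y.size() && nbins > 0 && xmax > xmin);
    std::vector<double> sumY(nbins, 0.0), sumY2(nbins, 0.0);
    std::vector<ProfileBin> bins(nbins);
    const double width = (xmax - xmin) / nbins;

    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] < xmin || x[i] >= xmax) continue;  // skip out-of-range points
        int b = static_cast<int>((x[i] - xmin) / width);
        b = std::min(b, nbins - 1);                 // guard against rounding
        ++bins[b].n;
        sumY[b] += y[i];
        sumY2[b] += y[i] * y[i];
    }
    for (int b = 0; b < nbins; ++b) {
        if (bins[b].n == 0) continue;
        const double n = bins[b].n;
        bins[b].mean = sumY[b] / n;
        // Population variance, clamped at 0 to absorb rounding errors.
        const double var = std::max(0.0, sumY2[b] / n - bins[b].mean * bins[b].mean);
        bins[b].error = std::sqrt(var / n);
    }
    return bins;
}
```

Bins with a single entry get zero error in this sketch, so they would need special care before being used in a chi-square fit. With 8000+ points, choosing a few tens of bins in radius would give a much smaller set of averaged points whose errors reflect the actual scatter of the data.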