Hi,
I’m using the ROOT::Fit::Fitter class to do a linear regression. Here is a minimal example (using PyROOT):
import ROOT
import numpy as np
fitter = ROOT.Fit.Fitter()
func = ROOT.TF1("f1", "pol1", 0, 130)
fitter.SetFunction(ROOT.Math.WrappedMultiTF1(func, func.GetNdim()), True)
fitter.Config().SetMinimizer("Linear")
x = np.array([67.5, 67.5, 47.5, 47.5, 57.5])
x_err = np.array([2.5, 2.5, 2.5, 2.5, 2.5])
y = np.array([107.5, 102.5, -127.5, 117.5, 112.5])
y_err = np.array([2.5, 2.5, 2.5, 2.5, 2.5])
bin_data = ROOT.Fit.BinData(len(x), x, y, x_err, y_err)
fitter.Config().ParSettings(0).SetValue(0.)
fitter.Config().ParSettings(1).SetValue(1.)
is_ok = fitter.Fit(bin_data)
print(f"is ok: {is_ok}")
if is_ok:
    res = fitter.Result()
    print(f"slope: {res.Parameter(1)}")
    print(f"offset: {res.Parameter(0)}")
    print(f"pvalue: {res.Prob()}")
The printout is:
is ok: True
slope: 5.500000000000007
offset: -253.7500000000004
pvalue: 1.4567546900600512e-36
However, the dataset usually contains some outliers (as the example above does), which make the fit result quite bad. With the scikit-learn library, this can be resolved by using Huber regression, which limits the influence of the outliers. Here is an example:
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression
x = np.array([67.5, 67.5, 47.5, 47.5, 57.5])
x_err = np.array([2.5, 2.5, 2.5, 2.5, 2.5])
y = np.array([107.5, 102.5, -127.5, 117.5, 112.5])
y_err = np.array([2.5, 2.5, 2.5, 2.5, 2.5])
huber_res = HuberRegressor().fit(np.reshape(x, (-1, 1)), y)
print(f"coef: {huber_res.coef_}, offset: {huber_res.intercept_}")
The printout is:
coef: [-0.46186719], offset: 136.75230379098608
Here is a comparison between the two methods (the red line uses the Huber loss function and the blue line is ROOT’s ordinary linear fit).
The Huber loss function yields a much better result.
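To make clear what I mean by Huber regression, here is a rough sketch in plain numpy that minimises the Huber loss for a straight line by iteratively reweighted least squares. It is only an illustration under my own assumptions: the function name huber_line_fit is made up, the scale is fixed to the y error (2.5) instead of being fitted, and the threshold factor 1.35 is simply copied from scikit-learn's default epsilon. scikit-learn's HuberRegressor also fits the scale and applies a small L2 penalty, so the numbers will not match its output exactly.

```python
import numpy as np

def huber_line_fit(x, y, sigma, k=1.35, max_iter=500, tol=1e-10):
    """Fit y = offset + slope * x by minimising the Huber loss with IRLS.

    sigma is an assumed, fixed residual scale (a scalar); residuals larger
    than k * sigma are penalised linearly instead of quadratically.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones_like(x), x])    # design matrix: [1, x]
    delta = k * sigma                            # Huber threshold
    beta = np.linalg.lstsq(A, y, rcond=None)[0]  # start from ordinary least squares
    for _ in range(max_iter):
        r = y - A @ beta                         # current residuals
        abs_r = np.maximum(np.abs(r), 1e-12)     # guard against division by zero
        # Huber weights: 1 inside the threshold, delta / |r| outside
        w = np.where(abs_r <= delta, 1.0, delta / abs_r)
        AtW = A.T * w                            # A^T W with W = diag(w)
        new_beta = np.linalg.solve(AtW @ A, AtW @ y)  # weighted normal equations
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta[0], beta[1]                      # (offset, slope)

x = np.array([67.5, 67.5, 47.5, 47.5, 57.5])
y = np.array([107.5, 102.5, -127.5, 117.5, 112.5])
offset, slope = huber_line_fit(x, y, sigma=2.5)
print(f"slope: {slope}, offset: {offset}")
```

Each iteration is just a weighted linear least-squares fit, so the same loop could in principle be written in C++ around any weighted linear fit.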
Thus, I would like to know whether I can do Huber regression with ROOT, as we need to do this in C++ in our project.
Thanks for your attention