Convoluting functions for a fit

bogdan · February 4, 2024, 12:55pm

Hi,

Complete newbie here.
I am following a paper here, that describes a way to fit multiple scattering distributions, using a Student’s t distribution, convoluted with a Gauss. The formulae of interest are 3.1 and 3.2. To this end, I have few, punctual problems:

At the moment, I manage to fit the convolution of a Gauss and Student’s t, but probably the fit can be improved, since the chi2ndof is ~8. Changing the initial guesses does not help much (I even tried putting the values from the minimization as initial parameters)… Anything else I can do to improve the fit? Is my understanding wrong (i.e. that one takes the results of the minimization and uses them for an improved initial guess)?
(Also, at the same time, if I include the error in the x, the fit does not work anymore. For now, I replace those with zeroes.)
Moving further to reproducing formula 3.2, which convolutes yet again the previous convolution of Gauss and Student’s t with another Student’s t, I am kind of stuck (basically no idea what I’m doing…)
I tried following some tutorials and answers I found on the forum to write the integral, but I did not manage (you can see my attempt is commented, both the definition of the function and the fit).
I also don’t understand if the parameters should be different from the first Student’s t distribution or not…
Moreover, with these initial parameters, I get: Warning in <ROOT::Math::ROOT::Math::GausIntegrator>: Failed to reach the desired tolerance ; maxtol = 1e-09
What can I do in this case?

Thank you very much for your help!
PS: I am aware of some code from one of the authors, asked on the forum ( … fit-convolution-of-two-functions-doesnt-converge/16507), however it does not help me much… and probably a few things changed since 2013

My data can be found here:
scatt_data.zip (2.8 KB)

The code that I have so far can be found below.

#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <TGraphErrors.h>
#include <TCanvas.h>
#include <TAxis.h>
#include <TF1.h>
#include <TCanvas.h>
#include <TMath.h>
#include <Math/Functor.h>
#include <iostream>

double fitFunction(double *x, double *par) {
    double theta = x[0];
    double N = par[0];
    double a = par[1];
    double mu = par[2];
    double sigma_g = par[3];
    double sigma = par[4];
    double tail_parameter = par[5];

    double gaussian_part = (1 - a) * TMath::Gaus(theta, mu, sigma_g, kTRUE);

    double z = (theta - mu) / sigma;
    double t_part = a * TMath::Student(z, tail_parameter) / sigma;

    return N * (gaussian_part + t_part);
}

// double convolutionIntegral(double *x, double *par) {
//     double theta = x[0];

//     ROOT::Math::Functor1D convolutionFunc([&](double tau) {
//         double tempX[1] = {theta - tau};
//         return fitFunction(tempX, par);
//     });

//     ROOT::Math::Integrator integrator(convolutionFunc);
//     double lowerLimit = -10; // no clue?!
//     double upperLimit = 10;
//     double integral = integrator.Integral(lowerLimit, upperLimit);

//     return integral;
// }





void readcsv() {
    std::string filePath = "scatt_data.csv";
    std::ifstream inFile(filePath);

    std::string line;
    std::getline(inFile, line);    // skip the header

    double bin_center, count, errorx, errory;
    std::vector<double> bin_centers, counts, errorsx, errorsy;

    while (std::getline(inFile, line)) {
        std::istringstream iss(line);
        std::string value;
        
        std::getline(iss, value, ',');
        bin_center = std::stod(value);
        std::getline(iss, value, ',');
        count = std::stod(value);
        std::getline(iss, value, ',');
        errorx = std::stod(value);
        std::getline(iss, value, ',');
        errory = std::stod(value);

        bin_centers.push_back(bin_center);
        counts.push_back(count);
        errorsx.push_back(errorx);
        errorsy.push_back(errory);
    }

    int nPoints = bin_centers.size();
    double *x = bin_centers.data();
    double *y = counts.data();
    double *ex = new double[nPoints];
    std::fill_n(ex, nPoints, 0);
    double *ey = errorsy.data();
    // double *ex = errorsx.data();  // fit fails if I use this ... todo

    TGraphErrors *graph = new TGraphErrors(nPoints, x, y, ex, ey);
    graph->SetMarkerStyle(21);
    graph->SetMarkerColor(kBlue);
    graph->SetLineColor(kBlue);




    TF1 *fitFunc = new TF1("fitFunc", fitFunction, -50, 50, 6);
    fitFunc->SetParameters(300000, 0.05, 0, 0.6, 3, 3); // Initial guesses for N, a, mu, sigma_g, sigma, tail
    graph->Fit(fitFunc, "F", "", -50, 50);
    fitFunc = graph->GetFunction("fitFunc");
    fitFunc->GetNDF();
    fitFunc->GetChisquare();
    fitFunc->GetProb();

    std::cout << "Chi-squared: " << fitFunc->GetChisquare() << std::endl;
    std::cout << "NDF: " << fitFunc->GetNDF() << std::endl;
    std::cout << "Probability: " << fitFunc->GetProb() << std::endl;
    std::cout << "Parameters:" << std::endl;
    for (int i = 0; i < fitFunc->GetNpar(); ++i) {
    std::cout << "Param " << i << ": " << fitFunc->GetParameter(i) << " +/- " << fitFunc->GetParError(i) << std::endl;}





    // TF1 *convolutionFitFunc = new TF1("convolutionFitFunc", convolutionIntegral, -50, 50, 6);
    // convolutionFitFunc->SetParameters(10000, 0.5, 0, 3, 3, 2); // N, a, mu, sigma_g, sigma, tail

    // graph->Fit(convolutionFitFunc, "F", "", -50, 50);

    // TF1 *fitResult = graph->GetFunction("convolutionFitFunc");
    // std::cout << "Chi-squared conv: " << fitResult->GetChisquare() << std::endl;
    // std::cout << "NDF conv: " << fitResult->GetNDF() << std::endl;
    // std::cout << "Probability conv: " << fitResult->GetProb() << std::endl;
    // std::cout << "Parameters conv:" << std::endl;
    // for (int i = 0; i < fitResult->GetNpar(); ++i) {
    //     std::cout << "Param  conv " << i << ": " << fitResult->GetParameter(i) << " +/- " << fitResult->GetParError(i) << std::endl;
    // }






    TFile *file = new TFile("scattering_data.root", "RECREATE"); // need later
    graph->Write("scattering");
    file->Close();
    delete file;    

    TCanvas *c1 = new TCanvas("c1", "Data with Error Bars", 800, 600);
    gPad->SetLogy(); 
    graph->SetTitle("Data with Error Bars;Bin Center;Counts");
    graph->Draw("AP");
    graph->GetXaxis()->SetLimits(-50, 50);
    graph->SetMinimum(1);
    graph->SetMaximum(1000000); 

    c1->Update();
}

int read_py() {
    readcsv();
    return 0;
}

ROOT Version: 6.24/06
Platform: linuxx8664 (WSL2 on Windows)
Compiler: gcc

mdessole · February 5, 2024, 2:33pm

Hi @bogdan,
welcome on the ROOT forum!
I think that @moneta could help here.

Cheers,
Monica

Dilicus · February 6, 2024, 7:23am

Hi @bogdan,

I worked a bit on your dataset, I attach my code below. From the article they are fitting histograms so maybe fitting using x-errors do not have a lot of sense, and maybe you should use the L option for the fit. Morover the first set of data which is fitted with the sum of gaussian and t-student is used to fix the parameters of the second fit. (Have you the data used for the first data set?)
In the tutorial class of the root page you can look at the fit convolution example.
Even without the first set of data to fix the parameters for convolution fit the results are not to bad.

Best,
Stefano

FitFunc.c (1.9 KB)

bogdan · February 8, 2024, 8:11pm

Hello!

This is fantastic! Thanks!!
Sorry for the late reply, I tried to get the set of data for the first and second fit, which I attach here also: with_and_without_target.zip (5.8 KB)

Now, I tried, following your example to do this myself. In a very first step, I tried directly loading the data with a target, but the fit was not working as expected. Then, I tried fitting the data without target using the sum gauss + t-student and feeding the parameters to the second fit, where I take data with target. The way I wrote the code right now is not yielding the correct results:

Warning in <Fit>: Abnormal termination of minimization.
****************************************
         Invalid FitResult  (status = 3 )

I add here also the code I am running.
SecondFitFunc.c (2.4 KB)

Is there anything I am doing wrong?

Many, many thanks!
Have a nice evening!
Bogdan

Dilicus · February 9, 2024, 1:03pm

I uploaded my new version of the code.
I fitted the distribution without target with the gauss + t-student distributions.

Then I fed the parameters of the first fit to second function (gauss + t-student distribution convoluted with the t-student distribution), fixing the parameters obtained from the first fit.
There is only one thing your data are in the range [-50 ,50], while the article data have a different x-range

FitFunc.c (2.1 KB)

Best,
Stefano

bogdan · February 10, 2024, 11:11pm

Thank you for your answer!

To add some clarification: The paper investigated the influence of 50um of Si as their scatterer. In my case, I have a sample that is about 10 times that.
The range is the similar: rad in the paper and mrad in my case. Moreover, my data is from 1GeV/c electrons, so one can compare with the top figure from both Fig. 2 and 3 of the paper.

Now, it seems I am underestimating with my fit the region 5 < |x| < 15 for the no target case, which then propagates to the target fit. I am wondering why this is still the case… but I have no good explanation. Or what I could do to correct it for that matter.

I wanted just to dig a bit deeper and observed that the authors of the paper used a slightly modified tdistribution_pdf with respect to the one from ROOT, in which a sigma term appears in the first fraction’s denominator.

To test this, I assumed I need to write a modified distribution myself, since I see no way to add it otherwise.

double modifiedStudentTDistribution(double *x, double *par) {
    double t = (x[0] - par[2]); 
    double nu = par[5]; 
    double sigma_stud = par[4];

    double numerator = TMath::Gamma((nu + 1.0) / 2.0);
    double denominator = TMath::Gamma(nu / 2.0) * TMath::Sqrt(nu * TMath::Pi()) * sigma_stud;
    double power = - (nu + 1.0) / 2.0;
    double factor = (1.0 + (t * t) / (nu * sigma_stud * sigma_stud));

    return (numerator / denominator) * TMath::Power(factor, power);
}

I tried adding it to the fitFunc like:

TF1 *fitFunc = new TF1("fitFunc", "[0]*( (1-[1])*ROOT::Math::normal_pdf(x - [2], [3])  +  [1]*modifiedStudentTDistribution(x, [2], [4], [5]) )", -50, 50);

But it does not get recognized:

Error in <TFormula::Eval>: Formula is invalid and not ready to execute
ROOT::Math::normal_pdf is unknown.
modifiedStudentTDistribution is unknown.

Again, no idea why this happens or if it will even help. From the fit examples, the way I implemented should in theory work

Dilicus · February 13, 2024, 8:19am

Hi bogdan,
sorry for the late reply. I was quite busy on work today.

I think to account for the slightly different t-student distribution you can divide the full function for the same parameters used in the first argument

in the function

TF1 *fitFunc = new TF1("fitFunc",  "[0]*( (1-[1])*ROOT::Math::normal_pdf(x - [2], [3])  +  [1]/[4]*ROOT::Math::tdistribution_pdf( (x-[2])/[4], [5])  ) ", -50, 50);

and the convolution

TF1Convolution *f_conv = new TF1Convolution("fitFunc", "1/[1]*ROOT::Math::tdistribution_pdf(x/[1],[0])", -50,50, **true**);

Regarding your function, I think the main problem is that you used par[2],par[4] and par[5] inside your definition, instead you should use par[0], par[1] and par[2] and call the function as you did it.
Otherwise is like the function needs more argument but you gave only 3 instead of 6

bogdan · February 22, 2024, 6:11pm

Hi, Stefano!
Been gone for a few days, sorry.

I am a bit confused by what you said here:

I think to account for the slightly different t-student distribution you can divide the full function for the same parameters used in the first argument

If I divide the whole distribution by par[4] I solve one part of the problem, adding to the denominator the missing sigma. However, in the paper, there is another in the denominator in the parentheses. You can see this in my custom formula as sigma_stud. This differs from the implementation of tdistribution_pdf (ROOT: Probability Density Functions (PDF))

Nonetheless, I tried the two functions you linked below, unfortunately they did not change a thing

Regarding my custom function, why should it need more than 3? I only need to pass it the mean, the sigma and that nu parameter, which is what I do. I changed to [0], [1], [2] inside the function (as well as switching the pars between nu and sigma_stud, given the order in which I pass them).
However, the function is still not being recognized and I get the same error as I had in my previous post.

Thank you!

Dilicus · February 27, 2024, 6:17pm

Hi bogdan,
to add the \sigma both in front of the distribution and inside the parathesis I just use the parameter that I need in front of the root t-distribution and also to divide the x which is the first argument of the function.

"1/[1]*ROOT::Math::tdistribution_pdf( x/[1], [2])  "

If you look at the documentation here the first argument of the function is squared.

Concerning your other problem I was you to use 6 parameters because in your custom function you add these lines

double t = (x[0] - par[2]); 
double nu = par[5]; 
double sigma_stud = par[4];

so when calling the function you should have used the following syntax

"modifiedStudentTDistribution(x, 0,0,[2],0, [4], [5])"

but since you changed the number inside the definition of your function, your syntax is now correct

I hope now is clear what I meant. I am not sure to have explained myself

system · March 12, 2024, 6:18pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.