I am fitting a histogram with a Landau function, but I am now wondering how the choice of binning affects the fit result and performance. Naively, I would hope that it doesn't, and that I could build a histogram with so many bins that each bin contains at most one event, thereby getting rid of the binning dependence. But I'm not sure that is true, and when I try it, the fit fails.
My question is then: is there a way to fit "unbinned data"? I see there is something on this topic in the manual, but I'm not sure it refers to the same thing.
If that’s not possible, is there a recommendation, or a good habit to apply when binning a histogram for fitting?
I have made this small script to test the UnBinData method, but the fit fails for some reason. Could someone take a look at it? It tries to fit a Landau to a set of dE/dx (stopping power) values corresponding to the ionization losses of a charged particle in silicon.
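For context, the kind of unbinned maximum-likelihood fit I am after can be sketched outside ROOT with SciPy. This is only an illustration: SciPy has no exact Landau, so the Moyal distribution is used as a common closed-form stand-in for the Landau shape, and all parameter values below are made up.

```python
import numpy as np
from scipy import stats

# Toy "dE/dx" sample: the Moyal distribution is a closed-form
# approximation of the Landau shape (SciPy has no exact Landau).
rng = np.random.default_rng(42)
true_loc, true_scale = 1.2, 0.15   # hypothetical MPV and width, for illustration
data = stats.moyal.rvs(loc=true_loc, scale=true_scale, size=5000,
                       random_state=rng)

# Unbinned maximum-likelihood fit: every event enters the likelihood
# directly, so no histogram (and hence no binning choice) is involved.
loc_hat, scale_hat = stats.moyal.fit(data)
print(f"MPV ~ {loc_hat:.3f}, width ~ {scale_hat:.3f}")
```

The point is that the likelihood is built from the individual events, not from bin contents, so the arbitrary binning choice never enters.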
I don't think fitting a TGraph would make any difference. It's not binned as such, but each element of the y vector corresponds to the content of a bin of the associated histogram, so it's effectively binned: the choice of the x vector elements defines the binning.
I do wonder whether you would be better off using RooFit itself and passing in an unbinned dataset (a RooDataSet).
As far as I can tell, RooLandau exists in RooFit, but it doesn't support analytical integration.
Attached is code written by @Da_Yu_Tou which extends the Landau PDF to be analytically integrable, in case it is of interest. Perhaps a RooFit maintainer could check whether the implementation I am providing here could be useful for the next release.
The problem is that the default unbinned likelihood fit is not extended, while your Landau function contains a constant (overall normalization) parameter. A non-extended fit cannot determine such a parameter; you should perform an extended fit instead.
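To sketch why the extended term matters (again outside ROOT, in Python, with the Moyal distribution standing in for the Landau; all names and values below are illustrative): the extended negative log-likelihood adds a Poisson term in the expected yield, which is exactly what makes an overall normalization parameter fittable.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
data = stats.moyal.rvs(loc=1.2, scale=0.15, size=2000, random_state=rng)
n_obs = len(data)

def extended_nll(params):
    # params: expected yield, location (MPV), scale (width)
    n_exp, loc, scale = params
    if n_exp <= 0 or scale <= 0:
        return np.inf
    # Extended likelihood: Poisson yield term plus the per-event shape
    # term. A plain (non-extended) NLL would drop n_exp entirely, so an
    # amplitude parameter would be left undetermined by the fit.
    shape = np.sum(stats.moyal.logpdf(data, loc=loc, scale=scale))
    return n_exp - n_obs * np.log(n_exp) - shape

res = optimize.minimize(extended_nll, x0=[n_obs, 1.0, 0.2],
                        method="Nelder-Mead",
                        options={"maxiter": 5000})
n_hat, loc_hat, scale_hat = res.x
print(f"yield ~ {n_hat:.0f}, MPV ~ {loc_hat:.3f}")
```

With no further constraints, the fitted yield comes out equal to the observed number of events, while the shape parameters are determined as usual.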
Also, I notice that my fits now "work" (converge and make sense) almost every time (I'm fitting many histograms), whereas they failed more often with the binned approach (described here: Chapter: FittingHistograms before 7.7). I'm wondering: is there a reason NOT to use this UnBinData method? It seems to me that it would always be the best you can do, given that you are not introducing the arbitrary choice of binning.
Yes, the unbinned one is the best approach for parameter estimation.
The drawbacks are the need to normalize the fitting function, which can in some cases be more computationally expensive, and the need to evaluate the function at every data point.
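To make the normalization point concrete, here is a sketch (again SciPy with the Moyal stand-in for the Landau, hypothetical values) of an unbinned fit restricted to a window, where the pdf normalization over the fit range has to be recomputed at every likelihood evaluation, and the model is evaluated at every event rather than at a handful of bin centres:

```python
import numpy as np
from scipy import stats, optimize

# Unbinned fit restricted to a window [a, b]: the pdf must be
# re-normalized over the fit range at EVERY likelihood evaluation,
# and the shape term runs over all events, not over bins.
rng = np.random.default_rng(1)
raw = stats.moyal.rvs(loc=1.2, scale=0.15, size=5000, random_state=rng)
a, b = 0.9, 3.0
data = raw[(raw > a) & (raw < b)]

def nll(params):
    loc, scale = params
    if scale <= 0:
        return np.inf
    # Normalization over the fit window, recomputed on each call:
    norm = (stats.moyal.cdf(b, loc=loc, scale=scale)
            - stats.moyal.cdf(a, loc=loc, scale=scale))
    if norm <= 0:
        return np.inf
    return -np.sum(stats.moyal.logpdf(data, loc=loc, scale=scale)
                   - np.log(norm))

res = optimize.minimize(nll, x0=[1.0, 0.2], method="Nelder-Mead",
                        options={"maxiter": 5000})
loc_hat, scale_hat = res.x
```

In a binned fit, by contrast, the function only needs to be evaluated per bin, which is why the unbinned approach can be noticeably slower on large samples.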
Cheers