I have a question concerning TFractionFitter. I would like to fit a data set with two MC histograms and extract the reconstructed fractions, following the simple example shown in the TFractionFitter class reference.
My fit converges and I get reasonable estimates of the true fractions, but the errors on the reconstructed fractions seem really large, so I am probably doing something wrong. How is this error calculated? Why is it so large?
I attach the output of my macro (at the end there is a comparison between the true and reconstructed fractions).
output.txt (1.98 KB)
The errors look large, but in order to solve your problem I need to see your data and MC histograms. Can you please send them to me, if possible together with the macro producing the result?
Let me be more specific. The two MC histograms are the output of a neural net for protons and for irons in the learning step. The data sample is an independent set, for which I would like to estimate the fractions of the two classes. I am sending you two ASCII files (for MC and data) and my macro.
In my macro I use normalized histograms in the fit procedure, but after your reply I noticed that if I use the original (unnormalized) histograms I get reasonable errors.
(In the macro, try replacing
mc->Add(protonMC_norm); mc->Add(ironMC_norm);
TFractionFitter* fit = new TFractionFitter(DataSample_norm, mc);
with the unnormalized protonMC, ironMC and DataSample.)
Is this the solution to the problem? I still don't understand very well why I cannot use normalized histograms in the fit. Any idea?
Thanks for your attention, cheers…Simone
DataSample.txt (68.9 KB)
MCSample.txt (111 KB)
FindAbundances.c (2.71 KB)
TFractionFitter performs a likelihood fit assuming Poisson statistics for the bin contents, both for the MC templates and for the data. Your histograms should therefore contain raw event counts and should not be normalized.
This is the reason your errors are so large when you use normalized histograms.
I am trying to work with this class, but:
You can't put constraints on the normalizations of the histograms used as PDFs. The only thing you seem to be able to do is tell the fitter that you want a fraction to lie between a and b, which is obviously not enough.
You said that it takes the MC template errors as Poisson errors? The errors on the MC templates are definitely not supposed to be just Poisson: you have to take into account the PDF uncertainty, differences between generators, factorization and renormalization scales, showering settings, pT scale, pT resolution, ...
How could just using Poisson errors be enough for a real analysis? I mean, is there a reason why you can't build a likelihood as a product of binomial PDFs, choosing N and p in such a way that the error matches whatever error the user puts in the histogram fed to the class? I think it's pretty straightforward to write something like that; however, ROOT is here precisely so that each group does not have to write its own tools for everything.