Using RooFit with weighted data

Dear experts,

I recently ran into some complications while trying to use RooFit with weighted distributions.

I have a data distribution together with a signal and a background template and I’m trying to fit this data distribution with these two templates in order to estimate the fraction of signal objects in my data distribution.


RooDataHist faketemplate("faketemplate","fake template",sinin,h1);
RooHistPdf fakepdf("fakepdf","test hist fake pdf",sinin,faketemplate);

RooDataHist realtemplate("realtemplate","real template",sinin,h2);
RooHistPdf realpdf("realpdf","test hist real pdf",sinin,realtemplate);

RooDataHist data("data","data to be fitted to",sinin,hData);

RooRealVar fsig("fsig","signal fraction",0.1,0,1);

RooRealVar signum("signum","signum",0,ndataentries);
RooRealVar fakenum("fakenum","fakenum",0,ndataentries);

RooExtendPdf extpdfsig("Signal","extpdfsig",realpdf,signum,"sigrange");
RooExtendPdf extpdffake("Background","extpdffake",fakepdf,fakenum,"sigrange");

RooAddPdf model("model","sig + background",RooArgList(extpdfsig,extpdffake));

model.fitTo(data,RooFit::Minos(),SumW2Error(kTRUE),PrintEvalErrors(-1));


Usually I run on real data distributions, where I don’t need SumW2Error(kTRUE) and everything works fine. But now I’m performing a closure test, and my “data” distribution as well as my fake templates are extracted from MC. This “data” distribution is a combination of MC fake samples meant to represent what we would observe in real data and, of course, each entry in it receives a weight proportional to the cross section of the sample it comes from.
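To make the weighting concrete, here is a minimal sketch in plain C++ (no ROOT needed) of how one bin of such a combined MC distribution is filled. The normalisation w = xsec * lumi / nGenerated is an assumption (the post only says the weight is proportional to the cross section), and the names Sample, Bin, sampleWeight and fillBin are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One MC sample contributing to the pseudo-data (all fields are hypothetical).
struct Sample {
    double xsec;        // cross section of the sample
    double nGenerated;  // number of generated events
    double nInBin;      // unweighted entries this sample puts into the bin
};

struct Bin {
    double content;  // sum of weights
    double error;    // sqrt(sum of squared weights), i.e. the SumW2 error
    double entries;  // raw, unweighted number of entries
};

// Assumed normalisation: weight proportional to the cross section.
double sampleWeight(const Sample& s, double lumi) {
    assert(s.nGenerated > 0.0);
    return s.xsec * lumi / s.nGenerated;
}

// Accumulate the weighted content, its SumW2 error and the raw entries.
Bin fillBin(const std::vector<Sample>& samples, double lumi) {
    Bin b{0.0, 0.0, 0.0};
    double sumw2 = 0.0;
    for (const Sample& s : samples) {
        const double w = sampleWeight(s, lumi);
        b.content += w * s.nInBin;
        sumw2 += w * w * s.nInBin;
        b.entries += s.nInBin;
    }
    b.error = std::sqrt(sumw2);
    return b;
}
```

The point of the sketch is that the bin content (sum of weights), its error and the raw number of entries are three different numbers once the weights are not 1, so a Poisson error computed from the content alone is no longer appropriate.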

However, it seems that one cannot use SumW2Error(kTRUE) with Minos. Could anyone advise me on what I could use instead? The thing is that I did all my studies with Minos, so if I switch to something else for my closure test, the results might be difficult to compare.

Thanks

Hi,

That is correct. You cannot use Minos with weighted events; you have to use the method based on the inverse Hessian instead.
In any case, this is only an approximation if you don’t know the real distribution of the weights. If you do know that distribution, you should use it and model it in your likelihood.

Lorenzo

Hi Lorenzo,

Thanks for the reply. Could you please tell me how to use the inverse Hessian? Also, by distribution of weights, you probably mean that for each entry in my data distribution I save the associated weight to another histogram, and that final histogram is the distribution of weights. Do we agree on the definition? I can produce that. If I have it, could you please show me how to use it?

Thanks

Otman

Hi,

The inverse Hessian method is what RooFit uses by default for the returned errors when you pass SumW2Error(kTRUE) to fitTo.
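For reference, the correction applied with SumW2Error(kTRUE) can be summarised as below. This follows the description in the RooFit documentation of that option; the symbols f, θ and w_i are just labels chosen here. V is the covariance matrix from the weighted likelihood, C is the one obtained when the weights are squared, and V' is the corrected matrix that is reported.

```latex
\begin{align}
  -\log L  &= -\sum_i w_i \,\log f(x_i \mid \theta)   && \Rightarrow\ V \\
  -\log L' &= -\sum_i w_i^2 \,\log f(x_i \mid \theta) && \Rightarrow\ C \\
  V' &= V\, C^{-1}\, V
\end{align}
```

In the fit output this shows up as an additional HESSE call after the message about calculating the sum-of-weights-squared correction matrix.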

To use the distribution of the weights, you need their PDF and, in particular, how they are correlated with the other observables. You then build a global PDF(x, weight | parameters) to model your data, so the weight is treated as just another observable. This may be feasible if the weight distribution does not depend on the parameters.
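As a sketch of what this means (the notation p, x, w, θ is chosen here for illustration), the joint PDF can be factorised, and if the weight distribution p(w) does not depend on θ it only contributes a constant to the negative log-likelihood:

```latex
\begin{align}
  p(x, w \mid \theta) &= p(x \mid w, \theta)\, p(w) \\
  -\log L(\theta) &= -\sum_i \log p(x_i \mid w_i, \theta) \;-\; \sum_i \log p(w_i)
\end{align}
```

The second sum is the same for every value of θ, so only the conditional PDF of x given w, which encodes the correlation between the weight and the other observables, matters for the minimisation.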

Lorenzo

Hi Lorenzo,

Unfortunately, I still have an issue. It looks like the fitted yields of real and fake in my “data” distribution are exactly the same. I’m posting the output from RooFit below. I am now using the following fitTo command:

        model.fitTo(data,SumW2Error(kTRUE),PrintEvalErrors(-1));

Thanks a lot.

Otman

[#1] INFO:Eval -- RooRealVar::setRange(sinin) new range named 'sigrange' created with bounds [0.005,0.0105]
[#1] INFO:DataHandling -- RooDataHist::adjustBinning(realtemplate): fit range of variable sinin expanded to nearest bin boundaries: [0,0.03] --> [0,0.03]
[#1] INFO:DataHandling -- RooDataHist::adjustBinning(data): fit range of variable sinin expanded to nearest bin boundaries: [0,0.03] --> [0,0.03]
[#1] INFO:Minization -- p.d.f. provides expected number of events, including extended term in likelihood.
[#1] INFO:Minization -- RooMinuit::optimizeConst: activating const optimization
[#1] INFO:Minization -- The following expressions have been identified as constant and will be precalculated and cached: (realpdf,fakepdf)
[#1] INFO:Minization -- The following expressions will be evaluated in cache-and-track mode: (Signal,Background)


** 13 **MIGRAD 1000 1


FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
START MIGRAD MINIMIZATION. STRATEGY 1. CONVERGENCE WHEN EDM .LT. 1.00e-03
FCN=-2.47271e+09 FROM MIGRAD STATUS=INITIATE 8 CALLS 9 TOTAL
EDM= unknown STRATEGY= 1 NO ERROR MATRIX
EXT PARAMETER CURRENT GUESS STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 fakenum 1.14168e+06 2.28335e+05 2.01358e-01 -6.65777e+07
2 signum 1.14168e+06 2.28335e+05 2.01358e-01 -5.18939e+07
ERR DEF= 0.5
MIGRAD MINIMIZATION HAS CONVERGED.
MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=-2.55384e+09 FROM MIGRAD STATUS=CONVERGED 42 CALLS 43 TOTAL
EDM=4.36225e-06 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 fakenum 2.28335e+06 1.76947e-02 6.14395e-03** at limit **
2 signum 2.28335e+06 2.25060e-02 6.92906e-03** at limit **
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 2 ERR DEF=0.5
2.404e-09 -8.181e-15
-8.181e-15 4.949e-09
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2
1 0.00000 1.000 -0.000
2 0.00000 -0.000 1.000


** 18 **HESSE 1000


COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=-2.55384e+09 FROM HESSE STATUS=OK 10 CALLS 53 TOTAL
EDM=4.36183e-06 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER INTERNAL INTERNAL
NO. NAME VALUE ERROR STEP SIZE VALUE
1 fakenum 2.28335e+06 1.76947e-02 1.22879e-03 1.57080e+00
WARNING - - ABOVE PARAMETER IS AT LIMIT.
2 signum 2.28335e+06 2.25060e-02 1.38581e-03 1.57080e+00
WARNING - - ABOVE PARAMETER IS AT LIMIT.
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 2 ERR DEF=0.5
2.404e-09 -3.377e-16
-3.377e-16 4.949e-09
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2
1 0.00000 1.000 -0.000
2 0.00000 -0.000 1.000
[#1] INFO:Fitting -- RooAbsPdf::fitTo(model) Calculating sum-of-weights-squared correction matrix for covariance matrix


** 23 **HESSE 1000


COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=-1.13913e+13 FROM HESSE STATUS=OK 14 CALLS 67 TOTAL
EDM=0.0186801 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER INTERNAL INTERNAL
NO. NAME VALUE ERROR STEP SIZE VALUE
1 fakenum 2.28335e+06 3.14484e-06 2.50673e-02 1.57080e+00
WARNING - - ABOVE PARAMETER IS AT LIMIT.
2 signum 2.28335e+06 6.97747e-06 2.82706e-02 1.57080e+00
WARNING - - ABOVE PARAMETER IS AT LIMIT.
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 2 ERR DEF=0.5
4.273e-13 -2.569e-17
-2.569e-17 1.534e-12
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2
1 0.00003 1.000 -0.000
2 0.00003 -0.000 1.000
[#1] INFO:Minization -- RooMinuit::optimizeConst: deactivating const optimization
[#0] ERROR:Eval -- RooChi2Var::RooChi2Var(chi2_lowstat) INFINITY ERROR: bin 49 has zero error
my chi2 is 0
[#1] INFO:InputArguments -- RooAbsData::plotOn(data) INFO: dataset has non-integer weights, auto-selecting SumW2 errors instead of Poisson errors
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model) directly selected PDF components: (Signal)
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model) indirectly selected PDF components: (realpdf)
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model) directly selected PDF components: (Background)
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model) indirectly selected PDF components: (fakepdf)
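
A note on reading the log above: both yields end at 2.28335e+06 with an INTERNAL VALUE of 1.57080e+00, which is pi/2. For a parameter with two-sided limits, Minuit minimises over an unbounded internal value and maps it to the external one through a sine transformation, so pi/2 corresponds to the upper bound. The starting value 1.14168e+06, the midpoint of the range, suggests that this bound is ndataentries. Below is a minimal sketch of that mapping in plain C++, without ROOT; the function names are hypothetical helpers, not Minuit API.

```cpp
#include <cassert>
#include <cmath>

// Minuit's mapping for a parameter with two-sided limits [lo, hi]:
//   external = lo + (hi - lo)/2 * (sin(internal) + 1)
double toExternal(double internal, double lo, double hi) {
    assert(hi > lo);
    return lo + 0.5 * (hi - lo) * (std::sin(internal) + 1.0);
}

// Inverse mapping:
//   internal = asin(2*(external - lo)/(hi - lo) - 1)
double toInternal(double external, double lo, double hi) {
    assert(hi > lo && external >= lo && external <= hi);
    return std::asin(2.0 * (external - lo) / (hi - lo) - 1.0);
}

// True when the external value lies within relTol of either bound.
bool atLimit(double external, double lo, double hi, double relTol = 1e-6) {
    const double span = hi - lo;
    return (external - lo) < relTol * span || (hi - external) < relTol * span;
}
```

With lo = 0 and hi = 2.28335e+06, toInternal(hi, lo, hi) gives pi/2, matching the log. Because the derivative of the mapping vanishes at the bound, the errors quoted for a parameter at its limit are generally not meaningful.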

Hi,

I would need your full running code to understand this problem, please post it.
Lorenzo

Hi Lorenzo,

sure, this is the code attached.

Thanks again

Otman
myFakeRateMacroJet.C (18.6 KB)

Thanks for the file. I am away right now, but I’ll look as soon as I can
Best Regards

Lorenzo

Hi Lorenzo,

Could you tell me if there is any news on this?

Thanks a lot.

Otman

Dear all,

I’m trying to create a new RooDataSet that has several weights. I tried something like:

RooDataSet *ds = new RooDataSet(Form("%s_RooDataSet", obsname.c_str()), "dataset",
objets, RooFit::Import(*tcut),
RooFit::Cut(cuts.c_str()),
RooFit::WeightVar(v_weights[1].c_str()),RooFit::WeightVar(v_weights[0].c_str()));

but it seems that it uses only the last weight that I provide… Could someone tell me how to pass both weights to the RooDataSet? Thanks a lot!
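
A RooDataSet carries a single weight per event, so one possible workaround is to combine the weights into one value before the dataset is built. The sketch below does this in plain C++ without ROOT. It assumes the two weights are independent multiplicative factors, which the post does not state, and combineWeights is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Collapse several per-event weight columns into a single weight per event
// by multiplying them (assumption: the weights are multiplicative factors).
std::vector<double> combineWeights(const std::vector<std::vector<double>>& columns) {
    assert(!columns.empty());
    const std::size_t nEvents = columns.front().size();
    std::vector<double> combined(nEvents, 1.0);
    for (const auto& column : columns) {
        assert(column.size() == nEvents);  // every column must cover all events
        for (std::size_t i = 0; i < nEvents; ++i) {
            combined[i] *= column[i];
        }
    }
    return combined;
}
```

In RooFit the combined value could then be stored as one extra column, for instance a RooFormulaVar added with addColumn, and that single column named in one WeightVar argument.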