What is RooNLLVar calculating with a RooSimultaneous model?

will_cern · February 4, 2018, 1:03pm

Hello,

I’ve been trying to check my understanding of the RooNLLVar test statistic variable in RooFit. In particular, I thought that I would be able to ‘emulate’ the result of RooNLLVar with some code like this:

double nll(0);
  for(int i=0;i<data.numEntries();i++) {
    obs = *data.get(i); //sets the observables to the value of the ith entry
    nll -= model.getLogVal(&obs); //evaluates log of pdf value
  }
  nll += model.extendedTerm(data.numEntries(),&obs); //for extended models

However, I tried doing this with a RooSimultaneous-based model I was working with, and I get a different result between this ‘by hand’ method and what I get back from a RooNLLVar object.

I put together a SWAN notebook to demonstrate this difference …

https://cernbox.cern.ch/index.php/s/2SkQwbv9CkHJHy4

Can someone (possibly @moneta ?) help me understand what I’ve apparently missed in the calculation?

Thanks
Will

moneta · February 5, 2018, 9:29am

Hi,

In the case of a RooSimultaneous pdf, some extra constants are applied; see for example

https://github.com/root-project/root/blob/master/roofit/roofitcore/src/RooNLLVar.cxx#L385


    } else {
      Double_t y = pdfClone->extendedTerm(_dataClone->sumEntries(), _dataClone->get()) - carry;
      Double_t t = result + y;
      carry = (t - result) - y;
      result = t;
    }
  }
}




// If part of simultaneous PDF normalize probability over
// number of simultaneous PDFs: -sum(log(p/n)) = -sum(log(p)) + N*log(n)
if (_simCount>1) {
  Double_t y = sumWeight*log(1.0*_simCount) - carry;
  Double_t t = result + y;
  carry = (t - result) - y;
  result = t;
}


//timer.Stop() ;
//cout << "RooNLLVar::evalPart(" << GetName() << ") SET=" << _setNum << " first=" << firstEvent << ", last=" << lastEvent << ", step=" << stepSize << ") result = " << result << " CPU = " << timer.CpuTime() << endl ;

I think it is difficult to reproduce exactly the same result in terms of absolute value. What you should check is that the differences in Delta Log L (for two different parameter values) are the same. The absolute value does not count. There can always be some constant term stripped or added.
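
To spell out the constant in the snippet above, here is a short derivation. It assumes the N*log(n) term is the only difference between the two calculations. The symbols are introduced here for illustration: n stands for _simCount, w_i for the event weights, N for sumWeight, p_i for the pdf value at event i, and θ_1, θ_2 for two arbitrary parameter points.

```latex
\begin{align}
-\sum_i w_i \log\frac{p_i(\theta)}{n}
  &= -\sum_i w_i \log p_i(\theta) + N \log n,
  \qquad N = \sum_i w_i \\
\mathrm{NLL}_{\mathrm{RooNLLVar}}(\theta_1) - \mathrm{NLL}_{\mathrm{RooNLLVar}}(\theta_2)
  &= \mathrm{NLL}_{\mathrm{hand}}(\theta_1) - \mathrm{NLL}_{\mathrm{hand}}(\theta_2)
\end{align}
```

The offset N log n does not depend on the parameters, so it drops out of any difference between two parameter points.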

Best Regards

Lorenzo

will_cern · February 5, 2018, 2:54pm

Hi Lorenzo,

I spotted these extra terms in the code too, and really I should have been a bit more up front about why I was trying to understand this. I was developing a goodness-of-fit test using the Baker-Cousins likelihood ratio. If I compute this test statistic by hand, I get a chi2 distribution, but I was hoping I could keep my code clean and use RooNLLVar to compute the numerator. But if there are these extra terms, then I will lose my chi2 distribution.
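
For reference, a sketch of why a constant matters here. The post does not say which form of the statistic is used, so the usual Poisson form for binned data is assumed, with observed counts n_i and predicted counts μ_i(θ) (both symbols are introduced here for illustration, and the n_i ln(n_i/μ_i) term is taken as zero when n_i = 0). The constant c is a placeholder for any parameter-independent term added to the numerator NLL only.

```latex
\begin{align}
\chi^2_\lambda
  &= -2 \ln \frac{L(n;\mu(\theta))}{L(n;n)}
   = 2 \sum_i \left[ \mu_i(\theta) - n_i + n_i \ln\frac{n_i}{\mu_i(\theta)} \right] \\
2\left[ \left( \mathrm{NLL}_{\mathrm{num}} + c \right) - \mathrm{NLL}_{\mathrm{sat}} \right]
  &= \chi^2_\lambda + 2c
\end{align}
```

Under that assumption, an extra constant in the numerator alone shifts the test statistic by 2c, so it would no longer follow the expected chi2 distribution unless the same constant is removed or added to the denominator as well.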

Anyway, I modified my notebook to add this extra term and that ends up giving me agreement. So glad this is understood. Why did RooSimultaneous do this???
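
The correction can be checked on toy numbers without ROOT. The following is only a minimal sketch, not the notebook code: `Category`, `nllByHand`, `nllSimNormalized` and `simOffset` are hypothetical names, events are unweighted, and the pdf values are placeholders. It compares a plain -sum(log p) with -sum(log(p/n)) and shows that they differ by N*log(n).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one category of a simultaneous model:
// the pdf value p_i evaluated at each (unweighted) event in that category.
struct Category {
    std::vector<double> pdfValues;
};

// Plain "by hand" NLL: -sum(log p_i) over all events in all categories.
double nllByHand(const std::vector<Category>& cats) {
    double nll = 0.0;
    for (const auto& c : cats)
        for (double p : c.pdfValues)
            nll -= std::log(p);
    return nll;
}

// NLL with each probability divided by the number of categories n,
// i.e. -sum(log(p_i / n)), as described in the RooNLLVar.cxx comment.
double nllSimNormalized(const std::vector<Category>& cats) {
    const double n = static_cast<double>(cats.size());
    double nll = 0.0;
    for (const auto& c : cats)
        for (double p : c.pdfValues)
            nll -= std::log(p / n);
    return nll;
}

// Constant offset N*log(n); only applied when there is more than one category.
double simOffset(const std::vector<Category>& cats) {
    std::size_t nEvents = 0;
    for (const auto& c : cats)
        nEvents += c.pdfValues.size();
    if (cats.size() <= 1) return 0.0;
    return static_cast<double>(nEvents) * std::log(static_cast<double>(cats.size()));
}
```

In the RooFit setting, the analogous step would presumably be to add data.sumEntries()*log(number of categories) to the by-hand sum, which is what the modified notebook is described as doing.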

Will
