How is ProjWData accelerated/optimized for RooDecay?


I am just wondering how the ProjWData option is accelerated/optimized for RooDecay. I just did a simple test with my own Exp*Gauss p.d.f., where the calculation of the convolution reduces to evaluating the error function (TMath::Erf). I thought this was as “analytical” a form as possible. I used RooClassFactory::makePdf to generate the p.d.f.; the evaluate() part is the following:

Double_t RooMyGE::evaluate() const
{
  // Exponential decay (lifetime tau) convolved with a Gaussian
  // resolution of width stSF*st, expressed via the error function
  return 1./(2.*tau)
       * TMath::Exp(stSF*stSF*st*st/(2.*tau*tau) - t/tau)
       * (1 - TMath::Erf(stSF*st/(TMath::Sqrt(2)*tau) - t/(TMath::Sqrt(2)*stSF*st))) ;
}

Its fitting speed is slower than RooDecay's. We can still afford that (even for a more complicated p.d.f.). But plotOn with the ProjWData option is very slow. Could you please give me some hints (or point me to the relevant source code) on how it is accelerated/optimized, so that I can learn how to optimize a more complicated p.d.f.?
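If I understand correctly, ProjWData(*st, *data) averages the conditional p.d.f. f(t|st) over the st values in the dataset, so the cost scales as (number of curve points) x (number of dataset entries), which is presumably why it is slow. A minimal sketch of the brute-force calculation in plain C++ (function names are mine, not RooFit's; the p.d.f. is the one from evaluate() above, using erfc(x) = 1 - erf(x)):

```cpp
#include <cmath>
#include <vector>

// The p.d.f. from evaluate(), as a free function (illustrative names).
double emgPdf(double t, double st, double tau, double stSF) {
  const double sigma = stSF * st;  // per-event Gaussian resolution
  return 1.0 / (2.0 * tau)
       * std::exp(sigma * sigma / (2.0 * tau * tau) - t / tau)
       * std::erfc(sigma / (std::sqrt(2.0) * tau) - t / (std::sqrt(2.0) * sigma));
}

// Data-weighted projection: average f(t|st) over the observed st values.
// This loop runs once per plotted curve point, hence the N_points x N_data cost.
double projWData(double t, const std::vector<double>& stValues,
                 double tau, double stSF) {
  double sum = 0.0;
  for (double st : stValues) sum += emgPdf(t, st, tau, stSF);
  return sum / stValues.size();
}
```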

Another problem is that the NumCPU option in the fit doesn't help at all for my own p.d.f. Using more CPUs even slows down the fitting. I guess I also need to add something?

The macros used for the comparison are attached
(RooDecay: useRooDecay.C;
my own p.d.f.: useMyGE.C).

Thanks a lot.

Cheers, Jibo
plotSpeed.tar.gz (216 KB)

Hi Jibo,

What version of ROOT are you using?

The option to accelerate ProjWData() with multi-processor use was only introduced in ROOT 5.22. To use it, add NumCPU() to the plotOn() call:

condPdf.plotOn(frame1, ProjWData(*st, *data), NumCPU(4)) ;

which works when I try your macro. (In fitting, NumCPU() works in earlier versions too.)

Concerning the difference in speed between your p.d.f. and RooDecay: RooDecay also implements an analytical normalization integral, which saves time in the calculation of the normalization of the p.d.f. compared with your class.

If you know the analytical integral you can implement it too: RooClassFactory has an option to generate skeleton code that supports the advertisement of analytical integrals as well.
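For this particular p.d.f. the integral over t is known in closed form. A plain-C++ sketch (outside RooFit; all names are mine, sigma stands for stSF*st) of what the analyticalIntegral() of such a class could return:

```cpp
#include <cmath>

// Standard normal CDF.
static double Phi(double z) { return 0.5 * std::erfc(-z / std::sqrt(2.0)); }

// The p.d.f. from the post (sigma = stSF*st), using erfc(x) = 1 - erf(x).
static double emgPdf(double t, double sigma, double tau) {
  return 1.0 / (2.0 * tau)
       * std::exp(sigma * sigma / (2.0 * tau * tau) - t / tau)
       * std::erfc(sigma / (std::sqrt(2.0) * tau) - t / (std::sqrt(2.0) * sigma));
}

// Closed-form antiderivative (CDF) of the exponential (x) Gaussian density:
//   F(t) = Phi(t/sigma) - exp(sigma^2/(2 tau^2) - t/tau) * Phi(t/sigma - sigma/tau)
// One can check that dF/dt reproduces emgPdf above.
static double emgCdf(double t, double sigma, double tau) {
  return Phi(t / sigma)
       - std::exp(sigma * sigma / (2.0 * tau * tau) - t / tau)
         * Phi(t / sigma - sigma / tau);
}

// Analytical normalization integral over [a, b]: no numeric integration needed.
double emgIntegral(double a, double b, double sigma, double tau) {
  return emgCdf(b, sigma, tau) - emgCdf(a, sigma, tau);
}
```

Returning this from analyticalIntegral() (and advertising it in getAnalyticalIntegral()) replaces the numeric normalization that the default skeleton falls back on.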

I'm not sure what happens when you say that NumCPU doesn't help for your p.d.f.
Can you post some numbers (CPU time vs. NCPU)? In general, when the fit is very fast, scaling with NCPU is poor because of a constant overhead in inter-process communication. But my impression was that your fit is not so fast, hence my confusion.
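The overhead argument can be illustrated with a toy model (numbers and function names are made up for illustration, not measured from RooFit): when each likelihood evaluation carries a fixed communication cost on top of the parallelizable work, the speedup saturates as the work per evaluation shrinks.

```cpp
// Toy model: time per likelihood evaluation = parallel work / ncpu
// plus a fixed inter-process communication overhead.
double wallTime(double workSec, double ipcOverheadSec, int ncpu) {
  return workSec / ncpu + ipcOverheadSec;
}

// Speedup relative to a single-process evaluation.
double speedup(double workSec, double ipcOverheadSec, int ncpu) {
  return wallTime(workSec, ipcOverheadSec, 1)
       / wallTime(workSec, ipcOverheadSec, ncpu);
}
```

With heavy work (say 100 s vs. 0.01 s overhead) the speedup on 4 CPUs is close to 4; with work comparable to the overhead it drops well below 2, which is why only slow fits profit from NumCPU.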