How can I speed up RooAbsPdf::createNLL?

When running toy MC and evaluating the NLL of each result,
I see my process slowing down as more and more toy data is processed:
calculateCL-> iteration 0 taken time 0s
calculateCL-> iteration 1000 taken time 19s
calculateCL-> iteration 2000 taken time 33s
calculateCL-> iteration 3000 taken time 45s
calculateCL-> iteration 4000 taken time 58s
calculateCL-> iteration 5000 taken time 69s
calculateCL-> iteration 6000 taken time 96s
calculateCL-> iteration 7000 taken time 107s
calculateCL-> iteration 8000 taken time 107s
calculateCL-> iteration 9000 taken time 116s
The time shown is the time taken by each successive block of 1000 iterations, so after 6K iterations the loop
runs about a factor of 5 slower than initially, and it keeps degrading.

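Each iteration does the following:
// generate an extended toy dataset (Poisson-fluctuated number of events)
RooDataSet* toyData = mModel.generate(mBX, RooFit::Extended());
// build the extended NLL for this toy and evaluate it
RooAbsReal* absReal = mModel.createNLL(*toyData, RooFit::Extended());
double nllSB = absReal->getVal();
// free the NLL object, then the toy dataset
delete absReal;
delete toyData;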

The program stack, sampled at arbitrary moments, looks like the one shown below.
I conclude that a lot of work goes into cloning my PDF (expensive for my non-trivial custom PDF),
cloning the dataset, and setting up all of this in the cache manager.

However, the task is simple: use the PDF to generate a dataset, get the NLL for this dataset, delete the dataset.
I guess no internal cloning is technically necessary here at all. Perhaps I could even reuse the dataset
object by resetting it after every toy experiment, rather than creating it from scratch
every time?

I can imagine that this very CPU-intensive machinery is necessary to handle
complicated cases in general. Since I know my case is simple, is it possible to bypass all this machinery
and shortcut the NLL calculation?
Is there a way to do it, and to make the time consumed by this "generate data-calculate NLL-drop data" cycle
comparable to the time required by the corresponding "generate" and "evaluate" calls?
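For reference, the number each iteration computes is simple to write down for an unbinned extended likelihood. With a PDF f(x|θ) normalized over the observables, an expected yield ν(θ) and N generated events x_1, ..., x_N, and dropping the constant ln N!:

```latex
\mathrm{NLL}(\theta) = \nu(\theta) - N \ln \nu(\theta) - \sum_{i=1}^{N} \ln f(x_i \mid \theta)
```

The first two terms match what RooAbsPdf::extendedTerm is documented to return (N_expect - N_observed*log(N_expect)); I am assuming the NLL built by createNLL(..., Extended()) differs from this at most by constant offsets, which cancel when comparing NLL values evaluated on the same toy.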


#0 0x00002ae2ed779af0 in TObject::operator delete(void*) () from /afs/
#1 0x00002ae2f184b2a3 in RooHashTable::~RooHashTable() () from /afs/
#2 0x00002ae2f188b179 in RooNormSetCache::expand() () from /afs/
#3 0x00002ae2f188b591 in RooNormSetCache::initialize(RooNormSetCache const&) () from /afs/
#4 0x00002ae2f18982a3 in RooCacheManager::RooCacheManager(RooCacheManager const&, RooAbsArg*) () from /afs/
#5 0x00002ae2f1897c19 in RooObjCacheManager::RooObjCacheManager(RooObjCacheManager const&, RooAbsArg*) () from /afs/
#6 0x00002ae2f17c5efd in RooAddPdf::RooAddPdf(RooAddPdf const&, char const*) () from /afs/
#7 0x00002ae2f17c7e34 in RooAddPdf::clone(char const*) const () from /afs/
#8 0x00002ae2f177706f in RooAbsCollection::snapshot(RooAbsCollection&, bool) const () from /afs/
#9 0x00002ae2f17774e3 in RooAbsCollection::snapshot(bool) const () from /afs/
#10 0x00002ae2f178bb8e in RooAbsOptTestStatistic::RooAbsOptTestStatistic(char const*, char const*, RooAbsReal&, RooAbsData&, RooArgSet const&, char const*, char const*, int, bool, bool, bool, bool) () from /afs/
#11 0x00002ae2f1888d49 in RooNLLVar::RooNLLVar(char const*, char const*, RooAbsPdf&, RooAbsData&, RooArgSet const&, bool, char const*, char const*, int, bool, bool, bool, bool) () from /afs/
#12 0x00002ae2f1794cfe in RooAbsPdf::createNLL(RooAbsData&, RooLinkedList const&) () from /afs/
#13 0x00002ae2f1790430 in RooAbsPdf::createNLL(RooAbsData&, RooCmdArg, RooCmdArg, RooCmdArg, RooCmdArg, RooCmdArg, RooCmdArg, RooCmdArg, RooCmdArg) () from /afs/

After I implemented shortcut methods like
quickNLL(const RooDataSet& fData)
quickGenerate()
which use my model parameters but call the "evaluate" and "generate" methods
of my custom PDFs directly, the "generate toy dataset - calculate NLL" loop
sped up by a factor of 1000.

So I guess there is a lot of room for improving RooFit performance on simple models.