Why generating four Poisson numbers is so slow

I have a very simple workspace, made by the join of four poisson distribution

w = ROOT.RooWorkspace()
w.factory("RooPoisson:p1(obs1[0, 1000], 10)")
w.factory("RooPoisson:p2(obs2[0, 1000], 20)")
w.factory("RooPoisson:p3(obs3[0, 1000], 30)")
w.factory("RooPoisson:p4(obs4[0, 1000], 40)")
model = w.factory("PROD:model(p1, p2, p3, p4)")
all_obs = w.allVars().selectByName('obs*')

NTOYS = 1000
for itoy in range(NTOYS):
    data = model.generate(all_obs, 1)

this took 24 seconds on my laptop, while

model.generate(all_obs, 1000)

10 loops, best of 3: 27.4 ms per loop

from scipy import stats
stats.poisson([10, 20, 30, 40]).rvs((1000, 4))

1000 loops, best of 3: 1.4 ms per loop

So: first version = 1x, second version = 870x, scipy version = 17000x

Am I doing something wrong? I would prefer the version model.generate(all_obs, 1) since I want to model a counting experiment. By the way there is a 20x between the vectorized version of RooFit and scipy.

I think it is slow because you are using C style loop,
try to have something like

data = [model.generate(all_obs, 1) for itoy in xrange(NTOYS) ]

I get exactly the same timing. In fact list comprehension should not be very different for explicit for-loops. I guess there is an overhead for some subtask done by generate which is repeated all the times.

I am wondering if it is possible to split the generate method in the part that can be done just one time and the real random generation.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.