Why is RooDataSet very slow?

Hello,

I have a conceptual performance problem with RooDataSet.

Consider a huge TTree sample, that is, one with a large number of variables (branches).

I measured the execution time of two programs.

***The first program (1) does the following:

a) disables all branches

tree->SetBranchStatus("*",0); //unbranch everything

b) re-enables only two branches

c) creates a new TTree that stores only those two variables

-> This first program runs very fast: typically 20 seconds on lxplus.
Please note that this program, by construction, has looped over all events of the big TTree.

***The second program (2) does the following:

a) disables all branches

tree->SetBranchStatus("*",0); //unbranch everything

b) re-enables only the same two branches

c) creates a RooDataSet from the big TTree, using a RooArgSet that contains only the two variables of interest.

-> This program is extremely slow: typically 35 minutes on lxplus.

So I have a workaround, which is to first create a new TTree that contains only the two variables, but this is not completely satisfactory and does not seem logical.
It looks as if RooDataSet processes all the variables of the TTree, for an unknown reason.

My question is: why does creating a RooDataSet from a TTree appear to process all variables, even though I explicitly put only two variables in the RooArgSet
and explicitly switched off all the other branches with steps (a) and (b)?

Does anybody have an explanation and/or a solution?

Thank you for your help.

(Just in case, even though it should not be necessary in order to understand the problem above, a minimal example is available here:

/afs/cern.ch/work/e/escalier/public/RooDataSet_SlowWhy

)

Hi escalier,
Sorry for the slow reply, but the RooFit/RooStats expert (@moneta) is currently on vacation.
I will ask around, but I think your best bet is to bump this thread with a mention of him in a few days.
