Hello,

I have a conceptual performance problem with RooDataSet.

Consider a huge TTree, by which I mean one with *plenty of variables* (branches).

I compared the execution time of two programs.

***The first program (1) does the following:

a) disable all branches

tree->SetBranchStatus("*",0); // disable all branches

b) re-enable only two branches

c) create a new TTree that stores only these two variables

--> This first program runs very fast: typically 20 seconds on lxplus.

Please note that this program, by construction, loops over all events of the big TTree.

***The second program (2) does the following:

a) disable all branches

tree->SetBranchStatus("*",0); // disable all branches

b) re-enable only the same two branches

c) create a RooDataSet from the big TTree, using a RooArgSet that contains only the two variables of interest.

--> This program is extremely slow: typically 35 minutes on lxplus.

So I have a workaround: first create a new TTree that contains only the two variables. But this is not completely satisfactory, and it does not seem logical.

It looks like RooDataSet considers all variables of the TTree, for some unknown reason.

My question is: why does creating a RooDataSet from a TTree consider all variables, even though I explicitly put only two variables in the RooArgSet and explicitly switch off all the other variables with the procedure in (a) and (b)?

Does anybody have an explanation and/or a solution?

Thank you for your help.

(Just in case, even though it should not be necessary for understanding the problem above, a minimal example is available here:

/afs/cern.ch/work/e/escalier/public/RooDataSet_SlowWhy

)