Difference between constructing RooDataSet by constructor or manually

mrli · December 17, 2020, 6:37am

Dear all,

I’m using PyROOT and want to display the progress of building a RooDataSet from a ROOT file. So far I have two ways of building the RooDataSet:

method 1: use constructor

tree = TChain()
tree.Add("my tree")
branch = RooRealVar("branch","branch of tree",0)

data = RooDataSet("data","data", tree, branch)

method 2: manual

def branchToData(obs, tree, branchName=str()):
    '''
    branchName is a string of branch's name
    Transform branch into data[obs]
    '''
    data = RooDataSet('data', 'data', RooArgSet(obs))
    # set eventName to use eval() to get event
    eventName = 'tree.' + branchName
    # get total number of entries
    num = tree.GetEntries()
    # transform data
    for i in range(0, num):
        tree.GetEntry(i)
        event = eval(eventName)
        obs.setVal(event)
        data.add(RooArgSet(obs))
    return data

Clearly, method 1 is much faster than method 2, but with method 1 it is much harder for me to show the progress.

So I wonder: if I run method 2 with multi-threading, could it be as fast as method 1? Or, because the internal mechanism is different, can method 2 never be as fast as method 1?

Thanks all !

ROOT Version: 6.22.02
Platform: Ubuntu 20.04

etejedor · December 17, 2020, 2:20pm

@moneta could you explain the difference in performance between the two methods pointed out by the user? Thanks!

RENATO_QUAGLIANI · December 17, 2020, 3:14pm

Just jumping in. The manual filling will only be fast enough if you first disable all branches with SetBranchStatus("*", 0) and then re-enable only the branches you actually need (I think). Otherwise GetEntry(i) is very expensive, since it probably updates the values of ALL branches under the hood instead of reading only the one you need.
To make the manual filling faster, I would actually proceed differently for the benchmark:

df = r.RDataFrame(tuple)
dfNumpy = df.Filter(cutString).AsNumpy(columns=[neededcols])
# or alternatively
columnForDS = df.Filter(cutString).Take("branchesYouNeed")
for value in dfNumpy["column"]:  # or loop over the values held by columnForDS
    obs.setVal(value)
    data.add(RooArgSet(obs))

Cheers
Renato
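
The branch-status suggestion above could be sketched roughly as follows. This is only an illustration, not a tested recipe: fill_with_progress, add_value and report_every are hypothetical names introduced here, and the helper only assumes that the tree object provides SetBranchStatus, GetEntries and GetEntry and exposes the branch as an attribute, as a PyROOT TTree or TChain does.

```python
def fill_with_progress(tree, branch_name, add_value, report_every=10000):
    """Loop over `tree`, reading only `branch_name`, and print progress.

    `add_value` is a hypothetical callback that receives each value;
    in PyROOT it would typically fill a RooDataSet.
    """
    tree.SetBranchStatus("*", 0)          # disable every branch
    tree.SetBranchStatus(branch_name, 1)  # re-enable only the one we need
    total = tree.GetEntries()
    for i in range(total):
        tree.GetEntry(i)                  # now reads only the enabled branch
        add_value(getattr(tree, branch_name))
        if (i + 1) % report_every == 0 or i + 1 == total:
            print(f"processed {i + 1}/{total} entries")
    return total


# Possible use with PyROOT (assumes obs is a RooRealVar and data a RooDataSet):
#
# def add_to_dataset(value):
#     obs.setVal(value)
#     data.add(ROOT.RooArgSet(obs))
#
# fill_with_progress(tree, "gauss1", add_to_dataset)
```

The event loop still runs in Python here, so this only removes the cost of reading unneeded branches; it does not turn the manual loop into a compiled one.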

moneta · December 17, 2020, 3:38pm

Also, method 1 runs the event loop in C++, while method 2 runs it in Python.
This makes a big difference!!
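
To get a feeling for this effect without ROOT, here is a small sketch that times the same summation once in an explicit Python loop and once with the built-in sum(), whose loop runs in compiled C code inside the interpreter. The built-in is only a stand-in for a C++ event loop, and the function names are made up for this illustration; the actual timings depend on the machine.

```python
import timeit


def sum_in_python_loop(values):
    """Add the values one by one in interpreted Python code."""
    total = 0
    for v in values:
        total += v
    return total


def sum_in_compiled_loop(values):
    """Let the built-in sum() run the loop in compiled code."""
    return sum(values)


def compare(n=1_000_000, repeat=3):
    """Return the best wall-clock time in seconds of each variant."""
    values = list(range(n))
    t_py = min(timeit.repeat(lambda: sum_in_python_loop(values),
                             number=1, repeat=repeat))
    t_c = min(timeit.repeat(lambda: sum_in_compiled_loop(values),
                            number=1, repeat=repeat))
    return t_py, t_c


if __name__ == "__main__":
    t_py, t_c = compare()
    print(f"Python loop: {t_py:.4f} s, compiled loop: {t_c:.4f} s")
```

Both functions return the same number; only where the loop runs differs.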

mrli · December 17, 2020, 4:56pm

Thanks for your advice!
But now I have a new problem…
I tried your code, but it raised an error.

I have a root file named ‘superGauss.root’ with a tree named ‘gauss’, which has 3 branches: gauss1, gauss2, gauss3. Then I tried to run:

df = RDataFrame('gauss', 'superGauss.root')
columnForDs = df.Take('gauss1')      # No need for cut, and I'm not familiar with cutString either...

then I got

Traceback (most recent call last):
  File "test.py", line 63, in branchToData
    columnForDS = df.Take('gauss1')
TypeError: Template method resolution failed:
  Failed to instantiate "Take(std::string)"

Have I made a mistake somewhere?
Thanks!

etejedor · December 17, 2020, 5:48pm

Please try specifying the type of the branch you are taking. For example, assuming gauss1 is of type float:

columnForDs = df.Take['float']('gauss1')
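
If the branch type is not known in advance, one option might be to ask the data frame for it first. The sketch below assumes that RDataFrame's GetColumnType returns a type name that Take accepts as its template argument; take_column is a hypothetical helper name, not part of ROOT.

```python
def take_column(df, column):
    """Return the values of `column` as a Python list.

    Assumes `df` behaves like a ROOT RDataFrame: GetColumnType(name)
    gives the C++ type name and Take[type](name) books a lazy result.
    """
    cpp_type = str(df.GetColumnType(column))  # e.g. "float" or "double"
    result = df.Take[cpp_type](column)        # lazy: nothing is read yet
    return list(result.GetValue())            # GetValue() runs the event loop


# Possible use with PyROOT (file and tree names taken from the post above):
#
# import ROOT
# df = ROOT.RDataFrame('gauss', 'superGauss.root')
# for value in take_column(df, 'gauss1'):
#     obs.setVal(value)
#     data.add(ROOT.RooArgSet(obs))
```

With this the values are read in a C++ event loop, and only the filling of the RooDataSet remains a Python loop where progress can be printed.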
