I'm using PyROOT and trying to visualise the progress of constructing a RooDataSet from a ROOT file. So far I have two methods of constructing the RooDataSet:
Method 1: use the constructor
tree = TChain()
tree.Add("my tree")
branch = RooRealVar("branch","branch of tree",0)
data = RooDataSet("data","data", tree, branch)
Method 2: fill manually
def branchToData(obs, tree, branchName=str()):
    '''
    branchName is a string holding the branch's name.
    Transform the branch into a RooDataSet over obs.
    '''
    data = RooDataSet('data', 'data', RooArgSet(obs))
    # get total number of entries
    num = tree.GetEntries()
    # transform data
    for i in range(0, num):
        tree.GetEntry(i)
        # read the current value of the branch by its name
        event = getattr(tree, branchName)
        obs.setVal(event)
        data.add(RooArgSet(obs))
    return data
Method 1 is clearly much faster than method 2, but with method 1 it is much harder for me to visualise the progress.
So I wonder: if I run method 2 in a multi-threaded way, could it be as fast as method 1? Or, because of a different internal mechanism, can method 2 never be as fast as method 1?
Just jumping in. The manual filling will be fast enough only if you call SetBranchStatus("*", 0) to disable everything and then (I think) re-enable only the branches you actually need. Otherwise GetEntry(i) is very painful, since it will probably read ALL branches under the hood and update their values in memory, not only the one you need.
To run this manual filling faster, I would actually proceed differently for the benchmarking:
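A minimal sketch of that idea, assuming `tree` is a TTree or TChain and the branch holds a plain number. The function name `read_branch_with_progress` and the `on_value` / `on_progress` callbacks are made up for illustration; only `SetBranchStatus`, `GetEntries` and `GetEntry` are actual TTree methods.

```python
import sys


def _print_progress(done, total):
    # Default reporter (hypothetical helper): overwrite one line on stderr.
    sys.stderr.write(f"\r{done}/{total} entries")
    sys.stderr.flush()


def read_branch_with_progress(tree, branch_name, on_value,
                              on_progress=_print_progress, report_every=1000):
    """Loop over one branch only and report progress every few entries."""
    # Disable every branch, then re-enable just the one that is needed,
    # so GetEntry(i) does not have to read the whole event.
    tree.SetBranchStatus("*", 0)
    tree.SetBranchStatus(branch_name, 1)

    total = tree.GetEntries()
    for i in range(total):
        tree.GetEntry(i)
        on_value(getattr(tree, branch_name))
        done = i + 1
        if done % report_every == 0 or done == total:
            on_progress(done, total)
    return total
```

With ROOT loaded, `on_value` could be a small function that calls `obs.setVal(value)` followed by `data.add(RooArgSet(obs))`, which reproduces method 2 with a progress display on top.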
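import ROOT as r
df = r.RDataFrame(tree)  # tree: the TTree/TChain to read
dfNumpy = df.Filter(cutString).AsNumpy(columns=["branchYouNeed"])
# or alternatively (the template type must match the branch type; "double" is assumed here)
columnForDS = df.Filter(cutString).Take["double"]("branchYouNeed").GetValue()
# loop over the extracted values (columnForDS works the same way)
for value in dfNumpy["branchYouNeed"]:
    obs.setVal(float(value))
    data.add(RooArgSet(obs))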
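Since the original goal was to visualise progress, the final loop over the extracted column could be wrapped in a small generator that reports as it goes. This is only a sketch: `iter_with_progress` is a hypothetical helper, not part of ROOT, and it assumes the column supports `len()` (a NumPy array or a list does).

```python
import sys


def iter_with_progress(values, report_every=1000, stream=sys.stderr):
    """Yield each value unchanged, writing 'done/total' to stream now and then."""
    total = len(values)
    for done, value in enumerate(values, start=1):
        yield value
        # Report after every `report_every` values and once more at the end.
        if done % report_every == 0 or done == total:
            stream.write(f"\r{done}/{total} entries")
            stream.flush()
    if total:
        stream.write("\n")
```

The loop body stays as before: iterate with `for value in iter_with_progress(values):` and call `obs.setVal(float(value))` and `data.add(RooArgSet(obs))` inside it, where `values` is the column obtained from AsNumpy or Take.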