Hi,
I have been trying to set up an RDataFrame to process a chain containing many events efficiently.
I am using Python with ROOT version 6.14.08 (6.14.08-x86_64-centos7-gcc8-opt).
I define my data frame:
origDF = R.RDataFrame(tree)
/tdfnraw = origDF.Count().GetValue()
and then define a series of columns based on the different variables I need, with various cuts:
def ApplyDefines( df, names, expressions, weights ):
    for i in range(0, len(names)):
        if names[i] not in df.GetColumnNames():
            # print names[i]
            df = df.Define( names[i], expressions[i] ).Define( "histweight"+str(i), weights[i] )
        else:
            df = df.Define( "histweight"+str(i), weights[i] )
    return df
TDF = ApplyDefines( origDF, varnames, plotvars, weights )
I then loop through a list of desired histograms, each with different cuts applied, and filter TDF:
filtered[cut] = TDF.Filter(cut)
before filling histograms using lines like: filt.Histo1D( histogram definitions, variable, weight )
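To make the structure of that loop concrete, here is a minimal sketch of it, not my exact code. The helper name `book_histograms` and the layout of `specs` (one tuple of cut, histogram model, variable column and weight column per histogram) are made up for illustration. The model is passed through untouched, so it is assumed to be whatever histogram definition Histo1D accepts in the ROOT version in use.

```python
def book_histograms(df, specs):
    """Book one weighted histogram per spec, sharing one Filter node per cut.

    specs is a hypothetical list of (cut, model, variable, weight) tuples.
    """
    filtered = {}  # cut string -> filtered data frame node
    booked = []    # one histogram result per spec, in the same order
    for cut, model, variable, weight in specs:
        # Create the Filter node only once per distinct cut string.
        if cut not in filtered:
            filtered[cut] = df.Filter(cut)
        # Filter and Histo1D are lazy: this only registers work to be done.
        booked.append(filtered[cut].Histo1D(model, variable, weight))
    return booked
```

With this layout, nothing is read from the chain while the histograms are being booked. The event loop runs when the first result is accessed (for example with GetValue() on one of the returned objects), and that single pass fills every histogram booked so far.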
Since my input chain is rather large, this process was quite slow. As a result, I tried to use ROOT.ROOT.EnableImplicitMT(2) before my dataframe definition. After applying this, the time taken doubled! It seems to take longer the more threads I request.
Do you know what could be causing this?
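In case it helps to reproduce the comparison, a small sketch of one way to time the run is below. `time_event_loop` is a hypothetical helper, and `trigger` stands for any callable that forces the event loop, for instance a lambda that calls GetValue() on one of the booked results.

```python
from timeit import default_timer


def time_event_loop(trigger):
    """Call trigger() once and return (its result, elapsed wall-clock seconds)."""
    start = default_timer()
    result = trigger()
    elapsed = default_timer() - start
    return result, elapsed
```

Running the same script once without EnableImplicitMT and once with it, and timing only the call that triggers the loop, separates the event-loop time from the time spent building the chain and booking the histograms.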