TDataFrame and EnableImplicitMT Slow

ellkay · October 15, 2019, 9:55am

Hi,

I have been trying to set up an RDataFrame to process a chain containing many events efficiently.
I am using Python with ROOT version 6.14.08 (6.14.08-x86_64-centos7-gcc8-opt).

I define my data frame:
origDF = R.RDataFrame(tree)
tdfnraw = origDF.Count().GetValue()

and then define a series of columns based on the different variables I need, with various cuts:
def ApplyDefines( df, names, expressions, weights ):
    for i in range(0, len(names)):
        if names[i] not in df.GetColumnNames():
            # print names[i]
            df = df.Define( names[i], expressions[i] ).Define( "histweight"+str(i), weights[i] )
        else:
            df = df.Define( "histweight"+str(i), weights[i] )
    return df

TDF = ApplyDefines( origDF, varnames, plotvars, weights )

I then loop through a list of desired histograms, each with different cuts applied and filter TDF:
filtered[cut] = TDF.Filter(cut)

before filling histograms using lines like: filt.Histo1D( histogram definitions, variable, weight)

Since my input chain is rather large, this process was quite slow. As a result, I tried to use ROOT.ROOT.EnableImplicitMT(2) before my dataframe definition. After applying this, the time taken doubled! It seems to take longer the more threads I request.

Do you know what could be causing this?

bellenot · October 15, 2019, 10:05am

Did you try to search the forum? I found this thread, "TDataFrame and EnableImplicitMT", which could be related, maybe?

ellkay · October 16, 2019, 1:42pm

Yes I did, and the thread does not end in a solution to the MT speed problem.
I am also using a different (newer) version of ROOT.

bellenot · October 16, 2019, 1:43pm

So maybe @eguiraud has an idea…

eguiraud · October 16, 2019, 2:23pm

Hi,
your logic as explained above looks good, except for that initial GetValue(), which triggers the event loop right away, possibly earlier than needed. In any case, that does not explain the time increase.
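
To illustrate the remark about the early GetValue(): a minimal sketch, assuming the usual RDataFrame behaviour that results are lazy and that the first access to any of them runs one event loop for everything booked so far. The helper name book_all and its arguments are hypothetical, not taken from the original script; df is expected to be an RDataFrame node such as TDF.

```python
def book_all(df, cuts, models, variable, weights):
    """Book every lazy result before anything triggers the event loop.

    df       : an RDataFrame node (assumption: same API as TDF above)
    cuts     : list of filter expressions, one per histogram
    models   : list of histogram models, passed through to Histo1D unchanged
    variable : name of the column to histogram
    weights  : list of weight column names, one per histogram
    """
    # Count() only books the action; no GetValue() here, so no event loop yet.
    n_raw = df.Count()

    histos = {}
    for cut, model, weight in zip(cuts, models, weights):
        # Filter and Histo1D are booked lazily as well.
        histos[cut] = df.Filter(cut).Histo1D(model, variable, weight)

    return n_raw, histos


# Hypothetical usage, once everything is booked:
#   n_raw, histos = book_all(TDF, cuts, models, "myvar", weight_names)
#   print(n_raw.GetValue())   # first access runs the event loop once
```

With this layout the count and all histograms should be filled in a single pass over the chain, instead of one pass for the count and another for the histograms.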

Can you provide a way for us to reproduce the issue? That would be the fastest way to understand what’s going on. Alternatively, I would suggest starting small and adding pieces until the issue presents itself: does just this run slower with more threads?

origDF = R.RDataFrame(tree)
origDF.Count().GetValue()
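
One possible way to time this for several thread counts is sketched below. It is only a sketch under assumptions: Python 3, a tree name and file pattern that you would fill in yourself (the values in the usage comment are placeholders), and one fresh process per thread count so that the implicit-MT setting of one run cannot affect the next. The helper names are hypothetical. Note that the measured wall time includes the time to start Python and import ROOT.

```python
import subprocess
import sys
import time

# Source of the child process. n_threads == 0 means "leave implicit MT off".
SNIPPET = """
import ROOT
if {n_threads} > 0:
    ROOT.ROOT.EnableImplicitMT({n_threads})
df = ROOT.RDataFrame({tree_name!r}, {file_glob!r})
print(df.Count().GetValue())
"""


def build_snippet(n_threads, tree_name, file_glob):
    """Return the Python source that a child process will execute."""
    return SNIPPET.format(
        n_threads=n_threads, tree_name=tree_name, file_glob=file_glob
    )


def time_command(cmd):
    """Run cmd and return (wall time in seconds, return code)."""
    start = time.perf_counter()
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return time.perf_counter() - start, proc.returncode


def scan_threads(thread_counts, tree_name, file_glob):
    """Time the bare Count() once per requested thread count."""
    results = {}
    for n in thread_counts:
        code = build_snippet(n, tree_name, file_glob)
        results[n] = time_command([sys.executable, "-c", code])
    return results


# Hypothetical usage (placeholder tree name and file pattern):
#   for n, (seconds, rc) in scan_threads([0, 2, 4], "mytree", "data_*.root").items():
#       print(n, seconds, rc)
```

If the bare Count() already gets slower as the thread count goes up, the problem is probably not in the Define/Filter/Histo1D part of the script.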

Cheers,
Enrico
