RDataFrame: feature request for a fast Display?

cxwx · September 5, 2019, 7:10am

RDataFrame::Display returns an RDisplay, which is very slow when dealing with a huge dataset,
even when displaying only the first 10 events.

I’d love to have a feature in RDataFrame that quickly dumps the first 10 or so events.
Or is there already a function that does this?

eguiraud · September 5, 2019, 9:41am

Hi @cxwx,
RDisplay is meant to be that feature!

Could it be that in your computation graph you have a Display action together with some other action like Histo1D that requires processing the whole dataset (so Display stops processing after 10 events but you only see the printout when the full event loop is finished)?
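To illustrate the scenario described above, here is a minimal toy model in plain C++ of one event loop shared by several booked actions. It is not ROOT code and does not reflect how RDataFrame is implemented; `ToyAction`, `RunToyLoop` and `EventsRead` are hypothetical names made up for this sketch. The only assumption it encodes is that the loop can stop early only once every booked action has seen all the events it needs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Toy model of a lazy event loop shared by several booked actions.
// This is NOT ROOT code; all names here are hypothetical.
struct ToyAction {
    // Processes one event; returns true while the action still wants more events.
    std::function<bool(std::size_t)> process;
    bool done;
};

// Runs a single loop over nEvents for all actions and returns how many
// events were read. The loop stops early only when every action is done.
std::size_t RunToyLoop(std::size_t nEvents, std::vector<ToyAction> &actions) {
    std::size_t eventsRead = 0;
    for (std::size_t i = 0; i < nEvents; ++i) {
        bool anyActive = false;
        for (auto &a : actions) {
            if (a.done) continue;
            if (a.process(i)) anyActive = true;
            else a.done = true;
        }
        ++eventsRead;
        if (!anyActive) break;
    }
    return eventsRead;
}

// Books a Display-like action (needs only nRows events) and, optionally,
// a histogram-like action (needs every event), then runs the loop once.
std::size_t EventsRead(std::size_t nEvents, std::size_t nRows, bool withHistogram) {
    assert(nRows > 0);
    std::vector<std::size_t> rows;  // stand-in for the displayed rows
    std::size_t histEntries = 0;    // stand-in for a filled histogram
    std::vector<ToyAction> actions;
    actions.push_back(ToyAction{[&rows, nRows](std::size_t i) {
                                    rows.push_back(i);
                                    return rows.size() < nRows;
                                },
                                false});
    if (withHistogram) {
        actions.push_back(ToyAction{[&histEntries](std::size_t) {
                                        ++histEntries;
                                        return true;
                                    },
                                    false});
    }
    return RunToyLoop(nEvents, actions);
}
```

In this model, `EventsRead(1000000, 10, false)` reads only 10 events, while `EventsRead(1000000, 10, true)` reads all 1000000: the Display-like action is finished after 10 events, but the loop keeps running for the histogram-like one.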

Otherwise, could you share a reproducer or run perf record --call-graph dwarf on the reproducer to produce a flamegraph or similar, to figure out where time is being spent?

Cheers,
Enrico

cxwx · September 5, 2019, 9:47am

Thanks,
I’m sorry, it was my mistake.

I had defined two branches with the same name, which caused the problem.

eguiraud · September 5, 2019, 9:49am

Ah, interesting. We should have diagnostics for that. Feel free to report an issue on Jira if you think that’s a bug in RDF.

Cheers,
Enrico
