Could it be that your computation graph contains a Display action together with another action, such as Histo1D, that needs to process the whole dataset? In that case Display stops collecting rows after 10 events, but you only see the printout once the full event loop has finished.
Otherwise, could you share a reproducer, or run perf record --call-graph dwarf on it and produce a flamegraph or similar, so we can figure out where the time is being spent?
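
To illustrate the first scenario, here is a minimal toy model in plain Python. It is not the ROOT API: `ToyDisplay`, `ToyHistogram` and `run_event_loop` are hypothetical names made up for this sketch, and it only assumes that all booked actions share a single event loop that can stop early when every action has what it needs.

```python
# Toy model of a lazy, single-pass event loop. This is NOT RDataFrame code;
# every class and function name here is hypothetical.


class ToyDisplay:
    """Collects only the first n_rows entries, then ignores the rest."""

    def __init__(self, n_rows):
        self.n_rows = n_rows
        self.rows = []

    def done(self):
        return len(self.rows) >= self.n_rows

    def fill(self, value):
        if not self.done():
            self.rows.append(value)


class ToyHistogram:
    """Needs every entry, so it never reports being done early."""

    def __init__(self):
        self.n_filled = 0

    def done(self):
        return False

    def fill(self, value):
        self.n_filled += 1


def run_event_loop(n_entries, actions):
    """Run one shared loop; stop early only if every action is done."""
    processed = 0
    for entry in range(n_entries):
        if all(action.done() for action in actions):
            break
        for action in actions:
            action.fill(entry)
        processed += 1
    return processed


N_ENTRIES = 100_000

# Display on its own: the loop can stop once 10 rows are collected.
display_only = ToyDisplay(10)
n_alone = run_event_loop(N_ENTRIES, [display_only])

# Display plus a histogram: the histogram keeps the loop running to the end,
# and the display rows only become available after the loop returns.
display = ToyDisplay(10)
histo = ToyHistogram()
n_shared = run_event_loop(N_ENTRIES, [display, histo])

print("entries processed, Display alone:    ", n_alone)
print("entries processed, Display + histo:  ", n_shared)
print("rows held by Display in shared loop: ", len(display.rows))
```

In the shared case the display still holds only 10 rows, but the loop runs over every entry before anything can be printed, which would look like Display being slow.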