Frankly, the 24 minutes or so that it takes to go over 370 GB is not bad at all even if it’s sub-optimal - there will always be a bottleneck.
I make this a bigger issue for myself, though, by using EnableImplicitMT() in conjunction with Python multiprocessing (as I posted about here). I imagine this just exacerbates the problem, because now the disk is jumping between many different files. I didn't think it would be such a big deal, but I ran four workers with four threads each overnight over about 20 different sets of files and woke up to the machine pretty much frozen (which is what brought me here).
It makes no sense to me that the read speed doubles with `EnableImplicitMT(1)` compared to no multi-threading.
I think this is just inconsistency in iotop's sampling: the difference between a measurement window landing at the edges of a transfer versus splitting it in the middle. And then there's human error, because I was just staring at the output watching for the highest number, so I may have missed the moments when a sample got lucky and saw a 300 kB chunk in 1 second (these studies were not terribly scientific).