Dear @ytchoutw,
I am going to assume that you are processing remote files via xrootd (e.g. by passing a list of filenames with the `root://` prefix when creating the RDataFrame). In that case, what you are most probably experiencing is a delay in creating the tasks to be sent to Dask. In 6.26, RDataFrame needs to open all files to get their number of entries, so that it can properly split the input dataset into different tasks. Opening a remote file can be a costly operation, and the total cost scales linearly with the number of files you are processing. This limitation is removed in 6.28 (the next ROOT release): the files are only opened once the tasks arrive on the distributed nodes, so your client application will create tasks without opening any files.
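To make the difference described above concrete, here is a minimal toy model in plain Python. It is not ROOT code and does not reflect the actual implementation: `FakeRemoteStore`, `Task`, `eager_tasks`, `lazy_tasks`, `run_on_worker` and the file names are all hypothetical names made up for illustration. It only counts how many file opens happen on the client before any task can be submitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    filename: str
    first_entry: Optional[int]  # None means "not known yet"
    last_entry: Optional[int]


class FakeRemoteStore:
    """Hypothetical stand-in for a remote server; records every file open."""

    def __init__(self, entries_per_file: Dict[str, int]):
        self.entries_per_file = dict(entries_per_file)
        self.opens = 0

    def entries(self, filename: str) -> int:
        # Each call models one (potentially slow) remote open.
        self.opens += 1
        return self.entries_per_file[filename]


def eager_tasks(store: FakeRemoteStore, filenames: List[str]) -> List[Task]:
    """6.26-like behaviour as described above: the client opens every file
    to learn its number of entries before creating any task."""
    return [Task(f, 0, store.entries(f)) for f in filenames]


def lazy_tasks(filenames: List[str]) -> List[Task]:
    """6.28-like behaviour as described above: the client builds tasks
    from the file names only, without opening anything."""
    return [Task(f, None, None) for f in filenames]


def run_on_worker(store: FakeRemoteStore, task: Task) -> Task:
    """The worker opens the file only when the task arrives."""
    if task.last_entry is None:
        return Task(task.filename, 0, store.entries(task.filename))
    return task


if __name__ == "__main__":
    # Hypothetical file names and entry counts.
    files = {"root://server//a.root": 10, "root://server//b.root": 20}

    store = FakeRemoteStore(files)
    eager_tasks(store, list(files))
    print("client-side opens (eager):", store.opens)

    store = FakeRemoteStore(files)
    lazy_tasks(list(files))
    print("client-side opens (lazy):", store.opens)
```

In the eager variant the client pays one open per file before anything is submitted. In the lazy variant that cost moves to the workers, where the opens can happen in parallel.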
I am taking note of your suggestion about `fire_and_forget`, and I will look into a possible implementation. Your current problem is solved in the next ROOT release.
Cheers,
Vincenzo