TDataFrame figures out on its own how many threads it uses
No, by default it runs single-threaded. If you call ROOT::EnableImplicitMT(), it runs multi-threaded using as many threads as ROOT::GetImplicitMTPoolSize() reports (this should equal the number of available cores), but you can set the size of the thread pool with ROOT::EnableImplicitMT(nThreads) if you need to.
Does it also mean that the input file (for this test it’s only one) is read by about 10 threads?
Yes, each thread opens the file and processes part of the entries.
I think this is where my problem is: with xrootd, this parallel reading does not seem to be possible.
Ok, I must say I’m not overly familiar with xrootd, why do you think so?
Using the ACLiC-compiled version of the script (tree_trimmerTDF.C (4.9 KB), XiczPi.txt (4.2 KB)) takes about 2 minutes with local input (the 30s were too optimistic) and 15 minutes with xrootd input
We (ROOT team) need to look into this. I think the data-frame implementation in current ROOT master (and upcoming v6.14) fixed a couple of issues that might affect your runtimes.
In order to do so:
- could you please provide a small reproducer that does not depend on IOJuggler?
- is the data you are reading publicly available? (I can try to reproduce the issue with some other dataset but it would be nice to reproduce in the same conditions as you)
Currently there are only 2, but there might be up to several hundred in future applications.
Two defines cannot be an issue. Hundreds of them will be slow to create (not to run, just to create) in v6.12, but that’s fixed in master and v6.14.
I assume that putting them into a single lambda like it was done in the multiprocessing script is the better way?
It shouldn’t be noticeably faster.
I think they crashed, since the main process hangs waiting for their results.
I think the problem is still the initial one: reading a remote file with xrootd in multiple threads/processes.
Ok, I think I’m missing a step: why do you think this is the problem, i.e. where is the smoking gun?
The multi-process script, if I understand correctly, hangs indefinitely waiting for results from worker processes which have already exited. The TDataFrame script is very slow with remote files but is also quite slow on local files, so there might be another issue.
In the case of TDataFrame, could you try calling SetCacheSize() on the TChain before you pass it to the TDataFrame constructor? That should force the usage of a TTreeCache, which is disabled by default because of a (recently fixed) bug. This should greatly reduce the number of remote reads, possibly speeding up the analysis.
It seems as if TDataFrame can also only read with one thread.
Why do you say so?
I’m currently leaning towards the initial multi-processing script, and will provide an ACLiC-compilable version of this as well.
It would help a great deal if you could provide scripts that only depend on ROOT, not on other libraries that I would have to install/compile to try out the reproducers.
I’m currently leaning towards the initial multi-processing script, and will provide an ACLiC-compilable version of this as well.
We’ll make it work then. As I suggested before, you could put some printouts in the worker tasks to try to figure out when the workers exit and why.
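One possible way to add such printouts is a small wrapper around whatever the worker executes. The sketch below uses only standard C++ plus POSIX `getpid`, not any ROOT multi-processing API. `RunTraced` and `ExitLogger` are hypothetical names introduced here:

```cpp
#include <cstdio>
#include <exception>
#include <string>
#include <unistd.h> // getpid (POSIX)

// Prints a line when the task scope is left, by return or by exception.
struct ExitLogger {
   std::string fLabel;
   ~ExitLogger()
   {
      std::fprintf(stderr, "[pid %d] %s: leaving task\n", static_cast<int>(getpid()), fLabel.c_str());
      std::fflush(stderr);
   }
};

// Runs task(), logging entry, exit and any exception together with the
// process id, and returns whatever the task returns.
template <typename Task>
auto RunTraced(const std::string &label, Task &&task) -> decltype(task())
{
   std::fprintf(stderr, "[pid %d] %s: entering task\n", static_cast<int>(getpid()), label.c_str());
   std::fflush(stderr);
   ExitLogger guard{label};
   try {
      return task();
   } catch (const std::exception &e) {
      std::fprintf(stderr, "[pid %d] %s: exception: %s\n", static_cast<int>(getpid()), label.c_str(), e.what());
      throw; // let the caller see the original exception
   } catch (...) {
      std::fprintf(stderr, "[pid %d] %s: unknown exception\n", static_cast<int>(getpid()), label.c_str());
      throw;
   }
}
```

Inside the worker function the existing body would then be wrapped as `return RunTraced("file " + name, [&] { /* existing work */ });`. If a worker prints "entering task" but never "leaving task", it most likely died from a signal or called exit in between, which narrows down where to look.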
Cheers,
Enrico