Continuing the discussion from EnableImplicitMT() prevents reading XrootD file with RDataFrame:
This issue was only partially resolved. It still occurs for a TTree inside a TDirectory. Reproducer:
import ROOT as r
r.ROOT.EnableImplicitMT()
fpath = "root://path/to/test.root"
r.RDataFrame(10).Define("e", "rdfentry_").Snapshot("testd/testt", fpath)
opts = r.RDF.RSnapshotOptions()
opts.fMode = "UPDATE"
r.RDataFrame(10).Define("e", "rdfentry_").Snapshot("testt", fpath, "", opts)
f = r.TFile.Open(fpath)
t = f.Get("testt")
td = f.Get("testd/testt")
rdft = r.RDataFrame(t)
rdftd = r.RDataFrame(td)
ht = rdft.Histo1D("e")
htd = rdftd.Histo1D("e")
ht.GetMean() # works fine
htd.GetMean() # produces error
Output:
---------------------------------------------------------------------------
runtime_error Traceback (most recent call last)
Input In [3], in <module>
15 htd = rdftd.Histo1D("e")
16 ht.GetMean() # works fine
---> 17 htd.GetMean()
runtime_error: TH1D& ROOT::RDF::RResultPtr<TH1D>::operator*() =>
runtime_error: TTreeProcessorMT::Process: an error occurred while getting tree "//path/to/test.root:/testd/testt" from file "root://path/to/test.root"
ROOT Version: 6.24/06
Platform: Not Provided
Compiler: Not Provided
RDataFrame with a tree inside a directory in files used to populate a TChain seems to be working for me, but all I’ve tried so far is to load the RDataFrame and then run a ForEachSlot with a custom lambda.
But I don’t see a huge amount of MT activity so I wonder if the TChain source for the RDataFrame is limiting the MT aspect…
Hi @mwilkins,
thank you for the report, I was not aware of this problem. This is now tracked as GitHub issue #10216 in root-project/root, “[DF] EnableImplicitMT() prevents reading TTree in sub-directory from XrootD file”. I will try to work on a fix in time for 6.26.02, which should be out in O(week).
Cheers,
Enrico
Thank you @danj1011. You really need the combination of root://, EnableImplicitMT() and a tree in a subdirectory for this problem to occur. Maybe your filename does not start with root://?
No, TChain won’t limit the multi-thread scaling. It might be a few things; here are three that I can think of off the top of my head:
- RDataFrame parallelizes over “clusters” of entries (visible e.g. with tree->Print("clusters")), which are the unit of compression/decompression in TTree. If, for example, the TChain is composed of 10 trees and each tree is small and only has 2 clusters (for a total of 20 units of parallelization), you won’t see good scaling above a few cores (2-4). Generally we try to give each core 4 to 10 clusters to work with to have good workload balance
- I/O bandwidth acts as a bottleneck, so all threads spend a lot of their time waiting for data to arrive
- if you are using many threads (>100) and the total runtime of your application is on the order of seconds, up until v6.26.00 you might spend most of your time in some initialization overhead. We have now removed that overhead in master and we’ll try to backport those performance improvements to v6.26.02 as well
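The cluster arithmetic in the first point can be sketched in plain Python. This is only a back-of-the-envelope heuristic built on the "4 to 10 clusters per core" rule of thumb quoted above, not how RDataFrame actually schedules work; the function name `useful_core_range` and its parameters are made up for illustration.

```python
def useful_core_range(n_trees, clusters_per_tree, min_per_core=4, max_per_core=10):
    """Rough estimate of how many cores can be kept busy.

    Hypothetical helper: it only divides the total number of clusters by the
    rule-of-thumb workload per core (4 to 10 clusters); it does not query ROOT.
    """
    total = n_trees * clusters_per_tree
    low = max(1, total // max_per_core)   # every core gets 10 clusters
    high = max(1, total // min_per_core)  # every core gets only 4 clusters
    return total, low, high


# The example from the post: a TChain of 10 small trees with 2 clusters each.
total, low, high = useful_core_range(n_trees=10, clusters_per_tree=2)
print(f"{total} clusters: roughly {low} to {high} cores can be kept busy")
```

With 20 clusters in total, the estimate stays at a handful of cores, which is in line with the "few cores" figure above; adding more threads than that leaves most of them without work.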
Let me know (in another thread) if you think you should get better CPU usage than you are getting – if you have a reproducer we can take a look on our side.
Cheers,
Enrico
As a workaround you can pass treename and filename directly, e.g.:
auto rdftd = ROOT::RDataFrame("testd/testt", "root://eosuser.cern.ch//eos/user/e/eguiraud/scratch/test.root");
This should work.
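Since the reproducer is written in Python, the same workaround might look like this in PyROOT. This is a minimal sketch: the helper name `rdf_from_name` is hypothetical, and it simply forwards to the `RDataFrame(treename, filename)` constructor so that RDataFrame opens the file itself instead of receiving a TTree obtained with `f.Get(...)`.

```python
def rdf_from_name(treename, filename):
    """Build an RDataFrame from a tree name and a file name.

    Hypothetical convenience wrapper: the tree may live in a sub-directory,
    e.g. "testd/testt", and the file may be a root:// URL.
    """
    import ROOT  # imported lazily so the helper can be defined without ROOT loaded

    return ROOT.RDataFrame(treename, filename)
```

In the reproducer, `rdftd = rdf_from_name("testd/testt", fpath)` would take the place of the `f.Get("testd/testt")` route, with `fpath` being the same placeholder path.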
This patch should fix the problem, please try it out if you can (or try a ROOT nightly build in a couple of days) and let me know if you see further problems.
Cheers,
Enrico
Unfortunately, I don’t have a good way to build ROOT from source right now (and the link to the nightlies in your post does not seem to be working).
Thanks @eguiraud ! I had root://, EnableImplicitMT() and a tree in a subdirectory, but I’m using 6.24.06 so I don’t know why I don’t see any problems.
Great that the TChain shouldn’t interfere. Indeed, the bottleneck was xrootd access to remote sites. I did see a huge improvement going from 100 threads TO 1 thread (!) [i.e. 6 minutes → 2 seconds] so perhaps this initialization overhead is manifested here. I’ll be pleased to try out 6.26.00 when it’s available through the mainstream LCG views.
That’s surprising: that case was really broken.
On one hand, if one thread takes 2 seconds to process your data, you really shouldn’t throw 100 threads at it; they will not have enough “meat” to divide between themselves. On the other hand, that slowdown is terrible. You should see some overhead, not that. I hope the situation will be a lot better with v6.26.02, a patch release coming out in O(week) that will include some scaling improvements (though v6.26.00 should already behave better than v6.24.06). If not, please let us know.
Cheers,
Enrico
Yes, I’m puzzled!
As for the threads, I agree, but I came at it from the opposite direction (100 threads, then tried 1). When the LCG view is released I’ll give it a go. Thanks!
I would like to understand this better, but let’s stop hijacking this thread (sorry @mwilkins !). I’ll send you a few questions in a private message on the forum.
Yes, sorry @mwilkins (though, in my defence, @mwilkins did ask me offline to chime in here).