RDataFrame not parallelizing when reading from TFile


I’m having a problem with RDataFrame not parallelizing my code:

// ROOT::EnableImplicitMT() is assumed to have been called before this point
TFile *file = new TFile("newfile.root", "READ");
auto df = ROOT::RDataFrame("myNewTree", file);
// auto df = ROOT::RDataFrame(1000);

// Dummy CPU-heavy workload standing in for the real per-entry computation
auto work = []() {
  int dummyVar = 0;
  for (int i = 0; i < 10000000; i++)
    dummyVar = TMath::Sin(i);
  return 0.5;
};

auto newDf = df.Define("work", work);
auto hist  = newDf.Histo1D<double>("work");

When I switch from reading the df from a TFile to creating an empty one with 1000 entries (the commented-out line), all my cores are engaged and the code finishes quickly. However, reading from the file results in serial execution.

I have tried loading the TTree into an RDataFrame, saving a snapshot of it, and reading the saved snapshot back; it has the same issue. The TTree I am trying to load has 1k entries and 2 branches of vectors with ~100 elements in each entry. The work function is just a debugging placeholder; my “real” code needs to work with the loaded data.

Could someone point me in the right direction on how to run the code in parallel when reading from a file?

Thank you

$ root -b -q
  | Welcome to ROOT 6.26/10                        https://root.cern |
  | (c) 1995-2021, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for linuxx8664gcc on Nov 16 2022, 10:42:54                 |
  | From tags/v6-26-10@v6-26-10                                      |
  | With c++ (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0                 |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q'       |
$ cat /etc/os-release | head -n 1
PRETTY_NAME="Ubuntu 22.04.2 LTS"

Maybe @eguiraud or @vpadulan can help

Hi @Martin1512 ,

when processing a TTree or a TChain, the smallest granularity of a multi-thread task is a TTree cluster. That is, there cannot be more multi-thread tasks than there are clusters in your file. The rationale for this has to do with how ROOT I/O works.

With only 1k entries I suspect the problem might be that you have only 1 or 2 clusters in the file and therefore at most 1 or 2 threads have something to do.

You can check what’s going on by activating RDF logs. The log format has been improved in recent versions, so if you can try ROOT v6.28.04 you’ll get better output, but I think ROOT v6.26.10 should also provide useful information.


Thank you for your help @bellenot and @eguiraud.

@eguiraud: I think the granularity is the problem as you say. When I activate the logs, for TFile reading I get:

Info in <[ROOT.RDF] Info $ROOT_PATH/tree/dataframe/src/RLoopManager.cxx:505 in ROOT::Detail::RDF::RLoopManager::RunTreeProcessorMT()::<lambda(TTreeReader&)>>: Processing trees {events} in files {extended_histograms.root}: entry range [0,999], using slot 47 in thread 139923041806912.

I have a 48-core CPU, so using slot 47 seems plausible. When I run the parallel code with an empty RDataFrame, all slots 0-47 are used.

I have used Cache() on the loaded RDataFrame and it is working in parallel now:

auto df = ROOT::RDataFrame("myNewTree", file).Cache();

Is there a way to find out the number of clusters in the TFile? When I have larger data in the future, I might not want to load everything into memory. It is also true that large data will be more likely to have more clusters. However, I would like my code to be flexible enough for both cases, large and small data samples.

Hi @Martin1512 ,

larger datasets will certainly have more clusters. By default TTree tries to make each cluster of entries roughly 30MB in size when compressed. That is a good granularity for a parallel task that needs to perform I/O.
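
As a rough back-of-the-envelope check (not ROOT's actual clustering algorithm), the ~30MB target explains why a small tree ends up in a single cluster. The sketch below assumes 8-byte vector elements, which the thread does not specify, and ignores compression, which would only shrink the data further. EstimateClusters and UsableThreads are hypothetical helper names.

```cpp
#include <algorithm>
#include <cstdint>

// Approximate cluster size target quoted above (~30 MB).
constexpr std::uint64_t kTargetClusterBytes = 30'000'000;

// Tree from this thread: 1000 entries x 2 branches x ~100 elements,
// with an ASSUMED element size of 8 bytes (about 1.6 MB in total).
constexpr std::uint64_t kExampleTreeBytes = 1000ULL * 2 * 100 * 8;

// Rough estimate: ceiling division, with at least one cluster for a non-empty tree.
std::uint64_t EstimateClusters(std::uint64_t totalBytes)
{
   const std::uint64_t n = (totalBytes + kTargetClusterBytes - 1) / kTargetClusterBytes;
   return std::max<std::uint64_t>(1, n);
}

// There cannot be more concurrent tasks than clusters, nor more than threads.
std::uint64_t UsableThreads(std::uint64_t nClusters, std::uint64_t nThreads)
{
   return std::min(nClusters, nThreads);
}
```

Under these assumptions EstimateClusters(kExampleTreeBytes) gives 1, so UsableThreads(1, 48) is 1, whereas a 3 GB dataset would be estimated at 100 clusters and could keep all 48 threads busy.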

tree->Print("clusters") is one way to check the clustering of a TTree. There is also tree->GetClusterIterator(0) for a more manual way to iterate over cluster boundaries.


Thank you @eguiraud that was very helpful!

