Speed up adding files with TChain

Hi all,

I am reading a large number of .root files with TChain::Add, and I want to speed up the reading. Is there a way to achieve this (for example with multi-processing, which I don't know how to set up)?

By the way, I also want to speed up the CopyTree step, which I use to extract a subset of the original TTree while keeping the tree structure. Any suggestions on this are also welcome!

Thanks in advance!
Yi


ROOT Version: 6.22/02
Platform: Debian Linux
Compiler: gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)


I think it’s already optimized, but maybe @pcanal or @eguiraud can comment on this

Hi @Georgetaoyi ,
you can use TThreadExecutor or TProcessExecutor to execute your logic in parallel threads/processes, or TTreeProcessorMT/TTreeProcessorMP specifically to process TTrees in parallel threads/processes.

A somewhat different, higher-level option is RDataFrame, which is the suggested interface for fast, parallel ROOT data processing.

CloneTree has a “fast” option to copy raw bytes directly rather than decompressing, deserializing, serializing again and compressing again.

As always, however, you should profile your application to find out where the bottlenecks actually are.

Cheers,
Enrico

@eguiraud Thanks for your suggestions! I will try them out.

Actually, what I want to do is nothing special: I just extract a subset of the entries of the original TTree under some cuts and write them to a new .root file, keeping the same tree structure. That's it, there are no calculation steps. So I think the bottleneck can only be reading and writing the .root files, or the CopyTree step, and that is what I want to speed up.

To be more specific, my core code is as follows:

// Read original tree.
TChain *ch = new TChain(tree_name);
ch->Add(root_chain_path);
// Create output file & Save new tree after cuts.
TFile *out_file = new TFile(ofname, "recreate");
TTree *copy_tree = ch->CopyTree(apply_cuts);
copy_tree->Write();
out_file->Close();
// Free space.
delete out_file;
delete ch;

where apply_cuts is a combination of some TCut.

Since I am not an expert on managing threads or processes, could you show me a small demo that I can use as a starting point?

Best regards,
Yi

We have tutorials for all the features I mentioned; you can grep for the class names in the directory printed by root-config --tutdir.

You can also open the tutorials as notebooks from here (RDataFrame) and here (TTreeProcessor{MP,MT} etc.) (unfortunately these latter tutorials are not super well labeled in terms of which of the classes are used inside, hence the suggestion to grep for what you need).

Cheers,
Enrico
