Write TTrees into chunks using RDataFrame

Dear Experts,

I would like to write my output in ~50 GB chunks instead of one single file, for ease of reading. I believe that, by default, if a TTree grows larger than 100 GB (the default max tree size) a new file is created with “*_n.root” in its name. I am using PyROOT and RDataFrame, but I am hitting an issue that I am not sure is reproducible: when Snapshot reaches the max tree size, it produces a file called “original_filename_1.root”, but it also deletes the original file. Is this normal behaviour?

I wasn’t sure if this was an error as it was a grid job (hence not easy to reproduce).

Thanks in advance!

ROOT Version: 6.22.02
Platform: Not Provided
Compiler: Not Provided


Hi @jcob,
No, that’s not expected. On the other hand, there are known issues with multi-thread Snapshots (or rather, with the interface they use internally, TBufferMerger) and the file-switching behavior of TTree. As a consequence, in future releases RDF will never switch to a new file when writing, ignoring TTree::fgMaxTreeSize.

You could write just one large file and read back parts of it by passing to RDF a TTree/TChain+TEntryList, or use Ranges (in single-thread runs) or TEntryLists to achieve a similar effect to fgMaxTreeSize.

Cheers,
Enrico

Thanks @eguiraud,

> You could write just one large file and read back parts of it by passing to RDF a TTree/TChain+TEntryList, or use Ranges (in single-thread runs) or TEntryLists to achieve a similar effect to fgMaxTreeSize.

I’ll try this. I have concerns about the amount of disk space used, but my files shouldn’t be very large (O(1 TB)). Also, will TEntryLists work with ImplicitMT?

Thanks

Yes, TEntryLists will work in multi-thread runs; Ranges will not.

Cheers,
Enrico

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.