Dataframe Snapshot - file larger than 100 GB

Hi there,

I am working in pyroot (ROOT version 6.22/00) and am attempting to write a number of rather large dataframes to an output file using Snapshot(). The command I’m using is something like

frames[systematic].Snapshot(systematic, outFileName, branch_list, opts)

where frames is the dictionary of dataframes, one for each systematic in the analysis, and branch_list is the list of columns to carry over to the snapshot file. For some of the larger input files, I run into the problem of trying to merge files larger than 100 GB:

Fatal in TFileMerger::RecursiveRemove: Output file of the TFile Merger (targeting [FILE]) has been deleted (likely due to a TTree larger than 100Gb)
aborting
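
For reference, a reduced sketch of my setup (the file, tree, and branch names below are just placeholders):

import ROOT

# One dataframe per systematic variation (placeholder names)
systematics = ["nominal", "jes_up", "jes_down"]
frames = {s: ROOT.RDataFrame(s, "input.root") for s in systematics}
branch_list = ROOT.std.vector("string")(["pt", "eta", "phi"])

# UPDATE mode so each systematic's tree is appended to the same output file
opts = ROOT.RDF.RSnapshotOptions()
opts.fMode = "UPDATE"

outFileName = "snapshots.root"
for systematic in systematics:
    frames[systematic].Snapshot(systematic, outFileName, branch_list, opts)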

I have seen in other forum threads (e.g. Root 6.04.14 hadd 100Gb and rootlogon) that you can include a C++ header file and load some libraries to get around this, but I’m not sure how this works with pyroot (not to mention that I am not at all fluent with shared libraries). Any suggestions?

Thank you!

Hello,

You can try using the same LD_PRELOAD mechanism, but with the Python executable:

LD_PRELOAD=startup_C.so python your_RDF_script.py
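
Assuming the preloaded library does nothing beyond calling TTree::SetMaxTreeSize, you could also make that call directly at the top of your script, before the first Snapshot:

import ROOT

# TTree::SetMaxTreeSize is static, so this raises the per-file size cap
# globally; the value is in bytes (here ~1 TB, up from the default 100 GB)
ROOT.TTree.SetMaxTreeSize(10**12)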

@pcanal is that mechanism to call TTree::SetMaxTreeSize still what is needed to circumvent the limitation?

@etejedor Yes, unless pyroot sources the rootlogon.C.

pyroot does source the rootlogon.C if it exists and there is no rootlogon.py.

You mean, then, that TTree::SetMaxTreeSize can be called from a rootlogon.C (or even a rootlogon.py)?

Yes 🙂
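
For reference, a minimal rootlogon.py doing just that (assuming it sits in the directory you start Python from, where PyROOT looks for it) could be:

# rootlogon.py -- executed automatically at PyROOT startup
import ROOT

# Lift the per-file TTree size limit from the default 100 GB to ~1 TB (bytes)
ROOT.TTree.SetMaxTreeSize(10**12)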