[new script] Splitting a ROOT file tree into N files

Dear ROOT forum,

the ability to split a ROOT file into many smaller files was introduced in ROOT through the “rooteventselector” Python script. However, this script has a bug: it does not work when multiple trees with the same name are present.

I don’t know if the issue has been solved (with ROOT 6.16 it still does not work), but I created a small python script to get the work done. You can get it here:

The script works with Python 2.7 + ROOT 6.x (it should also work with ROOT 5.x, but that has not been tested) and it requires PyROOT + NumPy.
It is based on the available tutorials and could be much more pythonic… but I don’t have much time to improve it (i.e. suit yourself!)

Comments, questions, corrections?

Hi Valentina,

nice. We take your post as a reminder to fix rooteventselector for 6.18.
Two curiosities:

  1. Why do you need to split your ROOT files? What are you trying to achieve?
  2. Why does your script depend on Numpy?



    1. Why do you need to split your ROOT files? What are you trying to achieve?
      I use ROOT files as input for Geant4 simulations, and I needed to break the input file into smaller ones to reduce the CPU time by running many smaller jobs on a cluster.
      I am in an early development phase and everything is manual right now, but I will likely integrate the script into a pipeline.
    2. Why does your script depend on Numpy?
      Oops, you are right! I use it for the ints and floats, but I could use plain Python math in this code, since I don’t need any fancy NumPy feature. I will remove it.


Hi there,

I’m looking to split my ROOT files in half, so that I have a training file and a test file of roughly equal size for a multivariate analysis. I have tried using your script (thank you!); however, the output files are at least 10 times the size of the originals.

Has functionality to split a file been added to ROOT since this discussion, which I may have missed in my searches? If not, would you know what could be causing this increase in file size?

Hi @hmwakeling ,
rooteventselector, a helper tool that comes with ROOT, should do what you want. If not, RDataFrame can also be used to split files, e.g. in Python

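import ROOT
df = ROOT.RDataFrame("treename", "filename.root")
nEntries = df.Count().GetValue()  # total number of entries in the tree
half = nEntries // 2  # integer division: Range expects integer entry indices
df.Range(half).Snapshot("newtree", "newfile1.root")  # entries [0, half)
df.Range(half, nEntries).Snapshot("newtree", "newfile2.root")  # entries [half, nEntries)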
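
The same Range + Snapshot idea can be extended from two halves to N files. Below is a minimal sketch, not a tested tool: the helper names chunk_ranges and split_tree and the output pattern "part_{:03d}.root" are made up for illustration. It assumes a single tree per file and that implicit multithreading is not enabled, since Range is not available in that mode.

```python
def chunk_ranges(n_entries, n_files):
    """Return half-open (begin, end) entry ranges that split n_entries
    into n_files contiguous chunks whose sizes differ by at most one."""
    if n_files < 1:
        raise ValueError("n_files must be at least 1")
    base, extra = divmod(n_entries, n_files)
    ranges = []
    begin = 0
    for i in range(n_files):
        # The first `extra` chunks take one additional entry each.
        end = begin + base + (1 if i < extra else 0)
        ranges.append((begin, end))
        begin = end
    return ranges


def split_tree(tree_name, in_path, n_files, out_pattern="part_{:03d}.root"):
    """Write each entry range of the input tree to its own file.

    Hypothetical helper: one event loop is run per output file.
    """
    import ROOT  # imported here so chunk_ranges works without PyROOT

    df = ROOT.RDataFrame(tree_name, in_path)
    n_entries = df.Count().GetValue()
    for i, (begin, end) in enumerate(chunk_ranges(n_entries, n_files)):
        if begin == end:
            continue  # more files requested than entries: skip empty chunks
        df.Range(begin, end).Snapshot(tree_name, out_pattern.format(i))
```

Something like split_tree("treename", "filename.root", 10) would then be expected to produce up to ten files. The range arithmetic uses only Python integers, so no NumPy is needed.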

(In general, please consider creating a new topic rather than replying to one that’s two years old: you have a better chance of getting an up-to-date reply.)


there’s also root-split from groot:

full list here.


$> root-split -h
Usage: root-split [options] file.root

 $> root-split -o out.root -n 10 ./testdata/chain.flat.1.root

  -n int
    	number of events to split into (default 100)
  -o string
    	path to output ROOT files (default "out.root")
  -t string
    	input tree name to split (default "tree")
  -v	enable verbose mode



