[new script] Splitting a ROOT file tree into N files

Dear ROOT forum,

the possibility of splitting a ROOT file into many smaller files was introduced in ROOT by means of the “rooteventselector” Python script. However, this script has a bug: it does not work when multiple trees with the same name are present.
See:

I don’t know if the issue has been solved (with ROOT 6.16 it still does not work), but I created a small python script to get the work done. You can get it here:

The script works with Python 2.7 + ROOT 6.x (it should also work with ROOT 5.x, but that has not been tested) and it requires PyROOT + NumPy.
It is based on available tutorials and it could be much more pythonic… but I don’t have much time to improve it (i.e. feel free to adapt it!)

Comments, questions, corrections?
Cheers,
Valentina

Hi Valentina,

nice. We take your post as a reminder to fix rooteventselector for 6.18.
Two curiosities:

  1. Why do you need to split your ROOT files? What are you trying to achieve?
  2. Why does your script depend on Numpy?

Cheers,
D

Hi!

    1. Why do you need to split your ROOT files? What are you trying to achieve?
      I use ROOT files as input for Geant4 simulations, and I needed to break the input file into smaller ones to reduce the CPU time per job by running many smaller jobs on a cluster.
      I am in an early development phase and everything is manual right now, but I will likely integrate the script into a pipeline.
    2. Why does your script depend on Numpy?
      Oops, you are right! I use it for the ints and floats, but I could use plain Python here, since I don’t require any fancy NumPy feature. I will remove it.
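For readers wondering what dropping NumPy could look like: a minimal sketch, assuming the script uses NumPy only to allocate one-element int and float buffers for branch addresses (the script itself is not shown here, so this is a guess). The standard-library `array` module is a common PyROOT idiom for that. The branch names `event_id` and `energy` and the helper `attach_branches` are hypothetical.

```python
from array import array

# One-element buffers that PyROOT can use as branch addresses.
# 'i' is a C int (ROOT leaf type /I), 'd' is a C double (ROOT leaf type /D).
event_id = array("i", [0])
energy = array("d", [0.0])


def attach_branches(tree):
    """Attach the buffers to an output tree (assumed to be a ROOT.TTree)."""
    tree.Branch("event_id", event_id, "event_id/I")
    tree.Branch("energy", energy, "energy/D")


# The buffers are updated in place before each tree.Fill() call.
event_id[0] = 42
energy[0] = 1.5
```

The buffers keep their identity across entries, so the tree always reads from the same memory address.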

Thanks!
Valentina

Hi there,

I’m looking to split my ROOT files in half, so that I have a training file and a test file of roughly equal size for a multivariate analysis. I have tried using your script (thank you!); however, the output files are at least 10 times the size of the originals.

Has functionality to split a file been added to ROOT since this discussion that I have not been able to find in my searches? If not, would you know what could be causing this increase in size of the files?

Hi @hmwakeling ,
rooteventselector, a helper tool that comes with ROOT, should do what you want. If not, RDataFrame can also be used to split files, e.g. in Python:

import ROOT
df = ROOT.RDataFrame("treename", "filename.root")
nEntries = df.Count().GetValue()  # total number of entries in the tree
df.Range(nEntries // 2).Snapshot("newtree", "newfile1.root")  # first half
df.Range(nEntries // 2, nEntries).Snapshot("newtree", "newfile2.root")  # second half
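
The same idea can be extended from two halves to N files, which is what the thread title asks for. The following is only a sketch under stated assumptions: the helper names `chunk_ranges` and `split_tree` and the output naming scheme `<prefix>_<i>.root` are made up for illustration and are not part of ROOT. ROOT is imported inside `split_tree` so that the range arithmetic can be checked without a ROOT installation.

```python
def chunk_ranges(n_entries, n_chunks):
    """Return (begin, end) pairs covering [0, n_entries) in n_chunks nearly equal parts."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    base, extra = divmod(n_entries, n_chunks)
    ranges = []
    begin = 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional entry each.
        end = begin + base + (1 if i < extra else 0)
        ranges.append((begin, end))
        begin = end
    return ranges


def split_tree(tree_name, in_path, out_prefix, n_chunks):
    """Write each chunk of the input tree to its own file (hypothetical helper)."""
    import ROOT  # imported here so chunk_ranges works without ROOT installed

    df = ROOT.RDataFrame(tree_name, in_path)
    n_entries = df.Count().GetValue()
    out_paths = []
    for i, (begin, end) in enumerate(chunk_ranges(n_entries, n_chunks)):
        if begin == end:
            continue  # more chunks than entries: skip empty parts
        out_path = "{}_{}.root".format(out_prefix, i)
        df.Range(begin, end).Snapshot(tree_name, out_path)
        out_paths.append(out_path)
    return out_paths
```

For example, `split_tree("treename", "filename.root", "part", 10)` would write `part_0.root` to `part_9.root`. Each Snapshot call here runs its own event loop, so for very large inputs this simple version may not be the fastest option.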

(In general, please consider creating a new topic rather than replying to one that’s two years old: there are more chances to get an up-to-date reply.)

Cheers,
Enrico

there’s also root-split from groot:

full list here.

ex:

$> root-split -h
Usage: root-split [options] file.root

ex:
 $> root-split -o out.root -n 10 ./testdata/chain.flat.1.root

options:
  -n int
    	number of events to split into (default 100)
  -o string
    	path to output ROOT files (default "out.root")
  -t string
    	input tree name to split (default "tree")
  -v	enable verbose mode

hth,
-s

Thank you very much for your help!
In future I shall open a new thread :)