The possibility of splitting a ROOT file into many smaller files was introduced in ROOT through the “rooteventselector” Python script. However, this script has a bug: it does not work when multiple trees with the same name are present.
See:
I don’t know if the issue has been solved (with ROOT 6.16 it still does not work), but I created a small Python script to get the work done. You can get it here:
The script works with Python 2.7 + ROOT 6.x (it should also work with ROOT 5.x, but this has not been tested) and requires PyROOT + NumPy.
It is based on the available tutorials and could be much more pythonic… but I don’t have much time to improve it (i.e. suit yourself!)
Why do you need to split your ROOT files? What are you trying to achieve?
I use ROOT files as input for Geant4 simulations, and I needed to break the input file into smaller ones to reduce the CPU time by running many smaller jobs on a cluster.
I am in an early development phase, and everything is manual right now, but I will likely integrate the script into a pipeline.
Why does your script depend on NumPy?
Oops, you are right! I use it for the ints and floats, but I could use plain Python math in this code, since I don’t require any fancy NumPy features. I will remove it.
I’m looking to split my ROOT files in half, so that I have a training file and a test file of roughly equal size for a multivariate analysis. I have tried using your script (thank you!), but the output files are at least 10 times the size of the originals.
Has functionality to split a file been added to ROOT since this discussion that I have not been able to find in my searches? If not, would you know what could be causing this increase in size of the files?
Hi @hmwakeling, rooteventselector, a helper tool that comes with ROOT, should do what you want. If not, RDataFrame can also be used to split files, e.g. in Python
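For illustration, here is a minimal sketch of how such an RDataFrame-based split might look. It is not the snippet from the original post: the function names `split_ranges` and `split_tree`, the output naming scheme, and the file and tree names in the usage note are all hypothetical. It assumes a single tree per file (so it does not address the same-name issue discussed above) and that implicit multithreading is not enabled, since `Range` is only available in single-threaded mode.

```python
def split_ranges(n_entries, n_parts):
    """Return n_parts half-open (begin, end) entry ranges covering n_entries.

    Chunk sizes differ by at most one entry; the earlier chunks get the extras.
    """
    base, extra = divmod(n_entries, n_parts)
    ranges = []
    begin = 0
    for i in range(n_parts):
        end = begin + base + (1 if i < extra else 0)
        ranges.append((begin, end))
        begin = end
    return ranges


def split_tree(in_file, tree_name, out_prefix, n_parts=2):
    """Write each entry range of one tree to its own file (hypothetical helper).

    Output files are named <out_prefix>_<index>.root, which is an assumed
    convention for this sketch only.
    """
    import ROOT  # imported here so split_ranges can be used without ROOT

    df = ROOT.RDataFrame(tree_name, in_file)
    n_entries = df.Count().GetValue()

    out_files = []
    for i, (begin, end) in enumerate(split_ranges(n_entries, n_parts)):
        if begin == end:
            continue  # skip empty chunks (fewer entries than parts)
        out_name = "{}_{}.root".format(out_prefix, i)
        # Range selects entries [begin, end); Snapshot writes them to a new file.
        df.Range(begin, end).Snapshot(tree_name, out_name)
        out_files.append(out_name)
    return out_files
```

For a training/test split, something like `split_tree("input.root", "tree", "split", n_parts=2)` would be the intended call, with `input.root` and `tree` replaced by your own file and tree names. Each chunk triggers its own pass over the input, which is simple but not the fastest option for many parts.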