I have a large collection of ROOT files containing identically formatted TTrees. I wish to merge these into a single file containing only three branches (call them x, y, z) from the original trees, in addition to a new branch created by performing some mathematical operations on branches of the original trees (some of which will not be included in the merged file).
I found that the “cleanest” and, by far, the fastest way was to use the .Snapshot() and .Define() methods of RDataFrame:
void merge(const char* in_path, const char* out_path){
    // Read the list of input ROOT file paths, one per line
    ifstream stream(in_path);
    vector<string> list;
    string line;
    while(getline(stream, line))
        list.emplace_back(line);

    // Chain all input files into a single data frame
    ROOT::RDataFrame df("tree", list);

    // Define the new branch, then write out only the selected branches
    df.Define("newBranch", [&](UInt_t a, UInt_t b, int x){
        return myMathematicalFunction(a, b, x);
    }, {"a","b","x"}).Snapshot("tree", out_path, {"newBranch","x","y","z"});
}
…where in_path gives the path to a text file containing the list of ROOT file paths.
The branch y contains increasing sequential data of type UInt_t (e.g. 1, 7, 13, 24). In the merged file, the first few thousand entries match what is in the original files. However, at some seemingly arbitrary point, all remaining entries jump to a single constant, large value.
This doesn’t seem to be an overflow or a similar issue with the size of the original values. For example, in one case, the data in y were simply the integers starting at 1 (i.e. 1, 2, 3, 4, 5, ...), and after 2707 every remaining entry became 4590021.
I’ve verified that the issue does not exist in the original files.
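A check of this kind can be sketched as follows, as a rough diagnostic rather than the exact procedure used here. The helper name constant_tail_start is hypothetical. It assumes the y values have already been copied into a std::vector<unsigned int>, for example via RDataFrame's Take<UInt_t>("y"), and it reports where the trailing run of identical values begins.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical diagnostic helper: return the 0-based index at which the
// trailing run of identical values begins. For data with no repeated tail
// this is simply the last index; an empty input returns 0.
std::size_t constant_tail_start(const std::vector<unsigned int>& values) {
    if (values.empty()) return 0;
    std::size_t start = values.size() - 1;
    // Walk backwards while the preceding value equals the final value.
    while (start > 0 && values[start - 1] == values.back()) --start;
    return start;
}
```

Running this on y from an original file and from the merged file, and comparing each result with the last valid index, shows whether a constant tail exists and at which entry it starts.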
What’s going on here?