Hi,
so I took Python, just-in-time compilation and differing compiler optimization levels out of the equation to compare the performance of Snapshot and TFileMerger on an equal footing:
// tfilemerger.cpp
#include <TStopwatch.h>
#include <TFileMerger.h>

int main() {
    TFileMerger mrg;
    mrg.SetFastMethod(true);
    mrg.AddFile("file_0.root");
    mrg.AddFile("file_1.root");
    mrg.OutputFile("file_mrg_mr.root");

    TStopwatch sw;
    sw.Start();
    mrg.Merge();
    sw.Stop();
    sw.Print();
}
// snapshot.cpp
#include <ROOT/RDataFrame.hxx>
#include <TStopwatch.h>

void merger_df() {
    auto df = ROOT::RDataFrame("tree", {"file_0.root", "file_1.root"});
    // Spell out all 30 column types so that Snapshot is compiled ahead of time
    // instead of being just-in-time compiled at runtime.
    df.Snapshot<double, double, double, double, double, double, double, double,
                double, double, double, double, double, double, double, double,
                double, double, double, double, double, double, double, double,
                double, double, double, double, double, double>(
        "tree", "file_mrg_df.root",
        {
            "a_0", "a_1", "a_2", "a_3", "a_4", "a_5", "a_6", "a_7",
            "a_8", "a_9", "a_10", "a_11", "a_12", "a_13", "a_14", "a_15",
            "a_16", "a_17", "a_18", "a_19", "a_20", "a_21", "a_22", "a_23",
            "a_24", "a_25", "a_26", "a_27", "a_28", "a_29",
        });
}

int main() {
    TStopwatch st;
    st.Start();
    merger_df();
    st.Stop();
    st.Print();
}
I am aware that nobody will ever write a Snapshot invocation like that, but it’s useful for the purposes of making sure that both TFileMerger and Snapshot are compiled ahead of time and with a reasonable optimization level (-O2).
This results in a ~19s runtime for Snapshot and ~0.5s for TFileMerger.
Calling mrg.SetFastMethod(false) instead brings the TFileMerger runtime to 17s, close to that of Snapshot. Flamegraphs easily show what the difference is (you can open them in their own browser tab to make them interactive: right-click, open in new tab):
As we suspected, the difference is simply that Snapshot decompresses and re-compresses all data, while TFileMerger copies the compressed buffers directly (an optimization that mrg.SetFastMethod(false) disables).
As Snapshot is more general, it would be difficult to perform the same optimization there (although not impossible, I guess).
I hope this clarifies what you see.
Cheers,
Enrico