Merging ~1M histograms

Dear ROOT experts,

I’m currently working on a project in which part of the workflow involves merging O(100) files with histograms (TH1F). Each histogram has roughly 1300 bins (it varies a little). When fully merged, the final file has 975000 histograms.

This whole process can take up to 26 hours (AMD EPYC 7452, 128 cores, 256 GB of memory). The merging is currently done with “hadd”, although TFileMerger gives the same performance (as expected). Parallelization (-j) brings no improvement; I suspect this is because there is very little overlap of histograms between the input files. Most of the time (but not always), the machine is just copying histograms from source to destination.
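
The "very small overlap" suspicion can be checked cheaply before choosing a strategy: for every histogram name, count how many input files contain it. Below is a minimal, ROOT-free sketch; `OverlapStats` and `count_overlap` are hypothetical helpers, and it assumes the per-file name lists have already been collected (for example from each file's list of keys) and that a name appears at most once per file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Summary of how histogram names are shared between input files.
struct OverlapStats
{
    std::size_t unique_names = 0;  // distinct names over all files
    std::size_t shared_names = 0;  // names found in more than one file
    std::size_t total_entries = 0; // sum over files of the number of names
};

// names_per_file[i] holds the histogram names found in input file i.
// Assumes each name appears at most once per file (no duplicate cycles).
OverlapStats count_overlap(const std::vector<std::vector<std::string>> &names_per_file)
{
    std::unordered_map<std::string, std::size_t> occurrences;
    OverlapStats stats;
    for (const auto &names : names_per_file)
    {
        for (const auto &name : names)
        {
            ++occurrences[name];
            ++stats.total_entries;
        }
    }
    stats.unique_names = occurrences.size();
    for (const auto &entry : occurrences)
    {
        if (entry.second > 1)
        {
            ++stats.shared_names; // this histogram would actually be added
        }
    }
    return stats;
}
```

If `shared_names` turns out to be a small fraction of `unique_names`, nearly all of the work is copying, which would be consistent with `-j` not helping.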

Is there any recommendation for cases like this on how to speed up the merging? Perhaps by taking advantage of the fact that there is very little adding and mostly copying? I’m willing to pursue even a lower-level solution, something not already implemented, if it makes sense, of course.

Thanks in advance.

Best,
Felipe


ROOT Version: 6.28/04
Platform: CC7_x86_64
Compiler: GCC 12


I suspect it might help to merge the files in smaller batches in parallel (and then merge the results).

Do you mean merging 10 batches of 10 files each instead of merging all 100 files at once, and then merging the 10 resulting files into one? But wouldn’t that double the amount of copying?

Yes, but the first 10 batches can be done in parallel.
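
The batching suggestion can be sketched as a two-level merge: split the inputs into batches, merge each batch on its own thread, then merge the partial results. This sketch models each file as an in-memory map from histogram name to bin contents; `Histos`, `merge_into` and `batched_merge` are hypothetical names, not ROOT API, and equal names are assumed to have identical binning.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one file's histograms: name -> bin contents.
using Histos = std::unordered_map<std::string, std::vector<float>>;

// Merge src into dst: move a histogram the first time its name is seen
// (a pure copy, no arithmetic), otherwise add it bin by bin.
void merge_into(Histos &dst, Histos &&src)
{
    for (auto &entry : src)
    {
        auto it = dst.find(entry.first);
        if (it == dst.end())
        {
            dst.emplace(entry.first, std::move(entry.second));
            continue;
        }
        assert(it->second.size() == entry.second.size()); // same binning assumed
        for (std::size_t i = 0; i < entry.second.size(); ++i)
        {
            it->second[i] += entry.second[i];
        }
    }
}

// Stage 1: each batch of batch_size inputs is merged on its own thread.
// Stage 2: the partial results are merged sequentially.
Histos batched_merge(std::vector<Histos> inputs, std::size_t batch_size)
{
    assert(batch_size > 0);
    std::vector<std::future<Histos>> partials;
    for (std::size_t start = 0; start < inputs.size(); start += batch_size)
    {
        std::size_t stop = std::min(start + batch_size, inputs.size());
        // Each task only touches its own slice of inputs, so no locking is needed.
        partials.push_back(std::async(std::launch::async, [&inputs, start, stop] {
            Histos acc;
            for (std::size_t i = start; i < stop; ++i)
            {
                merge_into(acc, std::move(inputs[i]));
            }
            return acc;
        }));
    }
    Histos result;
    for (auto &partial : partials)
    {
        merge_into(result, partial.get());
    }
    return result;
}
```

For example, `batched_merge(files, 10)` on 100 inputs runs 10 batch merges concurrently. The final, sequential stage still visits every histogram once, so when almost every histogram lives in only one file the total time may barely improve. On older glibc versions, linking may also need `-pthread`.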

Hi all,

Thanks for your feedback!

I tried splitting the merging into batches, but it did not perform well.

In the end, I wrote the program below to do the merging. I had to accommodate high memory consumption, but the processing time dropped from 26 hours to 20 minutes.
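
High memory consumption is expected here, because this approach keeps every merged histogram in memory until the output is written. A rough lower bound follows from the numbers in the first post. The sketch below counts only the bin-content arrays (a TH1F stores n + 2 floats, including underflow and overflow, plus one double per cell when Sumw2 is enabled) and ignores per-object overhead; `histogram_bytes` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Lower bound on the memory needed to hold n_histos histograms at once.
// Only bin contents are counted; object, axis and name overhead is ignored.
constexpr std::uint64_t histogram_bytes(std::uint64_t n_histos, std::uint64_t n_bins, bool with_sumw2)
{
    // n_bins regular bins plus underflow and overflow.
    std::uint64_t cells = n_bins + 2;
    std::uint64_t per_histo = cells * sizeof(float); // TH1F contents
    if (with_sumw2)
    {
        // Sum of squared weights: one double per cell.
        per_histo += cells * sizeof(double);
    }
    return n_histos * per_histo;
}
```

For 975000 histograms of 1300 bins this gives about 5.1 GB without Sumw2 and about 15.2 GB with it; the real footprint is larger because of per-object overhead.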

#include <cstdio>  // stderr
#include <cstdlib> // EXIT_SUCCESS, EXIT_FAILURE, std::exit
#include <iostream>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility> // std::move
#include <vector>

#include "fmt/format.h"

#include "TFile.h"
#include "TH1F.h"
#include "TKey.h"

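auto merger(const std::vector<std::string> &input_files, const std::string &output_file) -> void
{
    // One accumulator per histogram name. All histograms with the same name are
    // assumed to have the same binning, which allows the cheap TH1::Add.
    auto histos = std::unordered_map<std::string, std::unique_ptr<TH1F>>();
    for (auto &&file_path : input_files)
    {
        std::unique_ptr<TFile> root_file(TFile::Open(file_path.c_str()));
        if (!root_file || root_file->IsZombie())
        {
            fmt::print(stderr, "ERROR: Could not open {}\n", file_path);
            std::exit(EXIT_FAILURE);
        }

        TIter keyList(root_file->GetListOfKeys());
        TKey *key;
        while ((key = static_cast<TKey *>(keyList())))
        {
            auto full_name = std::string(key->GetName());

            // Only histograms whose name starts with "[EC_" are merged.
            if (full_name.find("[EC_") != 0)
            {
                continue;
            }

            // ReadObj() returns a new object that the caller must delete
            // (histograms are not attached to a directory, see main).
            auto histo = std::unique_ptr<TH1F>(static_cast<TH1F *>(key->ReadObj()));
            auto it = histos.find(full_name);
            if (it == histos.end())
            {
                // First occurrence: keep this histogram as the accumulator.
                histos.emplace(full_name, std::move(histo));
            }
            else
            {
                // Already seen: add bin contents; the temporary is freed here.
                it->second->Add(histo.get());
            }
        }
    }

    // compress = 0 writes the output uncompressed (larger file, faster writes).
    std::unique_ptr<TFile> output_root_file(TFile::Open(output_file.c_str(), "RECREATE", "", 0, 0));

    for (auto &&[name, histo] : histos)
    {
        output_root_file->WriteObject(histo.get(), name.c_str());
    }

    // Writes the keys and header; the TFile destructor would also do this.
    output_root_file->Close();
}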

auto main(int argc, char *argv[]) -> int
{
    TH1::AddDirectory(false);
    TDirectory::AddDirectory(false);

    if (argc < 3)
    {
        fmt::print(stderr, "ERROR: Could not merge files.\nUsage: {} <output> <input1> <input2> ...\n", argv[0]);
        std::exit(EXIT_FAILURE);
    }

    std::string output_file = argv[1];

    std::vector<std::string> inputs_files = {};
    for (int i = 2; i < argc; i++)
    {
        inputs_files.push_back(argv[i]);
    }

    merger(inputs_files, output_file);
    fmt::print("Done: {}\n", output_file);

    return EXIT_SUCCESS;
}

Best,
Felipe

Hi Felipe,

I have never used hadd, but I would hope that it does exactly what your program does. You got a dramatic improvement, a factor of 50 in time; any thoughts on that?

-Eddy

@moneta should take another look.

The main difference between your code and hadd/TFileMerger(..., kTRUE) is that you can assume that all the histograms have the same binning, while hadd can’t. Namely, you use the fast TH1::Add, while hadd needs to call TH1::Merge. This is likely the source of the difference.

Interesting. So maybe hadd should first check the binning before deciding on the algorithm.
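
The check proposed above can be sketched as a dispatch: compare the axes once, take a plain bin-by-bin sum (the TH1::Add-like fast path) when they match, and fall back to a general path otherwise. `Hist1D`, `same_binning` and `merge` are hypothetical stand-ins rather than ROOT API, and the fallback, which re-fills each source bin at its centre, is far simpler than TH1::Merge, which handles, among other things, histograms whose axes differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified fixed-width 1D histogram; contents has nbins + 2 entries
// (index 0 = underflow, index nbins + 1 = overflow).
struct Hist1D
{
    int nbins;
    double xmin, xmax;
    std::vector<double> contents;
};

bool same_binning(const Hist1D &a, const Hist1D &b)
{
    return a.nbins == b.nbins && a.xmin == b.xmin && a.xmax == b.xmax;
}

// Merges src into dst; returns true if the fast path was taken.
bool merge(Hist1D &dst, const Hist1D &src)
{
    if (same_binning(dst, src))
    {
        // Fast path: one loop over contiguous arrays, no bin lookup.
        for (std::size_t i = 0; i < dst.contents.size(); ++i)
            dst.contents[i] += src.contents[i];
        return true;
    }
    // Slow path (illustrative only): find a destination bin for each source bin centre.
    double src_width = (src.xmax - src.xmin) / src.nbins;
    double dst_width = (dst.xmax - dst.xmin) / dst.nbins;
    for (int i = 1; i <= src.nbins; ++i)
    {
        double center = src.xmin + (i - 0.5) * src_width;
        int bin;
        if (center < dst.xmin)
            bin = 0;
        else if (center >= dst.xmax)
            bin = dst.nbins + 1;
        else // clamp guards against rounding just below xmax
            bin = std::min(dst.nbins, 1 + static_cast<int>((center - dst.xmin) / dst_width));
        dst.contents[bin] += src.contents[i];
    }
    dst.contents[0] += src.contents[0];
    dst.contents[dst.nbins + 1] += src.contents[src.nbins + 1];
    return false;
}
```

The fast path costs one addition per bin, while any general path has to locate a destination bin for every source bin, which is why checking the binning first could pay off when it almost always matches.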
