Merging files in multiple loops with TFileMerger

Hello,

I need to merge multiple ROOT files from the grid, each of which contains just a few histograms (but the histograms themselves are fairly large). To save computing resources, I prepared a macro with a loop that merges a limited number of remote files into a local one on each pass. Unfortunately, the macro stops right after the first merge is executed.

Below is one version of the macro, where I put the list of files into a vector:

// Read the list of grid paths, one per whitespace-separated token.
ifstream infile("fluka_files.txt");
std::vector<TString> lines;
TString ll;
while (infile >> ll) lines.emplace_back(ll);

const Int_t maxNfilesMerged = 20;
Int_t k = 0;
auto FM = new TFileMerger();

for (Int_t i = 0; i < int(lines.size()); i++) {

    // Start of a new group: set the output file for this group.
    if (FM->GetMergeList()->GetSize() == 0) {
        FM->OutputFile(Form("Merged_files/merged%d.root", k));
        FM->SetFastMethod(kFALSE);
    }

    TString line = lines[i];
    TString filename = "alien://" + line;

    FM->AddFile(filename);

    // Keep adding files until the group is full or the list is exhausted.
    if (FM->GetMergeList()->GetSize() < maxNfilesMerged && i < int(lines.size()) - 1) {
        continue;
    } else {
        printf("merging to %s...\n", FM->GetOutputFileName());

        FM->Merge();    // Execution of code stops here!!!
        cout << "some text..." << endl;    // not displayed!!!
        FM->Reset();
    }

}

Execution of code stops after FM->Merge().
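
As a side check, the grouping logic of the loop can be tested on its own, without ROOT or a grid connection. The sketch below is only an illustration under assumptions: makeBatches and outputName are hypothetical helper names that are not part of the macro or of ROOT, and plain std::string stands in for TString. It splits a list of file names into consecutive groups of at most batchSize entries, with a shorter last group, which is the behaviour the macro aims for with maxNfilesMerged.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split `files` into consecutive groups of at most
// `batchSize` entries. The last group may be shorter than the others.
std::vector<std::vector<std::string>> makeBatches(const std::vector<std::string>& files,
                                                  std::size_t batchSize)
{
    assert(batchSize > 0);  // a zero batch size would never advance the loop
    std::vector<std::vector<std::string>> batches;
    for (std::size_t start = 0; start < files.size(); start += batchSize) {
        const std::size_t stop = std::min(start + batchSize, files.size());
        batches.emplace_back(files.begin() + start, files.begin() + stop);
    }
    return batches;
}

// Hypothetical helper: name of the k-th output file, mirroring the
// "Merged_files/merged%d.root" pattern used in the macro.
std::string outputName(std::size_t k)
{
    return "Merged_files/merged" + std::to_string(k) + ".root";
}
```

With 45 input names and a batch size of 20, makeBatches gives three groups of 20, 20 and 5 entries, and outputName(0) to outputName(2) give the corresponding output names. If the groups look right here, the problem is more likely in the merge step itself than in the loop bookkeeping.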

I am not sure why that would be the case (is it stopping or just taking a long time?).

But what you're implementing is supported by hadd (and by TFileMerger) with the option -n number_files_to_group (and TFileMerger::SetMaxOpenedFiles).

The only difference is the number of output files (1 for hadd vs many in your case).

Thank you for your reply.

Unfortunately hadd is not applicable, since there are many large files (up to a thousand, ~40 MB each) in my grid directory which I need to merge locally.

I also discovered that the first merged file itself is corrupted:

Attaching file merged0.root as _file0…
Warning in TFile::Init: file merged0.root probably not closed, trying to recover
Info in TFile::Recover: merged0.root, recovered key TH3D:h31 at address 238
Info in TFile::Recover: merged0.root, recovered key TH3D:h32 at address 54485712
Warning in TFile::Init: successfully recovered 2 keys
(TFile *) 0x2a76680

In my case the typical file size is approximately 40 MB, but I discovered that some files are only 10 MB. I have tried filtering on a minimum file size of 20, 30, and 35 MB, but the error still occurs. Is there a more reliable way to check the quality of the files? They contain just a few histograms, with no trees, folders, or other complex structures.

What do you mean?

I ran some simulations on the ALICE Grid which produced many ROOT files, and I want to merge them.

Can you clarify “how/why” hadd is not applicable to your situation (while somehow TFileMerger is applicable)? Note that hadd calls TFileMerger.

The files to merge are stored at a remote location.

… It is therefore useful in a Grid environment where the files might be accessible only remotely. The merging interface allows files containing histograms and trees to be merged, like the standalone hadd program.

Can we apply hadd without downloading files before merging?
