Bug in TProofPlayer::MergeOutputFiles

Hi Gerri,

There is a minor bug in the MergeOutputFiles function in TProofPlayer. In particular, in the case, more common with submergers, that there is only one file going into a merger, the local merging flag is not respected, and this is propagated to the master for final merging. The fix at least is easy:

           // Point to the merger
           TFileMerger *filemerger = pf->GetFileMerger();

Needs to change to:

           // Point to the merger
           Bool_t localMerge = (pf->GetTypeOpt() == TProofOutputFile::kLocal) ? kTRUE : kFALSE;
           TFileMerger *filemerger = pf->GetFileMerger(localMerge);

It’s an important fix for multiple TProofOutputFiles (signal grids in my case)…local merges are around 3-6x faster than remote merges on our cluster. Thanks,

Bart

P.S. Two other thoughts:

  1. Would it be at all possible to have the option of forking the mergers, or having TProofServ processes do it on the top master? We have a situation that when running on signal grids (O(100) TProofOutputFiles) that only one CPU on the top master is utilized in merging, when it has many CPUs, and this increases total processing time immensely over utilizing all of them. I realize this is kind of a special case though.

  2. It takes nearly 18 seconds on our cluster to “merge” a single file (a list of files with one entry). It seems like the code could be smarter and detect this situation and use a raw copy instead of reading each object in a file and “merging” it with nothing. Here is my solution–I put this in TFileMerger::PartialMerge, right before the line: fOutputFile->SetBit(kMustCleanup);

    if((fFileList->GetEntries()==1) && !fExcessFiles->GetEntries() && !(in_type & kIncremental) && !fCompressionChange && !fExplicitCompLevel) {
    fOutputFile->Close();
    SafeDelete(fOutputFile);

    TIter iter(fFileList);
    TFile* file = (TFile*)iter();
    Bool_t result = file->Cp(fOutputFilename);

    if (file->TestBit(kCanDelete)) file->Close();
    // remove the temporary file
    if(fLocal) {
    TString p(file->GetPath());
    p = p(0, p.Index(’:’,0));
    gSystem->Unlink§;
    }
    fFileList->Clear();
    if (!result) {
    Error(“Merge”, “error during merge of your ROOT files”);
    }
    return result;
    }

Hi Bart,

Thanks. The fix is in the trunk, 5-32-patches and 5-30-patches. Do you need it in 5-28 too?

I am working right now to allow for PROOF-Lite workers to be ‘attached’ at various masters. Your case, however, is more in the direction of having a threaded merger. We have a development called ‘parallel merger’ which may fit here. But it needs some integration … I have to check the status and usability.

Of course we should be smarter here.
Can you send me a svn diff wrt the trunk so that I can submit it to the person taking care of TFileMerger?

Thanks, Gerri

[quote=“ganis”]Hi Bart,

Thanks. The fix is in the trunk, 5-32-patches and 5-30-patches. Do you need it in 5-28 too?
[/quote]

No, we are using 5.32 now, so I would not worry about 5.28. Thanks!

Soon after I wrote this I realized a nice workaround was to combine the direct copy in case of single file merge (the next point) with MergersByHost with making sure all my individual signal grid files were isolated on a signal host. That way all the mergers happen as submergers, and the master’s role in this case is simply copying the output file to its final location. Takes a bit of work to set up though.

[quote=“ganis”]

Of course we should be smarter here.
Can you send me a svn diff wrt the trunk so that I can submit it to the person taking care of TFileMerger?

Thanks, Gerri[/quote]

The TFileMerger.cxx svn diff is attached.
TFileMerger_cxx_svndiff.txt (999 Bytes)

Hi Gerri,

Looking at the trunk in 5.99 (this afternoon), I don’t think this idea of not merging single files has been implemented. The svn diff should be identical…I was able to copy and paste my code from 5.32. Please correct me if I am wrong and this was taken care of/officially rejected.

Thanks,

Bart

Hi Bart,

Sorry, that was my fault, I had the patch in my dirs and I thought I had submitted to Philippe Canal (I cannot commit this patch by myself); but I did not. I’ve just done it now.

Gerri

Thanks Gerri!

-Bart