Appending to a file opened in UPDATE mode corrupts the file

Short version: A file with ntuples was written and closed. In the same program, I open the file again in UPDATE mode, then close it. After that, the first ntuple in the file is corrupted (though the remaining ntuples and objects are fine).

Details:

I’m running a Geant4 simulation (4.11, though I don’t think that matters here). The program uses Geant4’s G4AnalysisManager to manage and write n-tuples. At the end of the main program, after the G4RunManager is closed, I want to append the detector geometry as a TGeoManager to the ROOT output file.

(The idea is that even if a user fails to keep a good record of which detector version they used to run the simulation, the TGeoManager description will be in the file as a reference. Our detector model takes up ~70 kB, which is nothing compared to the ~50 MB of our typical simulation output file.)

I have verified that, before the following code is executed, the value of gFile is 0 (i.e., no ROOT file is open). The value of gDirectory->GetName() is /root. This is the current version of the code that corrupts the output file:

auto geoManager = gGeoManager->Import("parsed.gdml");
std::shared_ptr<TFile> outputFile ( TFile::Open("g4job.root","UPDATE") );
geoManager->Write("DetectorGeometry");
outputFile->Close();

parsed.gdml is the result of G4GDMLParser::Write earlier in the program. I create this file separately because this output is ROOT-compatible (whereas the GDML input to the G4 program is not, since it contains formulas and loops). I’ve verified via independent programs that there’s nothing problematic about the contents of parsed.gdml.

The TGeoManager structure DetectorGeometry is being appended to the file g4job.root. I can examine the file with TBrowser, draw the detector, etc.

The problem is when I try to access the first ntuple/TTree in the file:

root g4job.root
root [0] 
Attaching file g4job.root as _file0...
(TFile *) 0x56337037d900
root [1] FirstNtuple->Scan()
************************************************************************************************************
*    Row   *   Run.Run * Event.Eve * TrackID.T * PDGCode.P * numPhoton * energy.en * tStart.tS * xStart.xS *
************************************************************************************************************
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=1801810947 but fEntryOffset=0, pos=166, len=5035, fNbytes=1895623809, fObjlen=0, trying to repair
Error in <TBranch::GetBasket>: File: gramsg4.root at byte:0, branch:Run, entry:0, badread=0, nerrors=1, basketnumber=0
*        0 *         0 *         0 *         1 *        22 *         2 * 1.462e-05 * 6.6710802 * 0.1178747 *
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=1801810947 but fEntryOffset=0, pos=166, len=5035, fNbytes=1895623809, fObjlen=0, trying to repair
Error in <TBranch::GetBasket>: File: gramsg4.root at byte:0, branch:Run, entry:1, badread=0, nerrors=2, basketnumber=0
*        1 *         0 *         0 *         1 *        22 *       160 * 0.0031776 * 7.0527228 * -5.057142 *
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=1801810947 but fEntryOffset=0, pos=166, len=5035, fNbytes=1895623809, fObjlen=0, trying to repair
Error in <TBranch::GetBasket>: File: gramsg4.root at byte:0, branch:Run, entry:2, badread=0, nerrors=3, basketnumber=0
*        2 *         0 *         0 *         1 *        22 *        14 * 0.0002494 * 7.5646246 * -1.809136 *
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=1801810947 but fEntryOffset=0, pos=166, len=5035, fNbytes=1895623809, fObjlen=0, trying to repair
Error in <TBranch::GetBasket>: File: gramsg4.root at byte:0, branch:Run, entry:3, badread=0, nerrors=4, basketnumber=0

… and so on. Only the first ntuple has this problem.
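The numbers in the warning may carry a clue. In hexadecimal, fNevBuf=1801810947 is 0x6B657403. ROOT stores integers big-endian, so those four bytes read as the characters "ket" followed by the byte 0x03. That looks like the tail of the class name "TBasket" followed by the length prefix of the 3-character branch name "Run" (the branch named in the error); a basket's key header stores both as length-prefixed strings. If that reading is right, the basket header is being decoded from the wrong offset rather than the payload being damaged. This is an interpretation of the log, not a confirmed diagnosis. The helper below (names are illustrative) only performs the decoding:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Split a 32-bit value into its four bytes, most significant first
// (big-endian, the byte order ROOT uses on disk).
std::array<std::uint8_t, 4> toBigEndianBytes(std::uint32_t value) {
  return {static_cast<std::uint8_t>(value >> 24),
          static_cast<std::uint8_t>(value >> 16),
          static_cast<std::uint8_t>(value >> 8),
          static_cast<std::uint8_t>(value)};
}

// Render bytes as text: printable ASCII as-is, anything else as \xNN.
std::string asPrintable(const std::array<std::uint8_t, 4>& bytes) {
  static const char* hex = "0123456789abcdef";
  std::string out;
  for (std::uint8_t b : bytes) {
    if (b >= 0x20 && b < 0x7f) {
      out += static_cast<char>(b);
    } else {
      out += "\\x";
      out += hex[b >> 4];
      out += hex[b & 0x0f];
    }
  }
  return out;
}
```

For example, asPrintable(toBigEndianBytes(1801810947u)) yields the text ket\x03.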

I’ve tried doing the TGeoManager::Import both before and after opening the output file in UPDATE mode. The results are the same.

I know from other forum posts that this means g4job.root has become corrupted. If I comment out the TGeoManager::Write line, g4job.root still shows the same problem. This suggests that simply opening and closing g4job.root in UPDATE mode is what’s corrupting the file.

If I comment out that entire block of code that contains the TFile::Open, everything is fine:

root [0] 
Attaching file g4job.root as _file0...
(TFile *) 0x5632a3e7d850
root [1] FirstNtuple->Scan()
************************************************************************************************************
*    Row   *   Run.Run * Event.Eve * TrackID.T * PDGCode.P * numPhoton * energy.en * tStart.tS * xStart.xS *
************************************************************************************************************
*        0 *         0 *         0 *         1 *        22 *         2 * 1.462e-05 * 6.6710802 * 0.1178747 *
*        1 *         0 *         0 *         1 *        22 *       160 * 0.0031776 * 7.0527228 * -5.057142 *
*        2 *         0 *         0 *         1 *        22 *        14 * 0.0002494 * 7.5646246 * -1.809136 *
*        3 *         0 *         0 *         1 *        22 *         2 * 1.462e-05 * 7.8422709 * -5.057142 *
*        4 *         0 *         0 *         1 *        22 *        13 * 0.0002494 * 8.1213327 * -10.39429 *
*        5 *         0 *         0 *         1 *        22 *         5 * 0.0002494 * 8.1247158 * -10.43015 *
*        6 *         0 *         0 *         1 *        22 *       161 * 0.0031776 * 8.1846038 * -11.03238 *
*        7 *         0 *         0 *         9 *        11 *       220 * 0.0042875 * 8.3723247 * -6.236709 *
*        8 *         0 *         0 *         9 *        11 *       673 * 0.0129153 * 8.3723765 * -6.235956 *
*        9 *         0 *         0 *         9 *        11 *       911 * 0.0183359 * 8.3725065 * -6.234133 *
*       10 *         0 *         0 *         9 *        11 *      1485 * 0.0293105 * 8.3726141 * -6.234505 *

It was my understanding that opening a file in UPDATE mode was not supposed to interfere with the existing contents of a file. Did I misunderstand? Is there something special about files containing ntuples that I should compensate for? Or am I missing something else?


ROOT Version: 6.28/04
Platform: CentOS 7
Compiler: 12.3.0



Does it work if you invert the order of these two lines?

auto geoManager = gGeoManager->Import("parsed.gdml");
std::shared_ptr<TFile> outputFile ( TFile::Open("g4job.root","UPDATE") );

i.e., to

std::shared_ptr<TFile> outputFile ( TFile::Open("g4job.root","UPDATE") );
auto geoManager = gGeoManager->Import("parsed.gdml");

and add

outputFile->cd();

just before writing to it.


It didn’t before, but just in case I tried your suggestion. It still did not work, in that the first ntuple was still corrupted.

If I entirely comment out the creation and writing of the TGeoManager, but merely open the file in UPDATE mode and close it, the first ntuple is still corrupted.


Edit: Darn it, I have to bounce this back to this forum again. Please skim this reply, then move to the next one.

I’ll leave this question up here, but I’m going to “unask” the question. The problem is not with ROOT in general, but with Geant4’s G4AnalysisManager.

I tested updates in general with some code like this:

root
ROOT::RDataFrame rdf(100);
auto rdf_x = rdf.Define("x", [](){ return gRandom->Rndm(); });
rdf_x.Snapshot("myNewTree","update-test.root");
.q
root
auto input = TFile::Open("update-test.root","UPDATE");
input->Close();
.q
root
TFile *_file0 = TFile::Open("update-test.root")
myNewTree->Scan()
.q

Everything worked fine. Then I looked to see how Geant4’s G4AnalysisManager works using an LXR Browser. It led me down a ridiculous rabbit hole of nested classes that call each other. Finally I wound up in this obscure routine. It appears that instead of calling ROOT routines like TFile to handle their I/O, G4 codes the various header bytes directly.

I’m not sure, but I think this is the cause of my problem: (Edit: It wasn’t.) ROOT doesn’t see that the file is open, because G4 bypasses ROOT’s I/O methods in favor of its own. When I call TFile::Open("g4job.root","UPDATE"), I’m calling it on a still-open file. Of course the header bytes get screwed up.

Therefore, the proper audience for this question is a Geant4 help forum or bug report, not the ROOT forum. Sorry!

I thought I could blame everything on G4AnalysisManager, but it’s not that simple.

In Geant4, I commented out all the lines that relate to opening g4job.root in UPDATE mode. I re-ran Geant4, and the ntuples are fine:

root g4job.root
root [0] 
Attaching file g4job.root as _file0...
(TFile *) 0x56414a697f70
root [1] myFirstNtuple->Scan()
************************************************************************************************************
*    Row   *   Run.Run * Event.Eve * TrackID.T * PDGCode.P * numPhoton * energy.en * tStart.tS * xStart.xS *
************************************************************************************************************
*        0 *         0 *         0 *         1 *        22 *         2 * 1.462e-05 * 6.6710802 * 0.1178747 *
*        1 *         0 *         0 *         1 *        22 *       160 * 0.0031776 * 7.0527228 * -5.057142 *
*        2 *         0 *         0 *         1 *        22 *        14 * 0.0002494 * 7.5646246 * -1.809136 *
*        3 *         0 *         0 *         1 *        22 *         2 * 1.462e-05 * 7.8422709 * -5.057142 *
*        4 *         0 *         0 *         1 *        22 *        13 * 0.0002494 * 8.1213327 * -10.39429 *
*        5 *         0 *         0 *         1 *        22 *         5 * 0.0002494 * 8.1247158 * -10.43015 *

Then I use ROOT to simply open this file in UPDATE mode, and nothing else:

root
root [0] auto input = TFile::Open("g4job.root","UPDATE")
(TFile *) 0x55fc3d877f50
root [1] input->Close()
root [2] .q

Just this, without getting G4 involved, causes the file to become corrupt:

root g4job.root
root [0] 
Attaching file g4job.root as _file0...
(TFile *) 0x56164c8f6b40
root [1] myFirstNtuple->Scan()
************************************************************************************************************
*    Row   *   Run.Run * Event.Eve * TrackID.T * PDGCode.P * numPhoton * energy.en * tStart.tS * xStart.xS *
************************************************************************************************************
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=1801810947 but fEntryOffset=0, pos=166, len=5035, fNbytes=1895623809, fObjlen=0, trying to repair
Error in <TBranch::GetBasket>: File: gramsg4.root at byte:0, branch:Run, entry:0, badread=0, nerrors=1, basketnumber=0
*        0 *         0 *         0 *         1 *        22 *         2 * 1.462e-05 * 6.6710802 * 0.1178747 *
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=1801810947 but fEntryOffset=0, pos=166, len=5035, fNbytes=1895623809, fObjlen=0, trying to repair
Error in <TBranch::GetBasket>: File: gramsg4.root at byte:0, branch:Run, entry:1, badread=0, nerrors=2, basketnumber=0
*        1 *         0 *         0 *         1 *        22 *       160 * 0.0031776 * 7.0527228 * -5.057142 *
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=1801810947 but fEntryOffset=0, pos=166, len=5035, fNbytes=1895623809, fObjlen=0, trying to repair
Error in <TBranch::GetBasket>: File: gramsg4.root at byte:0, branch:Run, entry:2, badread=0, nerrors=3, basketnumber=0
*        2 *         0 *         0 *         1 *        22 *        14 * 0.0002494 * 7.5646246 * -1.809136 *
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=1801810947 but fEntryOffset=0, pos=166, len=5035, fNbytes=1895623809, fObjlen=0, trying to repair
Error in <TBranch::GetBasket>: File: gramsg4.root at byte:0, branch:Run, entry:3, badread=0, nerrors=4, basketnumber=0
*        3 *         0 *         0 *         1 *        22 *         2 * 1.462e-05 * 7.8422709 * -5.057142 *
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=1801810947 but fEntryOffset=0, pos=166, len=5035, fNbytes=1895623809, fObjlen=0, trying to repair
Error in <TBranch::GetBasket>: File: gramsg4.root at byte:0, branch:Run, entry:4, badread=0, nerrors=5, basketnumber=0
*        4 *         0 *         0 *         1 *        22 *        13 * 0.0002494 * 8.1213327 * -10.39429 *
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=1801810947 but fEntryOffset=0, pos=166, len=5035, fNbytes=1895623809, fObjlen=0, trying to repair
Error in <TBranch::GetBasket>: File: gramsg4.root at byte:0, branch:Run, entry:5, badread=0, nerrors=6, basketnumber=0
*        5 *         0 *         0 *         1 *        22 *         5 * 0.0002494 * 8.1247158 * -10.43015 *

How can simply opening a file in UPDATE mode affect a file in this way? Is there anything else I can check?

Hi,
could you please share your original g4job.root (before you opened it in the "UPDATE" mode) so that we could take a look and possibly reproduce the issue?

It’s a 5MB file. What is the standard way of sharing files like this on this forum? I could upload the file (I see the icon as I compose this reply) but I don’t want to fill up whatever space you’ve got.

People usually use free services like Dropbox, CERNBox, Mega, WeTransfer, etc.

Of course; I’m just being stupid this morning. Here’s a link:

Yes, I can confirm that the LArHits TTree gets messed up somehow.

And everything is fine when I do the same procedure with my own files with TTrees, so I guess it is something specific to this particular TTree.

This may require someone who understands the structure of a TFile to comment. What I can offer is this:

  • The link to the “good” version of file, as written by Geant4’s analysis manager; this is the same link as above:
  • The link to the “broken” version of file, the one created by auto input=TFile::Open("gramsg4-working.root","UPDATE");input->Close(); within ROOT.
  • The output of cmp -l gramsg4-works.root gramsg4-broken.root; cmp is the UNIX utility for comparing binary files. The bytes themselves are displayed in octal.
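For anyone who wants to repeat the cmp -l comparison from C++, here is a rough equivalent: it lists every differing byte with a 1-based offset in decimal and the two byte values in octal, which is the information cmp -l prints (the column alignment differs). It only compares up to the length of the shorter file, so a size difference between the two files has to be checked separately. The function and struct names are illustrative, not from any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

struct ByteDiff {
  std::size_t offset;   // 1-based position, as cmp -l reports it
  std::uint8_t first;   // byte in the first file
  std::uint8_t second;  // byte in the second file
};

// Read a whole file into memory as raw bytes (empty if it cannot be opened).
std::vector<std::uint8_t> readFile(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  std::vector<std::uint8_t> bytes((std::istreambuf_iterator<char>(in)),
                                  std::istreambuf_iterator<char>());
  return bytes;
}

// Compare byte by byte over the common length and collect every mismatch.
std::vector<ByteDiff> diffBytes(const std::vector<std::uint8_t>& a,
                                const std::vector<std::uint8_t>& b) {
  std::vector<ByteDiff> diffs;
  const std::size_t n = std::min(a.size(), b.size());
  for (std::size_t i = 0; i < n; ++i) {
    if (a[i] != b[i]) diffs.push_back({i + 1, a[i], b[i]});
  }
  return diffs;
}

// Format one mismatch as "offset first second" (decimal, octal, octal).
std::string formatLikeCmp(const ByteDiff& d) {
  std::ostringstream os;
  os << d.offset << ' ' << std::oct << int(d.first) << ' ' << int(d.second);
  return os.str();
}
```

Usage: diffBytes(readFile("gramsg4-works.root"), readFile("gramsg4-broken.root")), then print each entry with formatLikeCmp.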

The binary comparison is shorter than I thought it was going to be. There are only a few differences at the beginning and at the end of the respective files.
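To see at a glance where those differences cluster, the differing offsets can be collapsed into contiguous ranges. A minimal sketch, assuming the offsets are already sorted in ascending order (cmp -l reports them that way); groupOffsets and its gap parameter are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Collapse ascending byte offsets into [first, last] ranges, merging an
// offset into the previous range when it is at most `gap` bytes past its end.
std::vector<std::pair<std::size_t, std::size_t>>
groupOffsets(const std::vector<std::size_t>& sortedOffsets, std::size_t gap = 1) {
  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  for (std::size_t off : sortedOffsets) {
    if (!ranges.empty() && off - ranges.back().second <= gap) {
      ranges.back().second = off;  // extend the current range
    } else {
      ranges.emplace_back(off, off);  // start a new range
    }
  }
  return ranges;
}
```

A larger gap (say 16) merges nearby edits, which makes "a few spots at the start and the end of the file" easy to confirm.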

The question is: Why should UPDATE have changed the contents of the file at all, much less put the file in an unreadable state? I’ve looked at the description of a ROOT file, but I don’t see a “smoking gun” that explains what’s going on, other than perhaps something is being potentially added at the end of the file.

At this point, my links are probably confusing matters, but I can’t help but try:

This link is the output of the command diff <(xxd gramsg4-works.root) <(xxd gramsg4-broken.root). In other words, it’s a hex comparison of the two ROOT files. Again, there aren’t many differences; just enough to corrupt the file, I guess.

In a naive attempt to understand the differences, I created a spreadsheet with differences in the headers of the two files. I’ve highlighted the differences in the spreadsheet:

It’s clear that UPDATE is adding a UUID where the original file did not have one. I understand the logic behind that. But I don’t understand the changes in the “free data records” fields; it looks like free space is being reduced somehow.
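Instead of tabulating the header by hand, the fixed-size header at the start of a ROOT file can be decoded with a few lines of standard C++. This sketch assumes the layout documented in the TFile class reference: the magic "root", then big-endian fields, with fEND, fSeekFree and fSeekInfo widening from 4 to 8 bytes when fVersion exceeds 1000000. The struct and function names are illustrative, not ROOT API, and the code does no consistency checking; it only exposes the fields so the works/broken headers (fSeekFree, fNbytesFree, nfree, fUUID) can be compared side by side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Fixed-size ROOT file header (field names follow the TFile documentation).
struct RootHeader {
  std::uint32_t version = 0;       // fVersion; > 1000000 means 8-byte pointers
  std::uint32_t begin = 0;         // fBEGIN: first data record
  std::uint64_t end = 0;           // fEND: first free byte at end of file
  std::uint64_t seekFree = 0;      // fSeekFree: FREE data record
  std::uint32_t nbytesFree = 0;    // fNbytesFree
  std::uint32_t nfree = 0;         // number of free data records
  std::uint32_t nbytesName = 0;    // fNbytesName
  std::uint8_t units = 0;          // fUnits: 4 or 8
  std::uint32_t compress = 0;      // fCompress
  std::uint64_t seekInfo = 0;      // fSeekInfo: TStreamerInfo record
  std::uint32_t nbytesInfo = 0;    // fNbytesInfo
  std::vector<std::uint8_t> uuid;  // fUUID (18 bytes), if present in buffer
};

// Read an n-byte big-endian unsigned integer at pos and advance pos.
inline std::uint64_t readBigEndian(const std::vector<std::uint8_t>& buf,
                                   std::size_t& pos, int n) {
  std::uint64_t value = 0;
  for (int i = 0; i < n; ++i) value = (value << 8) | buf[pos++];
  return value;
}

// Parse the header from the first bytes of a file; nullopt if the magic
// "root" is missing or the buffer is too short.
std::optional<RootHeader> parseRootHeader(const std::vector<std::uint8_t>& buf) {
  if (buf.size() < 8 || buf[0] != 'r' || buf[1] != 'o' || buf[2] != 'o' ||
      buf[3] != 't')
    return std::nullopt;
  std::size_t pos = 4;
  RootHeader h;
  h.version = static_cast<std::uint32_t>(readBigEndian(buf, pos, 4));
  const int ptr = h.version > 1000000 ? 8 : 4;          // pointer field width
  const std::size_t fixedSize = (ptr == 8) ? 57 : 45;   // bytes through fNbytesInfo
  if (buf.size() < fixedSize) return std::nullopt;
  h.begin = static_cast<std::uint32_t>(readBigEndian(buf, pos, 4));
  h.end = readBigEndian(buf, pos, ptr);
  h.seekFree = readBigEndian(buf, pos, ptr);
  h.nbytesFree = static_cast<std::uint32_t>(readBigEndian(buf, pos, 4));
  h.nfree = static_cast<std::uint32_t>(readBigEndian(buf, pos, 4));
  h.nbytesName = static_cast<std::uint32_t>(readBigEndian(buf, pos, 4));
  h.units = static_cast<std::uint8_t>(readBigEndian(buf, pos, 1));
  h.compress = static_cast<std::uint32_t>(readBigEndian(buf, pos, 4));
  h.seekInfo = readBigEndian(buf, pos, ptr);
  h.nbytesInfo = static_cast<std::uint32_t>(readBigEndian(buf, pos, 4));
  if (buf.size() >= pos + 18) {
    const auto at = static_cast<std::ptrdiff_t>(pos);
    h.uuid.assign(buf.begin() + at, buf.begin() + at + 18);
  }
  return h;
}
```

Feeding it the first 100 bytes of each file and printing the fields gives the same comparison as the spreadsheet.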

Any thoughts?

Seeing this:

auto geoManager = gGeoManager->Import("parsed.gdml");
std::shared_ptr<TFile> outputFile ( TFile::Open("g4job.root","UPDATE") );
geoManager->Write("DetectorGeometry");
outputFile->Close();

which should be:

auto geoManager = gGeoManager->Import("parsed.gdml");
std::shared_ptr<TFile> outputFile ( TFile::Open("g4job.root","UPDATE") );
geoManager->Write("DetectorGeometry");
outputFile->Write();
outputFile->Close();

I tried that. It made no difference.

Once again, I can cause the problem by entirely commenting out the TGeoManager lines in Geant4, then simply opening and closing the file post-G4:

root
auto file = TFile::Open("gramsg4.root","UPDATE");
file->Close();
.q

This is sufficient to cause the problem. TGeoManager has nothing to do with it.

Then this must be an issue in the Geant4 re-implementation of ROOT I/O. This sounds like the same issue as here: Opening a file in update mode causes Error in <TBasket::Streamer>: The value of fNbytes is incorrect - #15 by pcanal

Yep, that was it! Thanks!

I’ll file a bug report with the Geant4 team. We’ll see if we can spare future users the same frustrating experience I had.

Now I have to figure out how to run hadd from within the simulation. (If I create a separate post-processing step, the users will forget to run it.)

For what it’s worth:

#include "TSystem.h"
#include "G4String.hh"
#include "G4ios.hh"

// ... run Geant4 simulation including G4AnalysisManager

  G4String filename = <the output file from G4AnalysisManager after it's been closed>

  // Search for hadd in the user's environment. TSystem::Which() returns a
  // newly allocated string (nullptr if not found) that the caller must delete.
  auto path = gSystem->Getenv("PATH");
  char* hadd = gSystem->Which(path,"hadd");

  if ( hadd == nullptr ) {
    // We could not find hadd.
    if (debug || verbose)
      G4cout << "gramsg4.cc - Could not find hadd, filename '"
             << filename << "' unchanged; cannot be opened in UPDATE mode"
             << G4endl;
  }
  else {
    // Define the work file name.
    G4String workfile = "work_" + filename;

    // Rename the output file to the temporary work file name.
    // TSystem::Rename() returns 0 on success.
    if ( gSystem->Rename(filename,workfile) == 0 ) {

      // Set the hadd verbosity argument.
      G4String vhadd = "-v 0";
      if (debug || verbose)
        vhadd = "-v 99";

      // e.g. hadd -v 0 -f gramsg4.root work_gramsg4.root
      G4String haddCommand = G4String(hadd) + " " + vhadd
        + " -f " + filename + " " + workfile;

      if (debug || verbose)
        G4cout << "gramsg4.cc - Executing command '"
               << haddCommand << "'"
               << G4endl;

      // Execute the hadd command; a non-zero status means it failed.
      if ( gSystem->Exec(haddCommand) == 0 ) {
        // Remove the work file; it just wastes disk space at this point.
        if (debug || verbose)
          G4cout << "gramsg4.cc - Deleting '" << workfile << "'"
                 << G4endl;
        gSystem->Unlink(workfile);
      }
      else {
        // Keep the work file so the simulation output is not lost.
        G4cerr << "gramsg4.cc - hadd failed; original output kept in '"
               << workfile << "'" << G4endl;
      }
    }
    delete [] hadd;
  }

After these “file repair” lines, I can open filename in UPDATE mode and do whatever I like with it.
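Two small robustness points about the snippet above, sketched as standalone helpers (workFileName, shellQuote and haddCommand are hypothetical names, not part of ROOT or Geant4). First, "work_" + filename puts the prefix in front of any directory, so an output path like output/gramsg4.root would become work_output/gramsg4.root; prefixing only the file-name component avoids that. Second, gSystem->Exec() hands the string to the shell (on Unix through system(), so a POSIX sh is assumed here), so paths containing spaces or quotes should be quoted.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Prefix only the file-name component: "output/gramsg4.root" becomes
// "output/work_gramsg4.root", and "gramsg4.root" becomes "work_gramsg4.root".
std::string workFileName(const std::string& filename,
                         const std::string& prefix = "work_") {
  const std::filesystem::path p(filename);
  return (p.parent_path() / (prefix + p.filename().string())).string();
}

// Wrap an argument in single quotes for a POSIX shell; an embedded single
// quote is written as '\'' (close quote, escaped quote, reopen quote).
std::string shellQuote(const std::string& arg) {
  std::string out = "'";
  for (char c : arg) {
    if (c == '\'')
      out += "'\\''";
    else
      out += c;
  }
  out += '\'';
  return out;
}

// Assemble "hadd -v <level> -f <target> <source>" with quoted paths.
std::string haddCommand(const std::string& hadd, const std::string& target,
                        const std::string& source, bool verbose) {
  return shellQuote(hadd) + (verbose ? " -v 99" : " -v 0") + " -f " +
         shellQuote(target) + " " + shellQuote(source);
}
```

In the Geant4 code this would be used as workfile = workFileName(filename) and gSystem->Exec(haddCommand(hadd, filename, workfile, debug || verbose).c_str()).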

The code executed by hadd is also available through the class TFileMerger.

I hunted for TFileMerger documentation and examples. I found this and I’ll give it a try.