For our experiment’s data-quality monitoring, we monitor the raw data files in numerous small jobs, each of which produces a large ROOT file of histograms. We then merge the ROOT files from the individual jobs into a single file. After this we may compute efficiencies, perform fits, etc. on the final merged file. To do this we open the ROOT file in ‘UPDATE’ mode and either overwrite existing keys via ‘obj->Write("", TObject::kOverwrite);’ or create completely new histograms and save them to the file. Each detector sub-system, trigger, and physics object may have its own ‘post-processing’ algorithm that opens the final ROOT file in ‘UPDATE’ mode. So in the end this final file is opened in UPDATE mode and edited maybe 20 times, with perhaps thousands of TObject edits.
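To make the pattern concrete, here is a minimal sketch of what one such post-processing pass looks like. The file and histogram names (`merged.root`, `someDir/someHist`, `efficiency`) are made up for illustration; this is not our actual code, and it needs a ROOT installation to run (e.g. `root -q -b updatePass.C+`).

```cpp
// Sketch of one post-processing pass over the merged monitoring file.
// Names are illustrative only.
#include "TFile.h"
#include "TH1F.h"
#include "TObject.h"

void updatePass()
{
   // Re-open the merged monitoring file in UPDATE mode.
   TFile *f = TFile::Open("merged.root", "UPDATE");
   if (!f || f->IsZombie()) return;

   // Fetch an existing histogram, modify it, and overwrite the key in place.
   TH1F *h = static_cast<TH1F*>(f->Get("someDir/someHist"));
   if (h) {
      h->Scale(2.0);                        // some trivial modification
      f->cd("someDir");
      h->Write("", TObject::kOverwrite);    // replaces the previous key cycle
   }

   // Or create a brand-new object computed from the merged histograms.
   f->cd();
   TH1F *eff = new TH1F("efficiency", "efficiency", 100, 0., 1.);
   eff->Write();

   f->Close();
   delete f;
}
```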
There is a low rate (we don’t know exactly how low) at which the file becomes corrupted. It is extremely difficult to tell which algorithms are the culprits, and the affected histograms in the final file (the ones giving R__unzip errors) may belong to a non-offending algorithm. We can sometimes find solutions, but the reasons why they work are never satisfying.
So I wanted to see if I could write a short script illustrating the problem on a generic ROOT file with many histograms. I have therefore attached a rather perverse script that recursively loops through a ROOT file and, for any object that can be cast to a TH1, trivially modifies the histogram and saves it back to its original directory in the file via ‘TDirectory::cd(); obj->Write("", TObject::kOverwrite)’ (or TObject::kWriteDelete), and then checks whether any of these R__unzip errors appear.
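In outline, the recursive overwrite pass does something like the following. This is a sketch of the idea, not the attached script itself; the file name `test.root` matches the one produced by the commands below, and it needs ROOT to run (e.g. `root -q -b overwriteAll.C+`).

```cpp
// Sketch: recursively overwrite every TH1 found in a ROOT file.
#include "TFile.h"
#include "TDirectory.h"
#include "TKey.h"
#include "TH1.h"
#include "TCollection.h"

void overwriteDir(TDirectory *dir)
{
   TIter next(dir->GetListOfKeys());
   TKey *key;
   while ((key = (TKey*)next())) {
      TObject *obj = key->ReadObj();
      if (obj->InheritsFrom(TDirectory::Class())) {
         overwriteDir((TDirectory*)obj);        // recurse into subdirectories
      } else if (TH1 *h = dynamic_cast<TH1*>(obj)) {
         h->Fill(0.5);                          // trivial modification
         dir->cd();                             // write back to the original directory
         h->Write("", TObject::kOverwrite);     // or TObject::kWriteDelete
      }
   }
}

void overwriteAll()
{
   TFile *f = TFile::Open("test.root", "UPDATE");
   if (!f || f->IsZombie()) return;
   overwriteDir(f);
   f->Close();
   delete f;
}
```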
Maybe this script is not the best example, but it does seem to show that opening files in UPDATE mode and modifying objects can, if done in excess, cause problems. Here is how to run the script:
cp $ROOTSYS/tutorials/io/dirs.C .
root -q -b dirs.C
root -q -b test2.C+
./scanFile.py test.root >/dev/null #my mac makes the file Test.root but linux does what I want
Depending on the value of m_copy_max, a variable in the script that controls how many TH1s are overwritten, one can see these R__unzip errors coming from TKey::ReadObj(). On lxplus (SLC5, ROOT 5.30, sourced directly from here:) the above commands work as given. On my Mac, test.root becomes Test.root, and ./scanFile.py becomes python scanFile.py.
Anyway, hopefully you can either spot a problem with my script, suggest a workaround (such as copying the file after each update pass to remove any dead space left in the file by deleted objects), or find something within ROOT itself?
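For reference, one way to try the copy-to-compact workaround I mention above is to pass the file through hadd between update passes. This is a sketch, not something I have verified fixes the corruption: hadd ships with ROOT, and with a single input file it just rewrites a clean copy, which should drop dead space left by deleted or overwritten key cycles.

```shell
# Rewrite test.root through hadd to compact it after an update pass.
# -f forces overwriting of the output file if it already exists.
hadd -f compact.root test.root
mv compact.root test.root
```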