we are building our own little analysis framework using root and are wondering why creating, filling and writing several thousand histograms seems to be very fast. But having them then and closing the file takes ages, much longer than the whole analysis before. Is this normal or could it be we are doing something absolutely wrong?
You can find the complete code in (the most recent “mehlhase”-tag would be the one in question, although the behaviour has alsways been like that):
especially important are perhaps the methods “ToTAsymmetryFinalize” in CTB_ToTAsymmetryAnalysis.cxx and “CloseInputOutput” in CTB_InputOutput.cxx where the files are closed. In general the procedure to create the files until closing them is:
- opening/creating the files
- creating the histograms
- doing a huge loop over all events and filling the histograms
- closing them…which is the problem then
In case it is a normal thing: what causes the closing of the files to be so slow. The files are smaller that 10MB although they contain several thousands of histograms. Is this a matter of the number of objects in the file? How can we improve the performance than?
Thanks a lot, I hope the information is sufficient! Otherwise feel free to ask!
I am sorry you should have a look at this tag instead of the above mentioned because CTB_ToTAsymmetryAnalysis.cxx is not yet in in the above…
Does nobody experience the same? Strange…
Could you post one of your histogram files in a public area?
thanks for your reply! You can find two of the root files that we get as output in /afs/cern.ch/user/d/danri/public.
If that is inconvenient for you I can also put it somewhere on the web. In these versions most of the histograms are not filled, however closing the files is slow anyhow. I also checked if there are perhaps many cycles of the same histograms that are constantly overwritten, but this seems not to be the case.
I looked at your file and did not see anything special except the big number of histograms: 328 directories, 864 TH2 and 6048 TProfile.
My guess is that you should be able to write these objects and close the file in about 1 or 2 seconds. What do you mean by slow? minutes?
Do you count in the closing time the time to write the histos or simply the time to file.Close() ?
Yes there are a lot of histograms and directories. To be able to be more precise I am doing some timing studies right now with a TStopwatch. 1-2 seconds would be great but for a simple test it took more than about one minute (real time) including writing. However writing itself seems to be very fast, it just takes a few seconds at most, I get a printout after Write() and one after Close() for the File ToTAsymmetryAnalysis.root and Write() is done in about 1-2 seconds whereas Close() is much slower. But I will tell you exactly when the big test ran through.
I did some testing now and it seems as if the problem is different for the two files I put in my public. Below is the commented output of the program, showing the timing information.
here is the analysis…
+++ TIMER: Real time 0:00:19, CP time 16.980
INFO ToTAsymmetryFinalize: Saving histograms…
+++ Histogram Sorting & Writing: Real time 0:00:00, CP time 0.890
INFO ToTAsymmetryFinalize: Closing Output file…
+++ Deleting Histogram Array: Real time 0:01:22, CP time 82.040 [color=orange]==> so here is the problem, I call Delete() of a TObjArray here[/color]
+++ Closing: Real time 0:00:07, CP time 3.980 [color=orange]==> but this is ok[/color]
+++ Deleting file: Real time 0:00:00, CP time 0.010
+++ TIMER: Real time 0:01:30, CP time 86.920
INFO: MakeStrawQAList started…
creating StrawQA file: StrawQA-0.txt … done writing 0 lines!
creating StrawQA file: StrawQA-1.txt … done writing 0 lines!
INFO: MakeStrawQAList finished!
INFO: Closing input and output files…
+++ Writing: Real time 0:00:02, CP time 1.680
+++ Closing Output: Real time 0:00:43, CP time 43.820 [color=orange]==> here closing seems to be the problem, TFile::Close() is called[/color]
INFO: Closing input and output files done!
+++ Closing Input: Real time 0:00:00, CP time 0.000
INFO: Program finished!
+++ TIMER: Real time 0:00:46, CP time 45.500
The treatment is different for the two files.
For the first one each object is written on its own after searching the right directory. The objects are in a TObjArray which is deleted with Delete() after writing the objects to file. This takes 1:22 which seems quite long to me (?)…am I doing something wrong there? The code is in the above mentioned CVS in ToTAsymmetryAnalysis.cxx, ToTAsymmetryFinalize().
For the second file all objects are written at once by TFile::Write(), in this case closing takes more than 40 seconds…Does TFile::Close() also call some deleting that might cause it to be so long?
Thanks in advance,
Looking at your code in CTB_AnalysisV3::ToTAsymmetryFinalize, I see that are you are doing the bookkeeping of the histograms in your own array (in addition of the normal bookkeeping by ROOT).
You can remove the normal ROOT bookkeeping by calling
before creating your histograms.
Let me knowwhat is the result when deleting your histograms.
I tried to set TH1::AddDirectory(kFALSE) now and it seems that closing the first file (ToTAsymmetryAnalysis.root) is much faster (6 seconds). Unfortunately it works only when not filling the histograms because I use the TDirectory::Get method in CTB_AnalysisV3::DoAsymmetryAnalysis to find the right histograms to be filled in the loop, which seems to depend on the ROOT bookkeeping, so that I get a segmentation fault without it.
The second file relies on the ROOT bookkeeping to a larger extend, as far as I understood it, so that I cannot test TH1::AddDirectory(kFALSE) there.
Before going further, could you give a try with the CVS version?
We have an improved hashlist destructor there.
Otherwise could you remove the destruction of your objarray?
testing the CVS version would take me some time, as I didn’t use it before. For testing purpose I tried to comment out the deleting of the TObjArray. In this case closing the file took much longer (increased about the same amount as the deleting of the TObjArray), so I think the deleting of the TObjArray is then done by TFile::Close?
another idea: Could you call
before closing the file ?
I tried to do it with and without deleting the histogram array before and what happens is:
the file closes faster, but somehow I lose all the directories and histograms then. In the file are only the top level directories Data and MC. Everything else is gone :-/
To understand what happens, we will need the shortest possible running example reproducing the problem
Can you make sure all your histograms are given the DIFFERENT names?
If you assign one and the same name to many of your histograms then what you see is normal.