In my code I currently write a large number of histograms to a ROOT file, which is later read by another process for plotting. However, I noticed that when writing many histograms (12992 in my case) to a single file (without directories), at some point two directories are created automatically and the histograms are split between them. Is there a way to prevent these directories from being created when writing a large number of histograms to a file?
Can you give some more details about what exactly you are doing? That is, about the structure / flow of your program.
What are the names of the two directories?
I am trying to make plots for a large number of kinematic variables and categories. To process a lot of data efficiently I submit several parallel jobs, each of which processes part of the dataset and then writes histograms for every distribution and every category to a ROOT file. Subsequent jobs then read in and combine these histograms into plots. The loop where I write the histograms can be summarized as follows (with some dummy variables I introduce here for simplicity):
When I write up to a few thousand histograms to a single file, this code behaves as expected: I get a ROOT file without directories, in which all the histograms are listed and stored separately when I open the file. However, once I increase the number of histograms to more than about 10000, two directories appear which I never specified in my code, and which have the same names as two particular histograms. All histograms are stored, split between these directories, and are otherwise completely fine.
That looks quite strange to me (I cannot reproduce it in a simple macro); I have invited a colleague to comment.
The final name is the sum of categoryNames and distributionNames: are there special characters in these strings?
The names only contain numbers, letters and underscores (_).
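A check like this can also be automated rather than done by eye. The following is a minimal sketch in plain C++, assuming the final name is the plain concatenation of a category name and a distribution name. `isPlainName` and `makeHistName` are hypothetical helpers, not part of ROOT or of the code discussed here. A character such as `/` is worth ruling out in particular, since it can be interpreted as a directory separator when an object is looked up by name.

```cpp
#include <string>

// Hypothetical helper: true if the name is non-empty and contains only
// ASCII letters, digits and underscores.
bool isPlainName(const std::string& name) {
    if (name.empty()) return false;
    for (char c : name) {
        const bool isLetter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        const bool isDigit = (c >= '0' && c <= '9');
        if (!isLetter && !isDigit && c != '_') return false;
    }
    return true;
}

// Hypothetical helper: assumed naming scheme, category followed by distribution.
std::string makeHistName(const std::string& category, const std::string& distribution) {
    return category + distribution;
}
```

Calling `isPlainName(makeHistName(category, distribution))` just before each write would flag any name that contains an unexpected character.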
I found it rather strange too, and obviously first suspected my code of being buggy. However, after rather extensive debugging and checking the names of the TObjects I have at every step, I could not find any issue. The same code, with a cutoff (a simple "continue" statement) after writing a few thousand histograms, produced an output file with no directories. But after writing all histograms they are split between two directories.
Another check I did was to divide the histograms among directories myself, this time writing all the histograms (which previously caused the two directories to appear). In this case the number of histograms in every directory was a few hundred, and there were no issues: no unwanted directories, and all histograms were written successfully. This test was done with exactly the same code, with a few added lines to create the directories.
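The grouping step of such a test can be sketched in plain C++. This is only an illustration under stated assumptions: it assumes one directory per category and an underscore between the two name parts, and `HistKey` and `groupByCategory` are hypothetical names, not taken from the original code. In ROOT, each group would then be written after creating and entering its directory, for example with `TDirectory::mkdir` and `TDirectory::cd`.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical record identifying one histogram by its two name parts.
struct HistKey {
    std::string category;
    std::string distribution;
};

// Group full histogram names by category, so that each category can be
// written into its own directory instead of the top level of the file.
std::map<std::string, std::vector<std::string>>
groupByCategory(const std::vector<HistKey>& keys) {
    std::map<std::string, std::vector<std::string>> groups;
    for (const auto& key : keys) {
        // Assumed naming scheme: category, underscore, distribution.
        groups[key.category].push_back(key.category + "_" + key.distribution);
    }
    return groups;
}
```

Each map entry then corresponds to one directory, which keeps the number of objects per directory much smaller than the total number of histograms.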
I don’t know if it matters, but the histograms I used had Sumw2() set, so I guess they use more memory than histograms without this flag.
They are std::strings, so at the point of the write statement I cast them to TString references. The code example I gave would therefore describe my real code more accurately with: