Directory automatically created in TFile when writing histograms

Dear experts,

In my code I am currently writing a lot of histograms to a ROOT file, which is then later accessed by another process for plotting. I noticed however that when writing a lot of histograms (12992 in my case) to a single file (without directories), at some point two directories are automatically created and the histograms are split between the two directories. Is there a way to prevent these directories from being created when writing large amounts of histograms to a file?

Thanks for the help

Dear willemv,

Can you give some more details about exactly what you are doing, i.e. about the structure / flow of your program?
What are the names of the two directories?

G Ganis

Hi Ganis,

The structure of my program is as follows:

I am trying to make a number of plots for a large number of kinematic variables and categories. To process a lot of data efficiently I submit several parallel jobs, each of which processes a part of the dataset and then writes histograms for every distribution and every category to a ROOT file. Subsequent jobs then read in these histograms and combine them into plots. The loop where I write the histograms can be summarized as follows (with some dummy variables I introduce here for simplicity):

TFile* outputFile = TFile::Open("file.root", "RECREATE");
for(size_t dist = 0; dist < distributions.size(); ++dist){
    for(size_t category = 0; category <categories.size(); ++category){
        histograms[dist][category]->Write(categoryNames[category] + distributionNames[dist] );
    }
}
outputFile->Close();

When I write up to a few thousand histograms to a single file this code behaves as expected: I get a ROOT file without directories, in which all the histograms are listed/stored separately when I open the file. However, once I increase the number of histograms to more than about 10000, two directories appear which I never specified in my code, and which have the same names as two particular histograms. All histograms are still stored, split between these two directories, and are otherwise completely fine.

I hope this description is clear and helps.

regards,

Willem

Dear willemv,

That looks quite strange to me (I cannot reproduce it in a simple macro); I have invited a colleague to comment.
The final name is the sum of categoryNames and distributionNames: are there special characters in these strings?

G Ganis

Hi Ganis,

The names only contain numbers, letters and underscores (_).
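
For reference, that claim can be checked mechanically. Below is a minimal sketch in plain C++ (no ROOT calls); the helper name isPlainKeyName is hypothetical, and it only tests for the character set described above: ASCII letters, digits and underscores.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a ROOT function): returns true if `name` is
// non-empty and consists only of ASCII letters, digits and underscores.
bool isPlainKeyName(const std::string& name) {
    if (name.empty()) return false;
    for (char c : name) {
        const bool ok = (c >= 'a' && c <= 'z') ||
                        (c >= 'A' && c <= 'Z') ||
                        (c >= '0' && c <= '9') ||
                        c == '_';
        if (!ok) return false;  // e.g. a slash, a space or a dot
    }
    return true;
}
```

Running it over every categoryNames[category] + distributionNames[dist] before the Write call would flag any name containing, say, a slash or a space.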

I found it rather strange too, and obviously first suspected my code of being buggy. However, after rather extensive debugging and checking the names of the TObjects I have at every step, I could not find any issue. The same code, with a cutoff (a simple "continue" statement) after writing a few thousand histograms, produced an output file with no directories. But after writing all histograms they are split between two directories.
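
One part of such a name check can be written down as a small stand-alone sketch. It assumes the key names are formed exactly as in the write loop (category name followed by distribution name, with no separator) and reports any name that is produced more than once. findDuplicateKeyNames is a hypothetical helper, not a ROOT function, and whether duplicate names have anything to do with the behaviour described here is not established.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not a ROOT function): builds every key name as
// category + distribution, the same concatenation as in the write loop,
// and returns each name that had already been produced by an earlier
// pair (one entry per repeated occurrence).
std::vector<std::string> findDuplicateKeyNames(
    const std::vector<std::string>& categoryNames,
    const std::vector<std::string>& distributionNames) {
    std::unordered_set<std::string> seen;
    std::vector<std::string> duplicates;
    for (const auto& dist : distributionNames) {
        for (const auto& cat : categoryNames) {
            const std::string name = cat + dist;
            // insert().second is false when the name was already present.
            if (!seen.insert(name).second) duplicates.push_back(name);
        }
    }
    return duplicates;
}
```

Because there is no separator between the two parts, two different pairs can collide: "a_" + "b" and "a" + "_b" both give "a_b".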

Another check I did was to divide the histograms among directories myself, this time writing all the histograms (which previously caused the two directories to appear). In this case the number of histograms in every directory was a few hundred, and there were no issues: no unwanted directories, and all histograms were written successfully. This test was done with exactly the same code, with a few added lines to create the directories.

I don’t know if it matters, but the histograms I used had Sumw2() set, so I guess they use more memory than histograms without this flag.

regards,

Willem

Dear willemv,

Thanks for the additional details.
An additional thing: what types are categoryNames and distributionNames? TString or other?

G Ganis

Try (with ROOT 6):

rootls -l file.root

Hi Ganis,

They are std::strings, so at the moment of the Write call I cast them to TString references. The code example I gave would therefore more accurately describe my real code as:

histograms[dist][category]->Write( (const TString&) categoryNames[category] + distributionNames[dist] );

regards,

Willem

histograms[dist][category]->Write( (categoryNames[category] + distributionNames[dist]).c_str() );

Yes, what you propose should work too, but as far as I understand the method I use should yield the same result.
