Handling Large TH3Fs

I have a TH3F with 768 channels on a side. I create the histogram, then iterate through a tree, filling events that meet certain criteria into the histogram. At the end of the program, I write the histogram and flush the file.

... hist->Write("histName"); file->Flush(); return 0;

At some point in these last few lines of code I get an error message complaining that the object being written is too large.

Some searching reveals that ROOT cannot write single objects larger than 1 GB, and a little basic math shows that (assuming nothing other than the array of channels for the histogram) the histogram is about 1.7 GB: 768^3 bins x 4 bytes per float = 1.8 x 10^9 bytes ≈ 1.7 GiB. Further, the ROOT file that the TH3F is written to is 565 MB, not the ~1.7 GB I was expecting.

My question is this: is there any way around this limit? Is there any way to save a histogram this large to a file and retrieve it later?

Hi,

not knowing the kind of data reduction procedure you are trying to put in place (a 768^3 histogram is really huge), my proposal is twofold: try a TH3S and/or a THnSparse.
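
A TH3S stores a short per bin (half the memory of a TH3F), while a THnSparse only allocates storage for the bins that are actually hit. A minimal sketch of the sparse version, with placeholder axis ranges:

```cpp
#include "THnSparse.h"

void sketch_sparse()
{
   const Int_t ndim = 3;
   Int_t    bins[ndim] = {768, 768, 768};
   Double_t xmin[ndim] = {0., 0., 0.};       // placeholder ranges
   Double_t xmax[ndim] = {1536., 1536., 1536.};

   // float-valued sparse histogram: memory grows with the number of
   // distinct bins actually filled, not with the 768^3 total
   THnSparseF *hs = new THnSparseF("hs", "any vs 90deg vs F/B",
                                   ndim, bins, xmin, xmax);

   Double_t coords[ndim] = {511., 511., 1022.};  // placeholder energies
   hs->Fill(coords);
}
```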

It’s gamma-ray coincidence data; with 768 channels I have already dropped my resolution to 2 keV/channel. I am using ROOT to handle the aspects of the analysis where I need to make asymmetric data structures (i.e. any HPGe detector vs HPGe detectors close to 90 deg vs HPGe detectors at forward and backward angles). Unfortunately, with a TH3S I would overflow the individual bins of the histogram (a short caps out at 32767 counts) by ~1.5 orders of magnitude.

So it comes down to the THnSparse: if the data is not actually sparse, will it hit the same problems? Or is the way it stores things internally different enough that nothing will explode even if the data fills the entire region with no gaps?

I think the sparse is worth a try. Of course, if the bins are all filled, there is no chance of gaining much. Is it an option to save 3 TH2Fs (x1-x2, x1-x3, x2-x3)?
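
Saving the three pairwise projections would look something like this (names and ranges are placeholders):

```cpp
#include "TH2F.h"

void sketch_pairs()
{
   TH2F *h12 = new TH2F("h12", "x1 vs x2", 768, 0., 1536., 768, 0., 1536.);
   TH2F *h13 = new TH2F("h13", "x1 vs x3", 768, 0., 1536., 768, 0., 1536.);
   TH2F *h23 = new TH2F("h23", "x2 vs x3", 768, 0., 1536., 768, 0., 1536.);

   // each triple (x1, x2, x3) fills all three 2D histograms;
   // each TH2F is only 768^2 floats, ~2.4 MB, well under the 1 GB limit
   Double_t x1 = 511., x2 = 662., x3 = 1022.;  // placeholder energies
   h12->Fill(x1, x2);
   h13->Fill(x1, x3);
   h23->Fill(x2, x3);
}
```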

Unfortunately, the 3 TH2F approach would kill the correlations that I need to look at. I am gating (projecting) on the “any angle” axis to clean away some background, gating on the “~90 degree” axis to set the preceding transition for the gamma-gamma correlation, and then using the resulting 1D “F/B angle” histogram to find the number of counts in the chosen peak. Then I repeat that with the same first gate, except the second gate is on the “F/B angle” axis and the examined histogram is the “~90 degree” one.
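
To be concrete, each gate pair amounts to the following (sketched here on a THnSparse, since that is the candidate on the table; the energy windows are placeholders):

```cpp
#include "THnSparse.h"
#include "TH1D.h"

// axis 0 = "any angle", axis 1 = "~90 degree", axis 2 = "F/B angle"
TH1D *gated_fb_spectrum(THnSparseF *hs)
{
   hs->GetAxis(0)->SetRangeUser(1170., 1180.);  // background-cleaning gate
   hs->GetAxis(1)->SetRangeUser(1330., 1340.);  // preceding-transition gate
   return hs->Projection(2);                    // gated F/B spectrum
}
```

and the mirrored version gates on axis 2 and projects axis 1.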

My question about the THnSparse was more along the lines of: does it handle its storage such that it would not try to put itself all in one basket and thus cause the error?

I suppose, if all else fails, I can make 768 TH2Fs, one for each channel on the third axis of the cube, and then write my code to select the appropriate ones based on the bin numbers of the “any angle” gate. Then, for each of the bins in the range of that first gate, it would take the gated projection from each of the TH2Fs in that range and add them together for the final histogram… that feels like a poor solution, though.
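
Roughly (the slice bookkeeping and the gate range below are placeholders):

```cpp
#include "TH2F.h"
#include "TString.h"

void sketch_slices()
{
   // one "90 deg vs F/B" TH2F per channel of the "any angle" axis;
   // each slice is ~2.4 MB, so every object stays well under the 1 GB limit
   TH2F *slices[768];
   for (int i = 0; i < 768; ++i)
      slices[i] = new TH2F(Form("slice_%03d", i), "90deg vs F/B",
                           768, 0., 1536., 768, 0., 1536.);

   // applying the "any angle" gate = summing the slices in the gate range
   TH2F *sum = (TH2F*)slices[585]->Clone("gate_sum");
   for (int i = 586; i <= 590; ++i)
      sum->Add(slices[i]);
   // then gate and project on "sum" as with any TH2F
}
```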

Hi,

thanks for taking the time to give so many clarifications about your work.
Something one can always do is to save the information to a file using a data structure like a TNtuple or a simple TTree, and use it to fill the three-dimensional histogram on the fly in memory. The whole operation can be nicely encapsulated, and it is going to be rather fast. You will also gain quite a lot of space, given the compression capabilities of ROOT.
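
A minimal sketch of what I mean (file, tree, and branch names are placeholders):

```cpp
#include "TFile.h"
#include "TTree.h"
#include "TH3F.h"

void fill_from_tree()
{
   TFile f("triples.root");              // placeholder file name
   TTree *t = (TTree*)f.Get("triples");  // placeholder tree name

   Float_t eAny, e90, eFB;               // placeholder branch names
   t->SetBranchAddress("eAny", &eAny);
   t->SetBranchAddress("e90",  &e90);
   t->SetBranchAddress("eFB",  &eFB);

   TH3F *h3 = new TH3F("h3", "any vs 90deg vs F/B",
                       768, 0., 1536., 768, 0., 1536., 768, 0., 1536.);
   h3->SetDirectory(0);  // keep the cube in memory only; it is never written

   for (Long64_t i = 0, n = t->GetEntries(); i < n; ++i) {
      t->GetEntry(i);
      h3->Fill(eAny, e90, eFB);
   }
}
```

Since the cube never goes through a buffer to disk, the 1 GB object limit never comes into play.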

Cheers,
Danilo

Sorry for taking so long to reply; I have been caught up in other work.

I had thought about filling a second tree from the first, but I am a bit curious about the space savings and speed of filling histograms on the fly.

My first tree has 3 branches: “Mult/b”, “En[Mult]/F”, and “DetNum[Mult]/b”. It has about 3.47 x 10^9 tuples in it and spans about 92.6 GB across 50 files (I forced splits every 2 x 10^9 bytes, since things seemed to go wonky when individual files got to more than ~10 GB). My second tree would have 3 branches: “AnyDet/F”, “FB_Det/F”, and “90_Det/F”. Because of the way events are processed (an event with multiplicity > 3 becomes numerous multiplicity-3 events), this new tree would have on the order of 10^10 events, despite the intensity reductions from the angle requirements.
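
For reference, the branch declarations (existing and planned) would look like this (tree names and the maximum multiplicity are placeholders):

```cpp
#include "TTree.h"

void sketch_branches()
{
   // first tree: variable-length events, one energy and detector number per hit
   UChar_t mult;
   Float_t en[64];       // placeholder maximum multiplicity
   UChar_t detNum[64];
   TTree *raw = new TTree("raw", "raw coincidences");
   raw->Branch("Mult",   &mult,  "Mult/b");
   raw->Branch("En",     en,     "En[Mult]/F");
   raw->Branch("DetNum", detNum, "DetNum[Mult]/b");

   // second tree: one fixed-length triple per unpacked coincidence
   Float_t any, fb, deg90;
   TTree *triples = new TTree("triples", "angle-sorted triples");
   triples->Branch("AnyDet", &any,   "AnyDet/F");
   triples->Branch("FB_Det", &fb,    "FB_Det/F");
   triples->Branch("90_Det", &deg90, "90_Det/F");
}
```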

While this tree would probably be smaller in file size than the original (the tuples are of fixed length), is there any way it could end up smaller than the histogram itself? Additionally, with that many tuples in the tree, would filling histograms from the tree on the fly be at all quick?