How to handle THnSparse larger than 500 MB

Hi Philippe,

What do you mean by a split TTree? Will it allow me to make the same kind of projections as with a THnSparse?

Let me ask my question again: is it only a problem of the TFile size? If I store a huge binary file on disk and then load it into a THnSparse in RAM, not in a file, will it work?

Thanks in advance

Jérémie

If I store a huge binary file on disk and then load it into a THnSparse in RAM, not in a file, will it work?

Yes. Using a TTree is an efficient way of storing the data in a ‘huge binary file’: you would be able to reconstruct the THnSparse at read time, and you may be able to read the data in other ways too. (See the description of TTree in the User’s Guide and the new RDataFrame analysis tool.)

Cheers,
Philippe.

Hmm, I was not aware of this new RDataFrame analysis tool. It seems very interesting… Do you know if examples are available to help me store data from a THnSparse in a TTree, and to build a THnSparse from a TTree using this analysis tool?

Hi,

Using the trick you propose:

  TFile *f = TFile::Open("his3D.root", "RECREATE");
  TTree *t = new TTree("tree3D", "tree with his3D");
  t->Branch("his3D.", his3D, 32000, 111); // trying to impose "max" splitting
  t->Fill();
  t->Write();
  delete t;
  f->Write();
  delete f;

I still get the following error at the end:

Error in TBufferFile::WriteByteCount: bytecount too large (more than 1073741822)

I think I need to build the tree by hand, rather than storing the THnSparse directly in the TTree.

I’m afraid @pcanal would need to tell us whether it is possible to get a “better” branch splitting (I was trying to impose the “max” one).

Well, you could implement the following “brutal fix”.

As I understand it, your whole experimental data “sample” produces a THnSparse which is too big to be stored in a ROOT file.

So, try to “split” / “divide” your whole experimental data “sample” into several “subsamples” (or even several tens of “subsamples”, if needed).

Each “subsample” would then (possibly / hopefully) produce a much smaller “partial” THnSparse and you should be able to store these “partial” histograms in a ROOT file, either directly as separate objects or in a TTree. You could create a single ROOT file with all “partial” histograms or one ROOT file per “partial” histogram.

So, if your raw experimental data are spread across multiple files, you could take each raw data file as one physical “subsample”; or, if you have just one single file with all the raw data, simply divide the total number of events by some number and create that many logical “subsamples” (or one “subsample” per hour, day, or week of measurements).

Another (quite clever) way to split your data into “subsamples” would be to monitor the actual total number of bins of your THnSparse while you fill it (THnSparse::GetNbins). Once this number reaches a maximum value defined by you (it should be small enough that you can still save the histogram in a ROOT file; say 10 to 50 million bins should be fine, I guess), you simply write the current “partial” THnSparse histogram to a ROOT file, then recreate the histogram (or reset it so that all previous bins are gone) and continue filling the next “partial” histogram, as in the sketch below.
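Here is a minimal sketch of this trick as a standalone ROOT macro; the histogram name, the random filling (standing in for the real event loop) and the 50-million-bin threshold are illustrative assumptions, not values from this thread:

  // partial_fill.C - sketch of the "partial THnSparse" trick
  #include "THnSparse.h"
  #include "TFile.h"
  #include "TRandom3.h"
  #include "TString.h"

  void partial_fill()
  {
    Int_t    bins[3] = {4096, 4096, 4096};
    Double_t xmin[3] = {0., 0., 0.};
    Double_t xmax[3] = {4096., 4096., 4096.};
    THnSparseD his3D("his3D", "gamma cube", 3, bins, xmin, xmax);

    const Long64_t kMaxBins = 50000000; // illustrative: small enough to stay below the 1 GB buffer limit
    TRandom3 rng;
    Int_t part = 0;
    for (Long64_t event = 0; event < 200000000; ++event) { // stands in for the real event loop
      Double_t x[3] = {rng.Uniform(4096.), rng.Uniform(4096.), rng.Uniform(4096.)};
      his3D.Fill(x);
      if (his3D.GetNbins() >= kMaxBins) {
        TFile f(TString::Format("partial_%d.root", part++), "RECREATE");
        his3D.Write(); // store the current "partial" histogram
        his3D.Reset(); // drop all filled bins and continue with the next part
      }
    }
    if (his3D.GetNbins() > 0) { // do not forget the last, incomplete part
      TFile f(TString::Format("partial_%d.root", part), "RECREATE");
      his3D.Write();
    }
  }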

For test purposes, I created some 4096x4096x4096 three-dimensional THnSparse histograms and filled them with random values. I found that the average TFile buffer/basket size needed by such histograms can easily be estimated as follows: for histograms filled without weights one needs “number_of_filled_bins * (sizeof(bin) + 5)”, while for histograms for which THnSparse::Sumw2() has been called one needs “number_of_filled_bins * (sizeof(bin) + 13)” (i.e. weights are always “Double_t”). Here “sizeof(bin)” is 8 for “Double_t” and “Long_t” and 4 for “Float_t” and “Int_t”, and “number_of_filled_bins” is given by THnSparse::GetNbins().
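Applying these estimates to the thresholds above: an unweighted histogram with “Double_t” bins and 50 million filled bins needs about 50,000,000 * (8 + 5) = 650,000,000 bytes, comfortably below the 1,073,741,822-byte limit reported by TBufferFile, while the same histogram with Sumw2 enabled already needs 50,000,000 * (8 + 13) = 1,050,000,000 bytes and gets dangerously close to that limit.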

Then, you just need a simple small ROOT macro which reads / retrieves all “partial” histograms (from a single ROOT file or from many) and adds them up in RAM; a minimal version is sketched below. Well, you will always need to run this macro at the beginning of your ROOT session, of course … but that should really be very fast.
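A minimal sketch of such a macro, assuming the “partial” histograms were saved as “his3D” in files named partial_0.root, partial_1.root, … (all names are illustrative):

  // sum_partials.C - add all "partial" THnSparse histograms in RAM
  #include "THnSparse.h"
  #include "TFile.h"
  #include "TString.h"

  THnSparseD *sum_partials(Int_t nparts)
  {
    THnSparseD *total = nullptr;
    for (Int_t i = 0; i < nparts; ++i) {
      TFile f(TString::Format("partial_%d.root", i));
      THnSparseD *part = nullptr;
      f.GetObject("his3D", part);
      if (!part) continue; // skip files without the histogram
      if (!total)
        total = static_cast<THnSparseD *>(part->Clone("his3D_total"));
      else
        total->Add(part); // accumulate in RAM
      delete part;
    }
    return total; // lives in RAM for the rest of the session
  }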

Alternatively (and I cannot be precise without knowing your data flow and data layout), where you do

double some_var = ....;
double some_value = ....;
for( some condition ) {
    some_var = ....;
    some_value = ....;
    sparseHisto->Fill( some_var, some_value );
}

do

double some_var = ....;
double some_value = ....;
tree->Branch("some_var",&some_var);
tree->Branch("some_value",&some_value);
for( some condition ) {
    some_var = ....;
    some_value = ....;
    tree->Fill();
}

then, when reading the file, use RDataFrame (or MakeSelector or another way of looping over the TTree) to recreate the THnSparse, for example:
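Here is a minimal sketch of that read-back step, assuming the tree above was written as “tree” into data.root and treating some_var and some_value as two coordinates of a 2-D THnSparse (the binning, names and dimensions are illustrative; if some_value is actually a weight, pass it as the second argument of Fill instead):

  // fill_from_tree.C - rebuild a THnSparse from the TTree with RDataFrame
  #include "ROOT/RDataFrame.hxx"
  #include "THnSparse.h"

  THnSparseD *fill_from_tree()
  {
    Int_t    bins[2] = {4096, 4096};
    Double_t xmin[2] = {0., 0.};
    Double_t xmax[2] = {4096., 4096.};
    auto hs = new THnSparseD("hs", "rebuilt from the tree", 2, bins, xmin, xmax);

    ROOT::RDataFrame df("tree", "data.root");
    df.Foreach([hs](double some_var, double some_value) {
      Double_t x[2] = {some_var, some_value};
      hs->Fill(x);
    }, {"some_var", "some_value"});

    return hs;
  }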

Cheers,
Philippe.

Thank you for your help. Still, none of these solutions is very user friendly. Doing a simple projection will take time, and it needs to be done in a few seconds to be competitive with the old software packages that handle this kind of cube. I don’t understand how they managed to handle cubes up to 8192x8192x8192, in files of a few GB, while doing projections in around one second…

When I want to plot a projection of a gamma-gamma-gamma coincidence (the 1-D spectrum of gamma rays which are in coincidence with two other ones within specific energy ranges), I need the full statistics of my experiment. If I need to do that on many subfiles and then sum them, it will be a nightmare.

Once you have the total THnSparse in RAM (either by summing up “partial” histograms or by creating it directly from a TTree), you can make as many projections as you want (no need to recreate / refill the THnSparse if you do not exit your ROOT session).
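For instance, the double-gated projection you describe could then look like this (a sketch, assuming the full cube is a 3-D THnSparse pointer hs; the axis indices and energy windows are illustrative):

  // Gate two axes on the coincident gamma energies, then project the third.
  hs->GetAxis(1)->SetRangeUser(1170., 1175.); // first energy gate (illustrative values)
  hs->GetAxis(2)->SetRangeUser(1330., 1335.); // second energy gate
  TH1D *gated = hs->Projection(0);            // 1-D spectrum of the remaining axis
  gated->Draw();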
