Large memory leak in THnSparse

zubov_dmitriy_97 · February 12, 2021, 7:11pm

Dear experts,
I am seeing a huge memory leak when using N-dimensional histograms with THnSparse.
My function takes two non-empty 9-dimensional histograms as input, with the following number of bins per dimension: {8, 12, 8, 8, 9, 9, 9, 9, 5}. I don’t see any memory leak in the code, but when I run it, 7 GB of my RAM is used up before the function finishes.
Thanks,
Dmitriy

void FillCDF(THnSparseF* H_sig, THnSparseF* H_bkg){
  Float_t B = 0, S = 0;
  Int_t* p0 = new Int_t[9]; // coordinates of the current bin
  Int_t* p1 = new Int_t[9]; // coordinates of a neighbouring bin

  for(Int_t binI=H_sig->GetAxis(0)->GetNbins(); binI>=1; --binI){
    for(Int_t binJ=H_sig->GetAxis(1)->GetNbins(); binJ>=1; --binJ){
      for(Int_t binK=H_sig->GetAxis(2)->GetNbins(); binK>=1; --binK){
        for(Int_t binL=H_sig->GetAxis(3)->GetNbins(); binL>=1; --binL){
          for(Int_t binM=1; binM<=H_sig->GetAxis(4)->GetNbins(); ++binM){
            for(Int_t binN=H_sig->GetAxis(5)->GetNbins(); binN>=1; --binN){
              for(Int_t binO=1; binO<=H_sig->GetAxis(6)->GetNbins(); ++binO){
                for(Int_t binP=H_sig->GetAxis(7)->GetNbins(); binP>=1; --binP){
                  for(Int_t binR=1; binR<=H_sig->GetAxis(8)->GetNbins(); ++binR){

                    p0[0] = binI;
                    p0[1] = binJ;
                    p0[2] = binK;
                    p0[3] = binL;
                    p0[4] = binM;
                    p0[5] = binN;
                    p0[6] = binO;
                    p0[7] = binP;
                    p0[8] = binR;
                    B = H_bkg->GetBinContent(p0);
                    S = H_sig->GetBinContent(p0);
                    // loop over all neighbouring bins (each coordinate shifted by 0 or 1)
                    for(Int_t I=0; I<=1; I++){
                      for(Int_t J=0; J<=1; J++){
                        for(Int_t K=0; K<=1; K++){
                          for(Int_t L=0; L<=1; L++){
                            for(Int_t M=0; M<=1; M++){
                              for(Int_t N=0; N<=1; N++){
                                for(Int_t O=0; O<=1; O++){
                                  for(Int_t P=0; P<=1; P++){
                                    for(Int_t R=0; R<=1; R++){
                                      if(I+J+K+L+M+N+O+P+R>0){
                                        p1[0] = binI+I;
                                        p1[1] = binJ+J;
                                        p1[2] = binK+K;
                                        p1[3] = binL+L;
                                        p1[4] = binM-M;
                                        p1[5] = binN+N;
                                        p1[6] = binO-O;
                                        p1[7] = binP+P;
                                        p1[8] = binR-R;
                                        B=B-pow(-1, I+J+K+L+M+N+O+P+R)*H_bkg->GetBinContent(p1);
                                        S=S-pow(-1, I+J+K+L+M+N+O+P+R)*H_sig->GetBinContent(p1);
                                      }}}}}}}}}}

                    H_sig->SetBinContent(p0, S);
                    H_bkg->SetBinContent(p0, B);

                  }}}}}}}}}
}

ROOT Version: ROOT 6.18/04

Wile_E_Coyote · February 13, 2021, 1:26pm

It seems to me that you actually fill / use every single bin of your histograms.
In this case, you should switch to: THn

zubov_dmitriy_97 · February 13, 2021, 2:10pm

Thanks for your reply!
I just switched to THn and got this error: Error in TRint::HandleTermInput(): std::bad_alloc caught: std::bad_alloc.
Could this mean that the histograms are too big and occupy all the RAM?

Wile_E_Coyote · February 13, 2021, 2:38pm

I’m afraid a THnF will use more than 5 GB of RAM, about 5.3 GB (“4.” = “Float_t”, “+2” = “under/overflow bins”):
4.*(8+2)*(12+2)*(8+2)*(8+2)*(9+2)*(9+2)*(9+2)*(9+2)*(5+2)/1024./1024./1024.

So, maybe a well-compacted THnSparseF could still be better. If there are no under/overflow bins, you could end up with about 0.75 GB of RAM (“4.” = “Float_t”):
4.*8*12*8*8*9*9*9*9*5/1024./1024./1024.
I’m afraid you would really need to tune the “internal representation” manually when creating / filling your histogram so that it uses as little RAM as possible (when all bins, except under/overflows, are used). Unfortunately, I don’t know of any place where you can find hints on how to do it. Maybe @Axel, @pcanal, or @moneta could help.
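
The two estimates above can be reproduced with a few lines of plain C++. This is only a sketch of the arithmetic: the helper name `DenseSizeGiB` is made up for illustration, it assumes 4 bytes per bin (a `float`, as for `Float_t`), and it ignores any bookkeeping overhead of the histogram classes themselves.

```cpp
#include <vector>

// Size in GiB of a dense N-dimensional array of floats.
// `extraPerAxis` is 2 when every axis also stores an underflow and an
// overflow bin, and 0 when only the regular bins are counted.
// (Hypothetical helper, not part of ROOT.)
double DenseSizeGiB(const std::vector<int>& nbins, int extraPerAxis) {
  double cells = 1.0;
  for (int n : nbins) {
    cells *= static_cast<double>(n + extraPerAxis);
  }
  const double bytes = cells * static_cast<double>(sizeof(float));
  return bytes / (1024.0 * 1024.0 * 1024.0);
}
```

For the binning in this thread, `DenseSizeGiB({8, 12, 8, 8, 9, 9, 9, 9, 5}, 2)` evaluates to roughly 5.3 and `DenseSizeGiB({8, 12, 8, 8, 9, 9, 9, 9, 5}, 0)` to roughly 0.75.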

zubov_dmitriy_97 · February 14, 2021, 9:34am

I have already tuned my histograms. I changed the binning of each axis and excluded the possibility of under/overflows, but the problem still remains.
Maybe I am doing something wrong with the initialization or the rebinning?

void FillPDF(vector<const char*> dirs, THnSparseF* H_sig, THnSparseF* H_bkg); 
void FillCDF(THnSparseF* H_sig, THnSparseF* H_bkg);

void main(){
vector<const char*> dirs;

dirs.push_back("/home/dmitriy/NewFolder.2/ZZ_ewk_llvvjj/");
dirs.push_back("/home/dmitriy/NewFolder.2/ZZ_qcd_llvvZZ/");
dirs.push_back("/home/dmitriy/NewFolder.2/Z_ee/");
dirs.push_back("/home/dmitriy/NewFolder.2/Z_mu_mu");
dirs.push_back("/home/dmitriy/NewFolder.2/Z_tau_tau/");
dirs.push_back("/home/dmitriy/NewFolder.2/WZ/");
dirs.push_back("/home/dmitriy/NewFolder.2/top/");
dirs.push_back("/home/dmitriy/NewFolder.2/WW/WWlvlv/");
dirs.push_back("/home/dmitriy/NewFolder.2/Wt/");
dirs.push_back("/home/dmitriy/NewFolder.2/VVV/");
dirs.push_back("/home/dmitriy/NewFolder.2/ttV_ttVV/");

Int_t bins[9] = {8, 12, 8, 8, 9, 9, 9, 9, 5};
Double_t xmin[9] = {0, 0., 10, 20, -10, -10, -10, -10, 0}; 
Double_t xmax[9] = {50, 7000, 2000, 2000, 10, 10, 10, 10, 1};

THnSparseF *PDF_S = new THnSparseF("hs", "hs", 9, bins, xmin, xmax);
THnSparseF *PDF_B = new THnSparseF("hs", "hs", 9, bins, xmin, xmax);

// Arrays of bins for each axis. First and last elements of arrays are expanded to avoid under/over-flows.
Float_t MET_signif_bins[9] = {-1000, 7, 8, 9, 10, 11, 12, 13, 5000};
Float_t Mjj_bins[13] = {-1000000, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 7000000};
Float_t PT_Lj[9] = {-1000000, 50, 60, 70, 80, 90, 100, 110, 2000000};
Float_t PT_SLj[9] = { -2000000, 30, 40, 50, 60, 70, 80, 90,2000000};
Float_t dY_Zj_max[10] = {-10000, -0.9, -0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 10000};
Float_t dY_Zj_min[10] = {-10000, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7, 0.9, 10000};
Float_t dEta_lj_max[10] = {-10000, -0.9, -0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 10000};
Float_t dEta_lj_min[10] = {-10000, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7, 0.9, 10000};
Float_t Pt_b[6] = {-1000, 0.1, 0.2, 0.3, 0.4, 1000};

PDF_S->GetAxis(0)->Set(8, MET_signif_bins);
PDF_S->GetAxis(1)->Set(12, Mjj_bins);
PDF_S->GetAxis(2)->Set(8, PT_Lj);
PDF_S->GetAxis(3)->Set(8, PT_SLj);
PDF_S->GetAxis(4)->Set(9, dY_Zj_max);
PDF_S->GetAxis(5)->Set(9, dY_Zj_min);
PDF_S->GetAxis(6)->Set(9, dEta_lj_max);
PDF_S->GetAxis(7)->Set(9, dEta_lj_min);
PDF_S->GetAxis(8)->Set(5, Pt_b);

PDF_B->GetAxis(0)->Set(8, MET_signif_bins);
PDF_B->GetAxis(1)->Set(12, Mjj_bins);
PDF_B->GetAxis(2)->Set(8, PT_Lj);
PDF_B->GetAxis(3)->Set(8, PT_SLj);
PDF_B->GetAxis(4)->Set(9, dY_Zj_max);
PDF_B->GetAxis(5)->Set(9, dY_Zj_min);
PDF_B->GetAxis(6)->Set(9, dEta_lj_max);
PDF_B->GetAxis(7)->Set(9, dEta_lj_min);
PDF_B->GetAxis(8)->Set(5, Pt_b);

FillPDF(dirs, PDF_S, PDF_B); // input: dirs with ROOT files and empty histograms

FillCDF(PDF_S, PDF_B); // input: non-empty histograms. Memory is leaking during this call.
// The FillCDF() function is shown in my previous message.
}

Wile_E_Coyote · February 14, 2021, 10:18am

If in the “FillPDF” function you are filling only some of the bins, then in the “FillCDF” function you could try:

// fill bins ONLY if they already exist (do not allocate new bins)
if (H_sig->GetBin(p0, kFALSE) >= 0) H_sig->SetBinContent(p0, S);
if (H_bkg->GetBin(p0, kFALSE) >= 0) H_bkg->SetBinContent(p0, B);
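
Why such a guard can matter is easiest to see on a toy model. The sketch below is not ROOT's implementation: `ToySparseHist` and `SweepAllBins` are hypothetical names, and a `std::unordered_map` stands in for the sparse storage. It only assumes that a sparse container allocates a cell on an unconditional write and not on a read.

```cpp
#include <cstddef>
#include <unordered_map>

// Toy stand-in for a sparse histogram: only bins that were written exist.
struct ToySparseHist {
  std::unordered_map<long, float> cells;

  // Reading never creates a cell; a missing bin reads as 0.
  float Get(long bin) const {
    auto it = cells.find(bin);
    return it == cells.end() ? 0.0f : it->second;
  }

  // Unconditional write: creates the cell if it is missing.
  void Set(long bin, float v) { cells[bin] = v; }

  // Guarded write: touches only cells that already exist.
  void SetIfExists(long bin, float v) {
    auto it = cells.find(bin);
    if (it != cells.end()) it->second = v;
  }

  std::size_t Allocated() const { return cells.size(); }
};

// Writes to every bin in [0, nbins) of a copy of `h` and returns the
// number of cells allocated afterwards.
std::size_t SweepAllBins(ToySparseHist h, long nbins, bool guarded) {
  for (long bin = 0; bin < nbins; ++bin) {
    if (guarded) {
      h.SetIfExists(bin, h.Get(bin));
    } else {
      h.Set(bin, h.Get(bin));
    }
  }
  return h.Allocated();
}
```

With two filled bins and a sweep over 1000 bins, the guarded sweep still leaves 2 allocated cells, while the unguarded sweep leaves 1000.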

You could also try another “FillCDF” function:

#include "THnSparse.h"
#include <cmath>

void FillCDF(THnSparseF *H_sig, THnSparseF *H_bkg) {
  Float_t V;
  Int_t p0[9];
  Int_t p1[9];
  
  // for(Long64_t bin = 0; bin < H_sig->GetNbins(); bin++) { // forwards
  for(Long64_t bin = H_sig->GetNbins() -1; bin >= 0; bin--) { // backwards
    V = H_sig->GetBinContent(bin, p0);
    for(Int_t I=0; I<=1; I++) {
      for(Int_t J=0; J<=1; J++) {
        for(Int_t K=0; K<=1; K++) {
          for(Int_t L=0; L<=1; L++) {
            for(Int_t M=0; M<=1; M++) {
              for(Int_t N=0; N<=1; N++) {
                for(Int_t O=0; O<=1; O++) {
                  for(Int_t P=0; P<=1; P++) {
                    for(Int_t R=0; R<=1; R++) {
                      if(I+J+K+L+M+N+O+P+R>0) {
                        p1[0] = p0[0] + I;
                        p1[1] = p0[1] + J;
                        p1[2] = p0[2] + K;
                        p1[3] = p0[3] + L;
                        p1[4] = p0[4] - M;
                        p1[5] = p0[5] + N;
                        p1[6] = p0[6] - O;
                        p1[7] = p0[7] + P;
                        p1[8] = p0[8] - R;
                        V -= std::pow(-1, I+J+K+L+M+N+O+P+R) * H_sig->GetBinContent(p1);
                      }}}}}}}}}}
    H_sig->SetBinContent(bin, V); // (bin, V) ... or ... (p0, V)
  }
  
  // for(Long64_t bin = 0; bin < H_bkg->GetNbins(); bin++) { // forwards
  for(Long64_t bin = H_bkg->GetNbins() -1; bin >= 0; bin--) { // backwards
    V = H_bkg->GetBinContent(bin, p0);
    for(Int_t I=0; I<=1; I++) {
      for(Int_t J=0; J<=1; J++) {
        for(Int_t K=0; K<=1; K++) {
          for(Int_t L=0; L<=1; L++) {
            for(Int_t M=0; M<=1; M++) {
              for(Int_t N=0; N<=1; N++) {
                for(Int_t O=0; O<=1; O++) {
                  for(Int_t P=0; P<=1; P++) {
                    for(Int_t R=0; R<=1; R++) {
                      if(I+J+K+L+M+N+O+P+R>0) {
                        p1[0] = p0[0] + I;
                        p1[1] = p0[1] + J;
                        p1[2] = p0[2] + K;
                        p1[3] = p0[3] + L;
                        p1[4] = p0[4] - M;
                        p1[5] = p0[5] + N;
                        p1[6] = p0[6] - O;
                        p1[7] = p0[7] + P;
                        p1[8] = p0[8] - R;
                        V -= std::pow(-1, I+J+K+L+M+N+O+P+R) * H_bkg->GetBinContent(p1);
                      }}}}}}}}}}
    H_bkg->SetBinContent(bin, V); // (bin, V) ... or ... (p0, V)
  }
}

BTW. In any case, you are modifying both histograms “in-place”, so the final results depend on the order in which their bins are modified.
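
A 2-dimensional analogue makes the order dependence visible. The following is a minimal sketch with plain `std::vector` storage and hypothetical names (`Grid`, `At`, `AccumulateInPlace`). It applies the same kind of update as above, with both coordinates shifted by +1, and treats reads outside the grid as 0.

```cpp
#include <vector>

using Grid = std::vector<std::vector<double>>;

// Value at (i, j), or 0 outside the grid (like an empty bin).
double At(const Grid& g, int i, int j) {
  if (i < 0 || j < 0 || i >= static_cast<int>(g.size()) ||
      j >= static_cast<int>(g[0].size())) {
    return 0.0;
  }
  return g[i][j];
}

// In-place update of every cell:
//   V = g(i,j) - sum over (di,dj) != (0,0) of (-1)^(di+dj) * g(i+di, j+dj)
// With descending == true the cells are visited from high to low indices,
// so every neighbour that is read has already been updated.
void AccumulateInPlace(Grid& g, bool descending) {
  if (g.empty()) return;
  const int nx = static_cast<int>(g.size());
  const int ny = static_cast<int>(g[0].size());
  for (int a = 0; a < nx; ++a) {
    for (int b = 0; b < ny; ++b) {
      const int i = descending ? nx - 1 - a : a;
      const int j = descending ? ny - 1 - b : b;
      double v = g[i][j];
      for (int di = 0; di <= 1; ++di) {
        for (int dj = 0; dj <= 1; ++dj) {
          if (di + dj == 0) continue;
          // (-1)^(di+dj) computed from the parity, without pow()
          const double sign = ((di + dj) % 2 == 0) ? 1.0 : -1.0;
          v -= sign * At(g, i + di, j + dj);
        }
      }
      g[i][j] = v;
    }
  }
}
```

For a 2×2 grid filled with ones, the descending order gives 4 in cell (0, 0), the sum of all cells, while the ascending order gives 2 there, because the neighbours were read before they had been updated.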

zubov_dmitriy_97 · February 14, 2021, 12:42pm

Your code works faster and without memory leaks!
Thanks a lot!

Wile_E_Coyote · February 14, 2021, 12:49pm

In the original “FillCDF” function you could also try:

// fill bins ONLY if their contents need to be changed
// (it may allocate some new bins, but only if the new contents are non-zero)
if (H_sig->GetBinContent(p0) != S) H_sig->SetBinContent(p0, S);
if (H_bkg->GetBinContent(p0) != B) H_bkg->SetBinContent(p0, B);

BTW, at the end of the function you should also: delete [] p0; delete [] p1;
