THnSparse Mystery

cmclauchlin · June 27, 2023, 1:12pm

I have a 7-dimensional THnSparseD histogram (Friend) which I fill in one program, write out, and then read into another program to analyze. During this later analysis, I’ve consistently run into issues where the structures present are far from what I would expect, including every other bin being empty along certain dimensions. To isolate where the issue arises, I went back to where I fill the histogram:

// Requires <cmath> for std::isnan and <memory> for std::shared_ptr.
void Histogram::Friend_Fill(const char* top_, float W_, float Q2_, float MM_, float MM2_, float theta_, float alpha_, float phi_, int var_, bool thrown_, float weight_, int helicity_, float plus_weight_, std::shared_ptr<Flags> flags_){
    // Skip events in which any kinematic variable or the weight is NaN
    if(!std::isnan(W_) && !std::isnan(Q2_) && !std::isnan(MM_) && !std::isnan(MM2_) && !std::isnan(theta_) && !std::isnan(alpha_) && !std::isnan(phi_) && !std::isnan(weight_)){
        if(Histogram::OK_Idx(Histogram::Friend_idx(W_,Q2_,MM_,MM2_,theta_,alpha_,phi_,var_))){
            // Coordinates in THnSparse axis order: W, Q2, MM1, MM2, theta, alpha, phi
            double x[7];
            x[0] = (double)W_;
            x[1] = (double)Q2_;
            x[2] = (double)MM_;
            x[3] = (double)MM2_;
            x[4] = (double)theta_;
            x[5] = (double)alpha_;
            x[6] = (double)phi_;
            const auto top = fun::top_idx(top_);
            TThread::Lock();   // only one thread fills at a time
            _Friend[var_][top]->Fill(x,weight_);
            // 1-d cross-check histograms for axes 2 through 6
            _MM1_Dist[var_][top]->Fill(MM_,weight_);
            _MM2_Dist[var_][top]->Fill(MM2_,weight_);
            _Theta_Dist[var_][top]->Fill(theta_,weight_);
            _Alpha_Dist[var_][top]->Fill(alpha_,weight_);
            _Phi_Dist[var_][top]->Fill(phi_,weight_);
            TThread::UnLock();
        }
    }
}

I run this multithreaded, but a THnSparse cannot safely be filled from several threads at once, so the TThread::Lock() and TThread::UnLock() calls make sure that only one thread at a time performs the actual filling of the THnSparse.
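
The one-thread-at-a-time idea can also be sketched with only the C++ standard library. This is an illustration under stated assumptions, not the code used in the program above: ToyHist, LockedHist, and fill_from_threads are hypothetical names, and ToyHist merely stands in for any histogram that is not safe to fill concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a histogram that must not be filled concurrently.
struct ToyHist {
    double lo, hi;
    std::vector<double> bins;
    ToyHist(std::size_t n, double lo_, double hi_) : lo(lo_), hi(hi_), bins(n, 0.0) {
        assert(n > 0 && hi_ > lo_);
    }
    void Fill(double x, double w) {
        if (x < lo || x >= hi) return;  // drop out-of-range values
        std::size_t i = static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        if (i >= bins.size()) i = bins.size() - 1;  // guard against rounding at the upper edge
        bins[i] += w;
    }
};

// Wrapper that serialises every access through a single mutex.
class LockedHist {
public:
    LockedHist(std::size_t n, double lo, double hi) : hist_(n, lo, hi) {}
    void Fill(double x, double w) {
        std::lock_guard<std::mutex> guard(mutex_);  // unlocked when guard leaves scope
        hist_.Fill(x, w);
    }
    double Sum() {
        std::lock_guard<std::mutex> guard(mutex_);
        double total = 0.0;
        for (double b : hist_.bins) total += b;
        return total;
    }
private:
    std::mutex mutex_;
    ToyHist hist_;
};

// Fill one shared histogram from several threads and return the summed weight.
double fill_from_threads(int n_threads, int fills_per_thread) {
    LockedHist h(10, 0.0, 1.0);
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&h, fills_per_thread] {
            for (int i = 0; i < fills_per_thread; ++i) h.Fill(0.5, 1.0);
        });
    }
    for (auto& w : workers) w.join();
    return h.Sum();
}
```

Because the lock is held through a scope-bound guard, an early return or exception inside the fill cannot leave the mutex locked, which is the main practical difference from paired Lock/UnLock calls.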
The *_Dist histograms filled after it correspond to the 3rd through 7th variable axes of the THnSparse (axis indices 2 to 6). Because I fill them every time I fill the THnSparse, each should be identical to the projection of the full 7-d THnSparse onto the corresponding axis. When writing these, I then do some additional projections to check this:

TH1D* check_7d[3][5][5];
char hname[100];
for(int i = 0; i < 3; i++){//Variable Set
    for(int j = 0; j < 5; j++){//Topology
        _Friend[i][j]->Write();
        _MM1_Dist[i][j]->Write();
        _MM2_Dist[i][j]->Write();
        _Theta_Dist[i][j]->Write();
        _Alpha_Dist[i][j]->Write();
        _Phi_Dist[i][j]->Write();
        for(int k = 0; k < 5; k++){//Axes 2 through 6 of the THnSparse
            check_7d[i][j][k] = _Friend[i][j]->Projection(k+2,"E");
            // snprintf (from <cstdio>) cannot write past the end of hname
            snprintf(hname,sizeof(hname),"2#pi_off_proton_%s_%s_%s",_var_names_[i],_top_[j],_friend_pars_[2+k]);
            check_7d[i][j][k]->SetNameTitle(hname,hname);
            check_7d[i][j][k]->Write();
        }
    }
}
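
A cross-check like this can also be made numerically rather than by eye. The sketch below is a hypothetical helper, not part of the program above: max_abs_bin_diff only needs its arguments to provide GetNbinsX() and GetBinContent(int), which TH1 does, and MockHist is a made-up stand-in so that the helper can be exercised without ROOT. It assumes both histograms were booked with the same binning.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Largest absolute bin-by-bin difference between two 1-d histograms,
// including underflow (bin 0) and overflow (bin N+1).
template <typename HistA, typename HistB>
double max_abs_bin_diff(const HistA& a, const HistB& b) {
    assert(a.GetNbinsX() == b.GetNbinsX());  // same number of bins expected
    double worst = 0.0;
    for (int bin = 0; bin <= a.GetNbinsX() + 1; ++bin) {
        worst = std::max(worst, std::fabs(a.GetBinContent(bin) - b.GetBinContent(bin)));
    }
    return worst;
}

// Hypothetical mock with the two accessors used above.
struct MockHist {
    std::vector<double> content;  // index 0 = underflow, last index = overflow
    int GetNbinsX() const { return static_cast<int>(content.size()) - 2; }
    double GetBinContent(int bin) const { return content[bin]; }
};
```

With ROOT available, a call such as max_abs_bin_diff(*_MM1_Dist[i][j], *check_7d[i][j][0]) would be expected to return 0 if the projection and the separately filled histogram truly agree.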

As it turns out, each *_Dist 1-d histogram and its corresponding check_7d projection are, in fact, identical! However, when I open the THnSparse from the ROOT file in TBrowser and project it there, the projection is already distorted, even though the THnSparse was written to the file before the check_7d projections (which look correct) were made from it. This distorted behavior is consistent with what I’ve seen in the later analysis. I’ll try to upload some comparison plots.

First looking at MM1, here is the MM_Dist histogram

The corresponding check_7d projection

The projection of the THnSparse as seen in TBrowser after it has been written

As another example, here is the Alpha_Dist histogram

The corresponding check_7d projection

As Alpha shows up in the TBrowser

What is fascinating is the first two axes actually look like how I would expect

and the MM histogram looks like it has the same values, but with empty bins inserted between them, so the distribution is stretched out beyond the bounds of the histogram, while Alpha seemingly bears no relation to the expected distribution.

If anyone has any insight into this mystery please let me know. I have no idea what is going on.

ROOT Version: 6.26.04
Platform: macOS Ventura 13.0
Compiler: g++

cmclauchlin · June 27, 2023, 2:49pm

Update:
I’ve done the same 1-d checks with W and Q2 and found some even greater mysteries.

When doing W, the first axis, everything actually lines up properly.
The manual W_Dist plot

The projection of W

The projection of W inside the TBrowser of the saved 7d histogram

As you can see, these all line up and appear to be in agreement. We love that!

Now the weirdest part.
Q2
The manual Q2_Dist

The check_7d plot for Q2

The projection of Q2 in the TBrowser of the 7d histogram

There appears to be an actual difference between the check_7d plot and the Q2_Dist, while the projection of the THnSparse in TBrowser is actually identical to the check_7d plot for Q2.
I have no idea how this one could be different from all the others and I’m just… lost.

cmclauchlin · June 27, 2023, 4:36pm

Update: I forgot to include the non-uniform binning in the Q2_Dist histogram, whereas it’s preserved in the others. The first two axes appear to be consistent across all three.
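
To illustrate why this alone produces different-looking histograms, here is a minimal standard-library sketch. The function bin_counts is a hypothetical helper, and the bin edges and values in the usage note are made up for illustration; they are not the Q2 binning from this analysis.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Count values into bins defined by ascending edges.
// Bin i covers [edges[i], edges[i+1]); values outside the range are dropped.
std::vector<int> bin_counts(const std::vector<double>& edges,
                            const std::vector<double>& values) {
    assert(edges.size() >= 2);
    std::vector<int> counts(edges.size() - 1, 0);
    for (double v : values) {
        if (v < edges.front() || v >= edges.back()) continue;
        // First edge strictly greater than v; the bin is the one just before it.
        auto it = std::upper_bound(edges.begin(), edges.end(), v);
        counts[static_cast<std::size_t>(it - edges.begin()) - 1] += 1;
    }
    return counts;
}
```

For example, the values {2.1, 2.5, 3.5, 4.5, 7.0} give counts {3, 1, 1, 0} with the uniform edges {2, 4, 6, 8, 10} but {1, 1, 2, 1} with the variable edges {2, 2.4, 3, 5, 10}, so the same fills yield visibly different shapes even though both histograms have four bins.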

mczurylo · June 29, 2023, 3:22pm

Hi @cmclauchlin,

thank you for your question and explanation of the problem. Maybe @pcanal could help you with this?

Cheers,
Marta

cmclauchlin · July 26, 2023, 3:10pm

Update:
While I don’t know the exact cause, I suspect the problem lies in how memory is allocated for the THnSparse object within the TFile, and that this allocation differs from how the object is handled in memory within the program that creates it. After splitting my 7-dimensional THnSparse into an equivalent set of 145 5-dimensional THnSparse objects, each corresponding to one specific bin combination of the first two dimensions of the 7-d histogram, everything works. This takes up more space in total within the TFile, but each individual THnSparse is smaller. I haven’t done any testing to see where the limit might be, and I don’t know for certain whether it is memory or dimension related, but that’s the fix I have for now.
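
For anyone wanting to try the same workaround, the bookkeeping between a (W bin, Q2 bin) pair and the index of the matching 5-dimensional histogram can be sketched as below. The names slice_index and slice_bins are hypothetical, and so are the bin counts: the post only gives the total of 145, and 29 x 5 is just one factorization chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical bin counts; only their product (145) comes from the post.
const std::size_t kNumWBins = 29;
const std::size_t kNumQ2Bins = 5;

// Flat index of the 5-d histogram that holds a given (W bin, Q2 bin) pair.
std::size_t slice_index(std::size_t w_bin, std::size_t q2_bin) {
    assert(w_bin < kNumWBins && q2_bin < kNumQ2Bins);
    return w_bin * kNumQ2Bins + q2_bin;
}

// Inverse mapping: recover (W bin, Q2 bin) from the flat index.
std::pair<std::size_t, std::size_t> slice_bins(std::size_t index) {
    assert(index < kNumWBins * kNumQ2Bins);
    return {index / kNumQ2Bins, index % kNumQ2Bins};
}
```

A flat index like this lets the 145 histograms live in a single array or vector, with the remaining five variables filled into whichever histogram the first two variables select.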