I am using the following macro, which accesses an RDataFrame to histogram the final_posX and final_posY columns by different methods. When I draw using df.Histo2D, THnSparse::Draw, or THnSparse::Projection(0,1)->Draw(), I get different results.
C-macro
Int_t test(std::string fname)
{
    TRestDataSet dS;
    dS.Import(fname);
    dS.PrintMetadata();

    // Method 1: 2D histogram filled directly by RDataFrame
    auto myHist1 = dS.GetDataFrame().Histo2D({"histName", "histTitle", 80u, -20, 20, 80u, -20., 20.}, "final_posX", "final_posY");
    TCanvas cv;
    myHist1->Draw();
    cv.Print("test.png");

    // Axis definitions, in order: final_posX, final_posY, final_energy.
    // Stack arrays replace the new[] allocations that were never freed.
    Int_t bins[3] = {80, 80, 100};
    Double_t xmin[3] = {-20., -20., 0.};
    Double_t xmax[3] = {20., 20., 10.};
    THnSparseD* sparse = new THnSparseD("sparse", "sparse", 3, bins, xmin, xmax);

    // Retrieve each column separately with Take
    std::vector<std::vector<double> > data;
    auto parValues1 = dS.GetDataFrame().Take<double>("final_posX");
    data.push_back(*parValues1);
    auto parValues2 = dS.GetDataFrame().Take<double>("final_posY");
    data.push_back(*parValues2);
    auto parValues3 = dS.GetDataFrame().Take<double>("final_energy");
    data.push_back(*parValues3);

    // Fill the sparse histogram entry by entry.
    // values stays in scope so it can be reused after the fill.
    Double_t values[3];
    if (!data.empty())
        for (size_t m = 0; m < data[0].size(); m++) {
            for (size_t v = 0; v < 3; v++) {
                values[v] = data[v][m];
            }
            sparse->Fill(values);
        }

    // Method 2: draw the THnSparse directly
    sparse->Draw();
    cv.Print("sparse.png");

    // Method 3: project onto dimensions 0 and 1, then draw
    TH2D* h2 = sparse->Projection(0, 1);
    h2->Draw();
    cv.Print("h2.png");
    return 0;
}
I have another question about THnSparse. If I call sparse->GetNbins() in the previous script, I get 649727, whereas a traditional histogram would have 80x80x100 = 640000 bins.
I thought THnSparse only reserves memory for bins whose content is non-zero. So why is the number of bins returned higher?
EDIT: However, if I increase the binning to 800x800x1000, I get 38,205,432, which is lower than the 640,000,000 bins of a traditional histogram.
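One possible explanation for a count above 640000, offered as a sketch rather than a confirmed diagnosis: THnSparse::GetNbins() reports the number of filled bins, and every axis also carries one underflow and one overflow bin, so entries falling outside the axis ranges can occupy bins beyond the 80x80x100 regular ones. The plain C++ snippet below (no ROOT; totalRegularBins and totalBinsWithFlow are hypothetical helper names) computes both totals.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of regular bins: the product of the per-axis bin counts.
std::int64_t totalRegularBins(const std::vector<int>& nbins) {
    std::int64_t total = 1;
    for (int n : nbins) {
        assert(n > 0);
        total *= n;
    }
    return total;
}

// Upper bound on distinct bins when each axis also has
// one underflow bin (index 0) and one overflow bin (index n+1).
std::int64_t totalBinsWithFlow(const std::vector<int>& nbins) {
    std::int64_t total = 1;
    for (int n : nbins) {
        assert(n > 0);
        total *= static_cast<std::int64_t>(n) + 2;
    }
    return total;
}
```

For the 80x80x100 binning this gives 82 x 82 x 102 = 685848 possible bins, so 649727 filled bins would be consistent with an almost fully populated histogram once underflow and overflow bins are counted.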
When I look at the recovered bin contents (generated the same way for THnD and THnSparse; I just change the object type), I see that the results differ. (**)
Once the histograms have been filled, I use the following code to generate the files:
Int_t idx[3];  // per-axis bin indices of the current bin (stack array, nothing to free)
FILE* f = fopen("sparseD.txt", "wt");
for (Long64_t n = 0; n < sparse->GetNbins(); n++) {
    Double_t v = sparse->GetBinContent(n, idx);
    // Skip underflow bins (index 0 on any axis)
    if (idx[0] == 0 || idx[1] == 0 || idx[2] == 0)
        continue;
    // Skip overflow bins (index nbins+1 on any axis)
    if (idx[0] == sparse->GetAxis(0)->GetNbins() + 1 || idx[1] == sparse->GetAxis(1)->GetNbins() + 1 || idx[2] == sparse->GetAxis(2)->GetNbins() + 1)
        continue;
    // Bin centers; reuses the 3-element values buffer from the macro
    for (size_t i = 0; i < 3; i++)
        values[i] = xmin[i] + (xmax[i] - xmin[i]) * ((double)idx[i] - 0.5) / bins[i];
    fprintf(f, "%lf\t%lf\t%lf\t%lf\n", values[0], values[1], values[2], v);
}
fclose(f);
I use the following to get the bin center: values[i] = xmin[i] + (xmax[i]-xmin[i])*((double)idx[i]-0.5)/bins[i]; since I see no GetBinCenter method like the one in the TH1-based classes.
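The bin-center formula can be checked in isolation with a small plain C++ helper (binCenter is a hypothetical name, not a ROOT function). It assumes uniform bin widths and regular bins numbered from 1. As an alternative that may be worth trying, TAxis does provide GetBinCenter, so sparse->GetAxis(i)->GetBinCenter(idx[i]) should give the same value.

```cpp
#include <cassert>

// Center of bin idx (1-based) on a uniform axis with nbins bins in [xmin, xmax).
// Same expression as in the post, wrapped in a function.
double binCenter(double xmin, double xmax, int nbins, int idx) {
    assert(nbins > 0 && xmax > xmin);
    return xmin + (xmax - xmin) * (static_cast<double>(idx) - 0.5) / nbins;
}
```

For example, binCenter(-20, 20, 80, 1) gives -19.75 and binCenter(0, 10, 100, 41) gives 4.05 up to floating-point rounding, which matches the coordinates quoted in (**).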
(**) For example, I get -19.750000 -19.750000 4.050000 5.000000 for sparse type, and -19.750000 -19.750000 4.050000 0.000000 for THnD type.
OK, if I create the histogram using the technique indicated by @mczurylo in the following post, I see no problem in the projection.
So the problem seems to be related to retrieving the column data using Take. This post should probably be closed, since the origin of the problem is more closely related to the mentioned post.
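To illustrate how columns retrieved separately could distort a 2D projection while leaving each 1D distribution intact, here is a plain C++ sketch (no ROOT; all names are hypothetical). It assumes, as one possible cause that is not confirmed in this thread, that separate Take calls do not return the rows in the same order, for instance when implicit multi-threading is enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Counts2D = std::map<std::pair<int, int>, int>;

// Fill a 2D count map by pairing x[i] with y[i], the way the macro
// pairs the vectors obtained from independent Take calls.
Counts2D fillPairs(const std::vector<int>& x, const std::vector<int>& y) {
    assert(x.size() == y.size());
    Counts2D counts;
    for (std::size_t i = 0; i < x.size(); ++i) {
        ++counts[std::make_pair(x[i], y[i])];
    }
    return counts;
}

// Sum the 2D counts over y to obtain the 1D distribution of x.
std::map<int, int> projectX(const Counts2D& counts) {
    std::map<int, int> projection;
    for (const auto& entry : counts) {
        projection[entry.first.first] += entry.second;
    }
    return projection;
}
```

With x = {1, 2, 3}, pairing it with y = {1, 2, 3} or with the reordered y = {3, 2, 1} yields the same projection on x but a different 2D map. Filling row by row in a single pass keeps the values of each entry together and avoids this kind of mismatch.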