Problems using THnSparse Projection

Javier_Galan · July 26, 2023, 10:02am

Hi there!

I am using the following macro, which accesses an RDataFrame to histogram the final_posX and final_posY columns by different methods. When I draw using df.Histo2D, THnSparse::Draw, or THnSparse::Projection(0,1)->Draw(), I get different results.

C-macro

Int_t test( std::string fname )
{
    TRestDataSet dS;
    dS.Import(fname);
    dS.PrintMetadata();

    auto myHist1 = dS.GetDataFrame().Histo2D({"histName", "histTitle", 80u, -20, 20, 80u, -20., 20.}, "final_posX", "final_posY");

    TCanvas cv;
    myHist1->Draw();
    cv.Print("test.png");

    Int_t* bins = new Int_t[3];
    Double_t* xmin = new Double_t[3];
    Double_t* xmax = new Double_t[3];

    for (size_t n = 0; n < 2; n++) {
        bins[n] = 80;
        xmin[n] = -20;
        xmax[n] = 20;
    }
    bins[2] = 100;
    xmin[2] = 0;
    xmax[2] = 10;

    THnSparseD* sparse = new THnSparseD("sparse", "sparse", 3, bins, xmin, xmax);

    std::vector<std::vector<double> > data;
    auto parValues1 = dS.GetDataFrame().Take<double>("final_posX");
    data.push_back(*parValues1);
    auto parValues2 = dS.GetDataFrame().Take<double>("final_posY");
    data.push_back(*parValues2);
    auto parValues3 = dS.GetDataFrame().Take<double>("final_energy");
    data.push_back(*parValues3);

    Double_t* values = new Double_t[3];
    if (!data.empty())
        for (size_t m = 0; m < data[0].size(); m++)
        {
            for (size_t v = 0; v < 3; v++) {
                values[v] = data[v][m];
            }

            sparse->Fill(values);
        }
    delete[] values;

    sparse->Draw();
    cv.Print("sparse.png");

    TH2D *h2 = sparse->Projection(0,1);
    h2->Draw();
    cv.Print("h2.png");

    return 0;
}

Using RDataFrame::Histo2D I get

Using the filled THnSparse object I get

But using Projection(0,1), which I expected to give a projection onto the (final_posX, final_posY) plane, I get the following:

Do you know where I got it wrong?

Thank you!
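For reference, a 2D projection of a sparse N-dimensional histogram amounts to summing the bin contents over every axis that is not kept. The sketch below illustrates that idea in plain C++ with no ROOT dependency; SparseHist3, Hist2 and projectOnto are hypothetical names introduced here, not ROOT APIs. Note also that ROOT declares the 2D overload as THnBase::Projection(Int_t yDim, Int_t xDim, ...), so Projection(0,1) puts axis 1 (final_posY) on X and axis 0 (final_posX) on Y. With identical binning on both axes, that alone would only transpose the plot.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <utility>

// Hypothetical stand-in for a sparse 3D histogram: only filled bins are
// stored, keyed by their (axis0, axis1, axis2) bin indices.
using SparseHist3 = std::map<std::array<int, 3>, double>;
// Hypothetical 2D result, keyed by (x bin index, y bin index).
using Hist2 = std::map<std::pair<int, int>, double>;

// Project onto two axes by summing the content over the remaining axis.
// xDim and yDim select which of the three axes become X and Y of the result.
Hist2 projectOnto(const SparseHist3& h, int xDim, int yDim) {
    assert(xDim >= 0 && xDim < 3 && yDim >= 0 && yDim < 3 && xDim != yDim);
    Hist2 out;
    for (const auto& bin : h) {
        const std::array<int, 3>& idx = bin.first;
        out[std::make_pair(idx[xDim], idx[yDim])] += bin.second;
    }
    return out;
}
```

Calling projectOnto(h, 0, 1) keeps axis 0 as X and axis 1 as Y; swapping the two arguments yields the transposed histogram.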

Javier_Galan · July 27, 2023, 8:28am

I have another question about THnSparse. If I call sparse->GetNbins() in the previous script, I get 649727, whereas a traditional histogram would have 80x80x100 = 640000 bins.

I thought THnSparse only reserves memory for bins with non-zero content. So why is the number of bins returned higher?

EDIT: However, if I increase the binning to 800x800x1000, I get 38,205,432, which is lower than the 640,000,000 bins of a traditional histogram.
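One possible reading, stated here as an assumption rather than a confirmed answer: the products above count only in-range bins, while every axis also carries one underflow and one overflow bin. The dump code further down in this thread skips indices 0 and nbins+1, which suggests such bins do show up when enumerating the sparse histogram. The following sketch (totalBinsWithFlow is a hypothetical helper, not a ROOT function) computes the resulting upper bound.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Total number of addressable bins when every axis also has one underflow
// and one overflow bin, i.e. (n + 2) bins per axis.
std::int64_t totalBinsWithFlow(const std::vector<int>& binsPerAxis) {
    std::int64_t total = 1;
    for (int n : binsPerAxis) {
        assert(n > 0);
        total *= static_cast<std::int64_t>(n) + 2;
    }
    return total;
}
```

For 80x80x100 this gives 82 x 82 x 102 = 685848, so 649727 filled bins would fit within that bound even though it exceeds 640000. For 800x800x1000 the bound is 644,490,408, well above the reported 38,205,432.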

Javier_Galan · July 27, 2023, 10:26am

I have generated two files, with the bin coordinates translated to the user ranges (-20,-20,0) → (20,20,10), and dumped the contents to TXT files.

The first file I generated using THnD → https://sultan.unizar.es/exchange/THnD.txt

The second file I generated using THnSparseD → https://sultan.unizar.es/exchange/sparseD.txt

When I look at the recovered bin contents (generated the same way for THnD and THnSparseD; I only change the object type), I see that the results do not match. (**)

Once the histograms have been filled I use the following code to generate the files:

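    Int_t* idx = new Int_t[3];
    FILE *f = fopen("sparseD.txt", "wt" );
    // GetNbins() returns a Long64_t, so use a matching loop index
    for( Long64_t n = 0; n < sparse->GetNbins(); n++ )
    {
        // Content of linear bin n; its per-axis bin indices are written into idx
        Double_t v = sparse->GetBinContent(n, idx);

        // Skip underflow bins (index 0 on any axis)
        if( idx[0] == 0 || idx[1] == 0 || idx[2] == 0 )
            continue;
        // Skip overflow bins (index nbins+1 on any axis)
        if( idx[0] == sparse->GetAxis(0)->GetNbins()+1 || idx[1] == sparse->GetAxis(1)->GetNbins()+1 || idx[2] == sparse->GetAxis(2)->GetNbins()+1 )
            continue;

        // Translate the bin indices into bin centers in user coordinates
        for( size_t i = 0; i < 3; i++ )
            values[i] = xmin[i] + (xmax[i]-xmin[i])*((double)idx[i]-0.5)/bins[i];

        fprintf( f, "%lf\t%lf\t%lf\t%lf\n", values[0], values[1], values[2], v );
    }
    fclose(f);
    delete[] idx;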

I compute the bin center as values[i] = xmin[i] + (xmax[i]-xmin[i])*((double)idx[i]-0.5)/bins[i]; since I see no GetBinCenter method like the one in the TH1-based classes.

(**) For example, I get -19.750000 -19.750000 4.050000 5.000000 for the sparse type, and -19.750000 -19.750000 4.050000 0.000000 for the THnD type.
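The bin-center formula above can be checked in isolation. The sketch below is plain C++ with a hypothetical helper name (binCenter); it assumes uniform binning and the ROOT convention that in-range bins are numbered 1..nbins. Within ROOT itself, the axes returned by GetAxis(i) are TAxis objects, and TAxis provides GetBinCenter(bin), so sparse->GetAxis(i)->GetBinCenter(idx[i]) should be an alternative to the hand-written expression.

```cpp
#include <cassert>

// Center of bin `idx` on a uniformly binned axis, with in-range bins numbered
// 1..nbins (0 is underflow, nbins+1 is overflow).
double binCenter(double xmin, double xmax, int nbins, int idx) {
    assert(nbins > 0);
    const double width = (xmax - xmin) / nbins;
    return xmin + (idx - 0.5) * width;
}
```

With the axis definitions used in this thread, binCenter(-20, 20, 80, 1) corresponds to the -19.75 in the example line, and binCenter(0, 10, 100, 41) to the 4.05 on the energy axis.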

Javier_Galan · July 27, 2023, 12:30pm

Even though I am more confident in the bin contents returned by the THnD object, its projection seems to have a similar problem.

bellenot · August 8, 2023, 6:50am

Sorry for the very long delay… And maybe @moneta can give some info about THnSparse

Javier_Galan · August 8, 2023, 9:32am

Thanks for the reply!

My guess is that this is more a problem with the Projection routine than with THnSparse itself.

It would be great to clarify this issue.

Javier_Galan · August 8, 2023, 11:18am

OK, if I create the histogram using the technique indicated by @mczurylo in the following post, I see no problem in the projection.

So the problem seems to be related to retrieving the column data using Take. This post should probably be closed, since the origin of the problem is more closely connected to the mentioned post.
