Problem of efficiency in transformation from TH3 to THnSparse

Hi,
I am trying to score a dose grid which has a dimension of 600x600x600. However, ROOT can’t save this histogram successfully due to the 1GB IO limitation. Therefore I need to transfer this histogram from TH3D to THnSparseD. Here is my problem, when the GetSparseFractionBins() return a small value, the time consuming is reasonable. When the SparseFractionBins gets bigger, the time consuming gets exponential longer.

The code of transformation looks like this:

std::unique_ptr<THnSparse> CopyTH3ToHnSparse(const TH3* h3)
{
    int iBins[3];
    double dMin[3];
    double dMax[3];
    std::vector<const TAxis*> vAxis = { h3->GetXaxis(), h3->GetYaxis(), h3->GetZaxis() };
    for (int i = 0; i < 3; i++) {
        iBins[i] = vAxis[i]->GetNbins();
        dMin[i] = vAxis[i]->GetXmin();
        dMax[i] = vAxis[i]->GetXmax();
    }
    std::unique_ptr<THnSparse> hnsp(new THnSparseD(h3->GetName(), h3->GetTitle(), 3, iBins, dMin, dMax));

    long nbins = h3->GetNcells();
    int index[3] = { 0, 0, 0 };
    for (int i = 0; i < nbins; i++)
    {
        double value = h3->GetBinContent(i);
        double error = h3->GetBinError(i);
        if (!value && !error)continue;
        h3->GetBinXYZ(i, index[0], index[1], index[2]);
        //hnsp->Fill(index[0], index[1], index[2], value);
        hnsp->SetBinContent(index, value);
        hnsp->SetBinError(index, value);
    }
    hnsp->SetEntries(h3->GetEntries());
    return hnsp;
}

I guess a lot of time is spending on finding index in THnSparse, which is unnecessary. Is there any way to improve the efficiency of this code?

ROOT Version: 6.34.00
Built for linuxx8664gcc on Nov 29 2024, 05:53:56
From tags/v6-34-00@v6-34-00

Hello @coffeezfw,

thank you for raising this interesting question. Before deep diving into performance optimization of your code, which might be non-trivial, have you considered to generate and fill a THnSparse instead of generating a TH3 and then convert between the two classes? This would be indeed more efficient.

Cheers,
Monica

Hello @mdessole,
Thank you for your advice. Here is the my situation, the dose grid is THnSparse at the beginning which is scored by Geant4. However, once the events number get bigger, the efficiency of indexing in the THnSparse gets lower. So I generated a lot of THnSparse with small SparseFractionBins. When sum up all these THnSparse, the efficiency to store the sum result gets much lower by using THnSparse than using TH3D, since TH3D indexing is much easier. So I use TH3D to store the sum result, but this TH3D result can’t be saved in ROOT file. Therefore I need to convert this TH3D into THnSparse.

Maybe there are some misuse in my application. All suggestions and options are welcome.

Thank you!

At a first glance, there are some simple changes you can implement:

if (!value && !error) continue;
        auto linindex = GetBin(index, true); // finds linear index corresponding to multi-dim index and it allocates space if it does not exists
        hnsp->SetBinContent(linindex, value);
        hnsp->SetBinError(linindex, error);
    }

In this way, the linear index is computed only once

Thank you, this helps a lot. Theoretically, this change saves at least half of the time. However, the time consuming is still negligible. Temporarily I save the TH3D as binary array which is much faster. It would be perfect if the THnSparse could add new bin without computing index. Is this possible?

The linear index should be computed at least once, in order to write the value in the corresponding bin. In order to better understand what’s taking so long in your code, you should profile it with tools such as perf, that might help.

Hello @mdessole,
This demo may reproduce my problem. With [UniformFill], time cost by [CopyTH3ToHnSparse] is always resonable no matter how many entries it fills. But for [GausFill], time cost get exponential longer when entries get more than 500x500x500.

#include <TH3.h>
#include <TRandom3.h>
#include <iostream>
#include <cmath>
#include <string>
#include <fstream>
#include <THnSparse.h>
#include <ctime>


std::unique_ptr<THnSparse> CopyTH3ToHnSparse(const TH3* h3)
{
    clock_t clkStart, clkStop;
    //time_t tStart, tStop;
    clkStart = clock();
    //tStart = time(NULL);

    int iBins[3];
    double dMin[3];
    double dMax[3];
    std::vector<const TAxis*> vAxis = { h3->GetXaxis(), h3->GetYaxis(), h3->GetZaxis() };
    for (int i = 0; i < 3; i++) {
        iBins[i] = vAxis[i]->GetNbins();
        dMin[i] = vAxis[i]->GetXmin();
        dMax[i] = vAxis[i]->GetXmax();
        //printf("Axis[%d]: Bin[%5d], Min[%.3f], Max[%.3f]\n", i, iBins[i], dMin[i], dMax[i]);
    }
    std::unique_ptr<THnSparse> hnsp(new THnSparseD(h3->GetName(), h3->GetTitle(), 3, iBins, dMin, dMax));

    long nbins = h3->GetNcells();
    int index[3] = { 0, 0, 0 };

    for (int i = 0; i < nbins; i++)
    {
        double value = h3->GetBinContent(i);
        double error = h3->GetBinError(i);
        if (!value && !error)continue;
        h3->GetBinXYZ(i, index[0], index[1], index[2]);
        long linindex = hnsp->GetBin(index, true);
        hnsp->SetBinContent(linindex, value);
        hnsp->SetBinError(linindex, error);
        //hnsp->SetBinContent(index, value);
        //hnsp->SetBinError(index, error);
    }
    hnsp->SetEntries(h3->GetEntries());

    clkStop = clock();
    //tStop = time(NULL);
    std::printf("[THnSparse]: Sparse Fraction of Bins: %.5f\n", hnsp->GetSparseFractionBins());
    std::printf("[THnSparse]: Sparse Fraction of Mem: %.5f\n", hnsp->GetSparseFractionMem());
    std::printf("[THnSparse]: CPU Time: %10.3f second(s). \n", double(clkStop - clkStart) / CLOCKS_PER_SEC);
    //std::printf("----Real Time: %10.3f second(s). \n", double(tStop - tStart));

    return hnsp;
}


void UniformFill(TH3D* h3, long nfill, TRandom3* random)
{
    clock_t clkFillStart, clkFillStop;

    clkFillStart = clock();

    for (long i = 0; i<nfill; i++)
    {
        auto x = random->Uniform(100.0) - 50.0;
        auto y = random->Uniform(100.0) - 50.0;
        auto z = random->Uniform(100.0) - 50.0;

        auto value = random->Uniform(10.0);

        h3->Fill(x, y, z, value);
    }
    clkFillStop = clock();
    std::printf("[Histogram Uniform Fill] Fill Count: %ld, CPU  Time: %10.3f second(s). \n", nfill, double(clkFillStop - clkFillStart) / CLOCKS_PER_SEC);
    return;
}

void GausFill(TH3D* h3, long nfill, TRandom3* random)
{
    clock_t clkFillStart, clkFillStop;

    clkFillStart = clock();

    for (long i = 0; i < nfill; i++)
    {
        int icasex = random->Integer(10);
        int icasey = random->Integer(10);
        int icasez = random->Integer(10);

        double dCenterx = (double(icasex) - 4.5) * 10.0;

        double dCentery = (double(icasey) - 4.5) * 10.0;

        double dCenterz = (double(icasez) - 4.5) * 10.0;

        double x = random->Gaus(dCenterx, 2.0);
        double y = random->Gaus(dCentery, 2.0);
        double z = random->Gaus(dCenterz, 2.0);

        double value = random->Uniform(10.0);

        h3->Fill(x, y, z, value);
    }
    clkFillStop = clock();
    std::printf("[Histogram Gaus Fill] Fill Count: %ld, CPU  Time: %10.3f second(s). \n", nfill, double(clkFillStop - clkFillStart) / CLOCKS_PER_SEC);
    return;
}


int main(int argc, char** argv)
{

	int iBins[3] = { 500, 500, 500 };
    double dMin[3] = { -50., -50., -50. };
    double dMax[3] = { 50., 50., 50. };

    TRandom3* random = new TRandom3();

    TH3D* h3 = new TH3D("TH3", "TH3",
        iBins[0], dMin[0], dMax[0],
        iBins[1], dMin[1], dMax[1],
        iBins[2], dMin[2], dMax[2]
    );
    long nfill1 = 100 * 100 * 100;
    long nfill2 = 500 * 500 * 500;
    printf("-->Uniform Fill: \n");
    
    UniformFill(h3, nfill1, random);

    std::unique_ptr<THnSparse> hnsp1 = CopyTH3ToHnSparse(h3);

    UniformFill(h3, nfill2, random);

    std::unique_ptr<THnSparse> hnsp2 = CopyTH3ToHnSparse(h3);
    delete h3;

    TH3D* h3g = new TH3D("TH3", "TH3",
        iBins[0], dMin[0], dMax[0],
        iBins[1], dMin[1], dMax[1],
        iBins[2], dMin[2], dMax[2]
    );
    printf("-->Gaus Fill: \n");
    GausFill(h3g, nfill1, random);
    std::unique_ptr<THnSparse> hnsp3 = CopyTH3ToHnSparse(h3g);
    GausFill(h3g, nfill2, random);
    std::unique_ptr<THnSparse> hnsp4 = CopyTH3ToHnSparse(h3g);
    delete h3g;
    delete random;

    return 0;
}

My local machine output:

This is the flame graph draw from perf record data
perf

I see that the sparse fraction of memory is > 1 when nfill = 500 * 500 * 500, which should’t be the case… I am investigating.

1 Like

There is no error in the memory estimate you get for your histograms. Indeed, the level nfill2 of fill-in in your examples is just too high too see any benefit from the use of THnSparse in its current implementation. And this is why the elapsed time increases so fast, because the memory consumption underneath is few times larger than its uncompressed THn counterpart.
Sorry to tell that I can’t see any easy solution to your situation…

1 Like

Thank you. There is one last question, why uniform filled TH3D cost much less time to convert into THnSparse than gaus filled since uniform filled THnSparse gets a bigger sparse fraction of bins and memory? Dose that mean THnSparse prefer uniform index distribution?

SparseFractionBins SparseFractionMem Time Cost By Convert
UniformFill 0.62753 3.45188 11.901
GausFill 0.42932 2.36148 2193.757

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.