I needed a class for a histogram with 5 dimensions. The purpose was to fill the histogram with data and then use it to generate random vectors whose pdf corresponds to the histogram content. As ROOT cannot handle histograms with more than 3 dimensions, I wrote a class myself. Its capabilities are:
- Representing an N-dimensional histogram
- Filling with weights
- Generating random vectors
- Input and output from/to a ROOT file
If you find it useful, the source code, a makefile and an example are in the attachment. It works on Linux; I have never tried it on Windows.
THn.tar.gz (5.56 KB)
It is interesting to see your post at a time when we are designing a class to store sparse multidimensional data (we call it THnSparse).
The main difference between your design and ours is that we want to store only the bins with some content. This makes a HUGE difference in memory requirements for high dimensions.
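The idea of storing only filled bins can be sketched as follows. This is a minimal illustration, not the THnSparse implementation: the class name `SparseHist`, its methods, the equal binning on every axis and the choice of a hash map keyed by a linearised bin index are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sparse N-dimensional histogram (not the THnSparse code).
// Every axis has nBins equal-width bins on [lo, hi); only bins that have
// been filled are stored.
class SparseHist {
public:
    SparseHist(std::size_t nDim, std::uint64_t nBins, double lo, double hi)
        : nDim_(nDim), nBins_(nBins), lo_(lo), hi_(hi) {
        assert(nDim > 0 && nBins > 0 && hi > lo);
    }

    // Add weight w to the bin containing x. Points outside [lo, hi) on any
    // axis are dropped in this sketch (there are no under/overflow bins).
    void Fill(const std::vector<double>& x, double w = 1.0) {
        std::uint64_t key = 0;
        if (Key(x, key)) content_[key] += w;
    }

    // Content of the bin containing x; 0 for empty or out-of-range bins.
    double GetBinContent(const std::vector<double>& x) const {
        std::uint64_t key = 0;
        if (!Key(x, key)) return 0.0;
        auto it = content_.find(key);
        return it == content_.end() ? 0.0 : it->second;
    }

    // Number of bins that actually occupy memory.
    std::size_t FilledBins() const { return content_.size(); }

private:
    // Linearised bin index; assumes nBins^nDim fits into 64 bits.
    bool Key(const std::vector<double>& x, std::uint64_t& key) const {
        assert(x.size() == nDim_);
        key = 0;
        for (double xi : x) {
            if (xi < lo_ || xi >= hi_) return false;
            auto bin = static_cast<std::uint64_t>((xi - lo_) / (hi_ - lo_) * nBins_);
            if (bin >= nBins_) bin = nBins_ - 1;  // guard against rounding up
            key = key * nBins_ + bin;
        }
        return true;
    }

    std::size_t nDim_;
    std::uint64_t nBins_;
    double lo_, hi_;
    std::unordered_map<std::uint64_t, double> content_;  // filled bins only
};
```

For scale: with 5 axes of 100 bins each, a dense array of doubles would need 10^10 bins, i.e. 80 GB, whereas the map above grows only with the number of distinct bins that were filled.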
Great! Your approach is much better; memory consumption is an issue with the simple approach I’m currently using.
When is the class coming?
Should be there in a few weeks. We are busy right now with CHEP and the CERN School of Computing.
The memory consumption is really limiting me now. There’s no point in converting my class to the sparse approach when you’ve already made some effort in this direction. Hence, might I help you finalize your class, so that I can use it as soon as possible?
The THnSparse class will go into CVS sometime this week or early next week (thanks to Axel).
Great, today I found it in CVS. However, the current implementation lacks the ability to generate random vectors according to the histogram contents, so I’ve implemented this feature. The modified source and header are in the attachment. The new methods are very similar to those in TH1, TH2, etc. The only difference is that the GetRandom method takes two parameters. The second is a bool; when it is set to true, the output random vector is randomized below the level of one bin, i.e. instead of returning bin centers, it produces positions uniformly distributed over the bin widths.
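The sampling scheme described in this post can be sketched roughly as below. This is not the attached code: `Axis`, `CumulativeIntegral` and `GetRandomSketch` are hypothetical names, bins are 0-based without under/overflow, and the filled bins are assumed to be available as a plain list of per-axis bin indices. A bin is picked with probability proportional to its weight via a normalised cumulative sum, and the boolean switches between bin centers and uniform positions inside the bin.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical equal-width axis with 0-based bins on [lo, hi).
struct Axis {
    int nBins;
    double lo, hi;
    double Width() const { return (hi - lo) / nBins; }
    double LowEdge(int bin) const { return lo + bin * Width(); }
};

// Running sum of the bin weights, normalised so that the last entry is 1.
std::vector<double> CumulativeIntegral(const std::vector<double>& weights) {
    std::vector<double> integral(weights.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        sum += weights[i];
        integral[i] = sum;
    }
    assert(sum > 0.0);
    for (double& v : integral) v /= sum;
    return integral;
}

// Draw one random vector. bins[i] holds the per-axis indices of filled bin i,
// integral is the output of CumulativeIntegral for the same bins.
std::vector<double> GetRandomSketch(const std::vector<Axis>& axes,
                                    const std::vector<std::vector<int>>& bins,
                                    const std::vector<double>& integral,
                                    std::mt19937& rng,
                                    bool subBinRandomization) {
    assert(!bins.empty() && bins.size() == integral.size());
    std::uniform_real_distribution<double> uni(0.0, 1.0);

    // First bin whose cumulative value exceeds r; zero-weight bins are skipped.
    const double r = uni(rng);
    std::size_t idx = static_cast<std::size_t>(
        std::upper_bound(integral.begin(), integral.end(), r) - integral.begin());
    if (idx >= bins.size()) idx = bins.size() - 1;  // guard for r == 1

    std::vector<double> x(axes.size());
    for (std::size_t d = 0; d < axes.size(); ++d) {
        // 0.5 gives the bin center, a uniform offset spreads over the bin width.
        const double offset = subBinRandomization ? uni(rng) : 0.5;
        x[d] = axes[d].LowEdge(bins[idx][d]) + offset * axes[d].Width();
    }
    return x;
}
```

The cumulative sum only has to be recomputed when the histogram content changes, so repeated draws cost one binary search plus one random number per axis.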
I’ve modified one more thing. When a TH1 is filled, it increments fEntries by 1 and fTsumw by the weight of the filled hit. THnSparse incremented fEntries by the weight. This was a bit confusing, so I modified the source so that it behaves like TH1. Besides GetEntries, there is now also a GetWeightSum method. This makes it possible to slightly improve the Projection methods: they can set the number of entries in the projected TH1, TH2, etc.
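The difference between the two counters can be shown with a few lines. This is only an illustration of the convention described above; `FillStats` and its members are hypothetical names, not part of ROOT.

```cpp
#include <cassert>

// Hypothetical bookkeeping that follows the TH1 convention described above:
// the entry counter counts Fill calls, the weight sum accumulates weights.
struct FillStats {
    double entries = 0.0;    // number of Fill calls (the role of fEntries)
    double weightSum = 0.0;  // sum of weights (the role of fTsumw)

    void Fill(double w = 1.0) {
        entries += 1.0;  // always one per call, independent of the weight
        weightSum += w;
    }

    // Average weight per entry; only defined once something was filled.
    double MeanWeight() const {
        assert(entries > 0.0);
        return weightSum / entries;
    }
};
```

With unit weights the two numbers coincide, which is why the difference only shows up for weighted fills.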
I hope you find these modifications useful.
Moreover, my compiler gives two warnings (the second one might be important):
hist/src/THnSparse.cxx: In member function `Long_t THnSparse::GetBinIndexForCurrentBin(Bool_t)':
hist/src/THnSparse.cxx:470: warning: comparison between signed and unsigned integer expressions
hist/src/THnSparse.cxx: In member function `THnSparse* THnSparse::Projection(UInt_t, UInt_t*) const':
hist/src/THnSparse.cxx:689: warning: comparison of unsigned expression >= 0 is always true
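The code at those two lines is not shown in this thread, so the following is only a guess at the kind of pattern that produces such warnings, with hypothetical function names. A reverse loop over an unsigned counter with the condition `i >= 0` never terminates, which is why the second warning can matter; a mixed signed/unsigned comparison is usually harmless but can go wrong for negative values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reverse loop over an unsigned index. Writing
//     for (std::size_t i = v.size() - 1; i >= 0; --i)
// would never terminate: i wraps around instead of going negative, which is
// what "comparison of unsigned expression >= 0 is always true" points at.
std::vector<int> ReverseCopy(const std::vector<int>& v) {
    std::vector<int> out;
    out.reserve(v.size());
    for (std::size_t i = v.size(); i-- > 0; ) {  // test first, then decrement
        out.push_back(v[i]);
    }
    assert(out.size() == v.size());
    return out;
}

// Signed/unsigned comparison: check the sign first, then compare in the
// unsigned type, so a negative index is never converted to a huge value.
bool IndexInRange(long index, const std::vector<int>& v) {
    return index >= 0 && static_cast<std::size_t>(index) < v.size();
}
```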
THnSparse.h (6.54 KB)
THnSparse.cxx (29.1 KB)
I have included your new function GetRandom in CVS, also fixing a few more things. Thanks for this contribution.
Note that we are going to modify several APIs in this class. Consider it very unstable for the time being.
Tomorrow, we plan to introduce two tutorials developed by Axel:
- one testing the performance (space- and time-wise) compared to a naive version where all bins are stored in memory
- one showing how to display a THnSparse using the new class TParallelCoord.
I’ve realized that my GetRandom method does not work as desired: it also generates vectors corresponding to under/overflow bins. To fix this, I chose a simple approach. When the ComputeIntegral method is called, it checks for each bin whether the bin exceeds the limits in any dimension and, if so, counts its weight as zero. The modified method is in the attachment.
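A minimal sketch of that check, assuming the usual ROOT bin numbering (bin 0 is underflow, bins 1 to nBins are regular, bin nBins+1 is overflow). The function names and the flat list-of-bins layout are hypothetical, and this is not the attached ComputeIntegral.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True if the bin is a regular bin on every axis (1..nBins[d]).
bool IsInRange(const std::vector<int>& bin, const std::vector<int>& nBins) {
    assert(bin.size() == nBins.size());
    for (std::size_t d = 0; d < bin.size(); ++d) {
        if (bin[d] < 1 || bin[d] > nBins[d]) return false;
    }
    return true;
}

// Normalised cumulative sum over the filled bins in which under/overflow
// bins contribute zero weight, so a sampler using it can never select them.
std::vector<double> ComputeIntegralSketch(const std::vector<std::vector<int>>& bins,
                                          const std::vector<double>& weights,
                                          const std::vector<int>& nBins) {
    assert(bins.size() == weights.size());
    std::vector<double> integral(weights.size(), 0.0);
    double sum = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        if (IsInRange(bins[i], nBins)) sum += weights[i];
        integral[i] = sum;  // unchanged for out-of-range bins
    }
    if (sum > 0.0) {
        for (double& v : integral) v /= sum;
    }
    return integral;
}
```

An out-of-range bin leaves the running sum unchanged, so its cumulative value equals that of the previous bin and a lookup for the first entry above a random number in [0, 1) skips it.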
THnSparse.cxx (32 KB)
Your version with some minor mods is now in CVS.