Memory-conservative histograms for usage in DQ monitoring

kittel · April 27, 2009, 10:11am

Hi ROOT-ers

I am writing because I am currently working on a way to change how we use
ROOT histograms for data-quality monitoring in ATLAS reconstruction jobs, so
that we avoid the memory penalty we currently pay (as you are probably aware,
we are fighting a battle against our present ~2GB memory usage). I would
like to hear whether anyone else has experience with this sort of thing (or
would perhaps be interested in our solution).

The problem we face is that we book and fill a rather large number of
histograms in each reco job (~5000), totalling more than 100MB of memory
(which means people often turn off the monitoring just so the grid machinery
won’t kill their jobs). However, once written out and compressed in a .root
file, they take up much less, typically just a few megabytes (~5MB). This
indicates what one can also see by looking at the histograms themselves:
often most bins in a given histogram are unused, or they are used in a way
which could be described by just a few bits rather than 4 or 8 bytes (e.g.
after calling ::Fill(x) on a given bin fewer than 256 times, one could
describe that bin with just a uchar rather than the full float/double/int of
TH1F/TH1D/TH1I). We have thus written a simple histogram replacement taking
some of these ideas into account, spending a bit more CPU in exchange for a
much tighter memory footprint in our jobs (hopefully freeing up 80-90% of
the memory used by histograms).
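
To make the narrow-counter idea concrete, here is a minimal sketch in Python (chosen only for brevity). It is not the ATLAS implementation: the class name NarrowCountHist, its methods, and the choice to widen all bins of a histogram at once are assumptions made for illustration. Bins start as 8-bit counters and are re-stored in a wider type only when some bin is about to exceed its current maximum.

```python
from array import array


class NarrowCountHist:
    """Hypothetical 1D counting histogram that starts with 8-bit bins."""

    # Unsigned typecodes tried in order (at least 1, 2 and 4 bytes wide).
    _TYPECODES = ("B", "H", "L")

    def __init__(self, nbins, xmin, xmax):
        self.nbins, self.xmin, self.xmax = nbins, xmin, xmax
        self._level = 0
        self._counts = array(self._TYPECODES[0], [0]) * nbins

    def _max_count(self):
        # Largest value the current counter type can hold.
        return (1 << (8 * self._counts.itemsize)) - 1

    def _promote(self):
        """Re-store all bins in the next wider counter type."""
        if self._level + 1 == len(self._TYPECODES):
            raise OverflowError("bin content exceeds the widest counter type")
        self._level += 1
        self._counts = array(self._TYPECODES[self._level], list(self._counts))

    def fill(self, x):
        if not (self.xmin <= x < self.xmax):
            return  # under/overflow is ignored in this sketch
        i = int((x - self.xmin) / (self.xmax - self.xmin) * self.nbins)
        i = min(i, self.nbins - 1)  # guard against floating-point rounding
        if self._counts[i] == self._max_count():
            self._promote()
        self._counts[i] += 1

    def content(self, i):
        return self._counts[i]

    def nbytes(self):
        """Bytes used by the bin contents only (no per-object overhead)."""
        return len(self._counts) * self._counts.itemsize


h = NarrowCountHist(10, 0.0, 10.0)
for _ in range(255):
    h.fill(3.5)
print(h.content(3), h.nbytes())  # bins are still one byte each
h.fill(3.5)                      # the 256th fill triggers a promotion
print(h.content(3), h.nbytes())  # same contents, wider bins
```

The extra comparison in every fill is the CPU cost traded for memory; a real implementation could promote per bin or per block of bins rather than per histogram.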

However, since we have a very large number of monitoring packages using
histograms, and a large number of scripts/programs set up to process these
ROOT histograms (often using more advanced features of ROOT histograms such
as fitting or adding names to bins), we cannot just migrate to something
else. So the solution we are pursuing now is to provide these light-weight
(memory-wise) histogram classes with class and method names similar to those
of ROOT (e.g. TH1F_LW vs. TH1F), and use those for histogram filling during
our job. Only at the post-processing stage are they converted into their
ROOT equivalents and written out.
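
The fill-now, convert-later pattern can be sketched as follows, again in plain Python and without ROOT. Everything here is hypothetical: SparseHistLW is not the TH1F_LW class mentioned above, and to_dense() merely stands in for the real conversion step. The only ROOT-specific fact relied on is the bin numbering (bin 0 is underflow, bins 1..nbins are regular, bin nbins+1 is overflow). A Python dict is not itself memory-efficient, so the sketch shows the interface rather than the footprint.

```python
class SparseHistLW:
    """Hypothetical light-weight 1D histogram with ROOT-like method names."""

    def __init__(self, name, nbins, xmin, xmax):
        self.name, self.nbins, self.xmin, self.xmax = name, nbins, xmin, xmax
        self._contents = {}  # bin number -> content; untouched bins are absent

    def FindBin(self, x):
        # ROOT numbering: 0 = underflow, 1..nbins = regular, nbins+1 = overflow
        if x < self.xmin:
            return 0
        if x >= self.xmax:
            return self.nbins + 1
        i = int((x - self.xmin) / (self.xmax - self.xmin) * self.nbins)
        return 1 + min(i, self.nbins - 1)  # guard against rounding at the edge

    def Fill(self, x, weight=1.0):
        b = self.FindBin(x)
        self._contents[b] = self._contents.get(b, 0.0) + weight
        return b

    def GetBinContent(self, b):
        return self._contents.get(b, 0.0)

    def to_dense(self):
        """Expand to nbins + 2 values, as a dense histogram would store them."""
        return [self._contents.get(b, 0.0) for b in range(self.nbins + 2)]


h = SparseHistLW("h_example", 100, 0.0, 100.0)
for x in (5.5, 5.7, -1.0, 200.0):
    h.Fill(x)
print(len(h._contents), "bins stored out of", h.nbins + 2)
dense = h.to_dense()  # done once, at the post-processing stage
```

In a real job the final step would copy each value into the corresponding ROOT histogram (for example with SetBinContent) just before writing, so monitoring code written against the familiar method names would need few changes.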

Sorry for the somewhat lengthy and slightly convoluted explanations.

Looking forward to hearing your comments.

Cheers,
Thomas

PS: I looked into THnSparse, but it does not seem appropriate for our
situation. First, it is not similar to the usual TH1F etc., so the migration
would be too difficult. Second, the optimisation used internally doesn’t
scale well to the situation where most bins actually start to be filled
(e.g. in a 12-hour-long monitoring job). A similar comment goes for
TH1C/TH2C, since users can’t know the maximal bin content in advance (also,
TH1C/TH2C allocate even empty bins).

Axel · April 27, 2009, 4:45pm

Hi,

Why don’t you simply use a disk-resident TTree / TNtuple with 5000 leaves? It’s trivial to convert those to histograms, too. And the memory usage will be far lower.

Cheers, Axel.

kittel · May 6, 2009, 12:25pm

Hi Axel,

Thanks for your input :=)

While I think it is in general a neat idea to occasionally flush selected parts of memory to disk (presumably using application-specific knowledge to do a better job than the automatic swapping provided by the OS), I don’t really think your idea would be the appropriate solution in this case. First of all, it is not realistic for us to migrate our usage to such an implementation (we have many different packages, used dynamically in many different ways and written by many different people). Second, there is the concern of what happens when you have a large unbounded on-disk footprint, i.e. one growing with the number of events. Speed issues aside, some grid nodes actually only have memory-resident disk space available for the jobs… and I heard from at least one grid admin that we are already pushing it a bit.
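
As a rough illustration of the growth concern, the sketch below estimates the uncompressed size of such an ntuple. The assumptions are loud ones: every event writes exactly one 4-byte float per leaf (a TNtuple stores floats), compression is ignored, and the function name ntuple_bytes is made up for this example.

```python
def ntuple_bytes(n_events, n_leaves=5000, bytes_per_value=4):
    """Uncompressed size if every event writes one value per leaf (assumption)."""
    return n_events * n_leaves * bytes_per_value


# Fixed in-memory histogram total quoted earlier in the thread (1 MB = 10**6 bytes).
HIST_BYTES = 100 * 10**6

for n in (1_000, 5_000, 100_000):
    print(n, "events:", ntuple_bytes(n) / 10**6, "MB uncompressed")
```

Under these assumptions the ntuple reaches the size of the current histograms after 5000 events and keeps growing linearly, whereas the histogram footprint stays fixed however long the job runs.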

Cheers,
Thomas

brun · May 7, 2009, 9:26am

Christian,

I am not sure I understand your problem. 5000 histograms is nothing. The overhead of a TH1 is about 600 bytes + the bin contents, i.e. the max overhead that you can think of is 5000*600 bytes = 3 MBytes, very far from your 100 MB.
Why don’t you
- use TH1C,
- reduce the number of bins,
- use automatic binning to reduce the number of empty bins (and also the total number of bins)?
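
The arithmetic above can be checked in a few lines, and it also shows what the remaining memory implies about bin counts. The inputs are the figures quoted in this thread; taking 1 MB as 10**6 bytes and spreading the memory evenly over all histograms are simplifying assumptions.

```python
N_HISTS = 5000
OVERHEAD_PER_HIST = 600    # bytes per TH1, figure quoted above
TOTAL_BYTES = 100 * 10**6  # "more than 100MB", with 1 MB = 10**6 bytes

overhead = N_HISTS * OVERHEAD_PER_HIST
bin_bytes_per_hist = (TOTAL_BYTES - overhead) / N_HISTS

print(overhead / 10**6)        # 3.0 MB of fixed overhead
print(bin_bytes_per_hist)      # 19400.0 bytes of bin contents per histogram
print(bin_bytes_per_hist / 4)  # 4850.0 bins if 4 bytes each (e.g. TH1F)
print(bin_bytes_per_hist / 8)  # 2425.0 bins if 8 bytes each (e.g. TH1D)
```

So if the 100 MB figure is right, nearly all of it would be bin contents, corresponding to a few thousand bins per histogram on average.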

Rene