Histograms are memory monsters: a memory-saving idea

Has anybody noticed that histograms take the same amount of memory regardless of whether they are empty, partially filled, or completely filled? This is not right. One can definitely optimize them.

I always create histograms with fixed axis limits, but I still don’t fill them completely (nowhere near completely), and as the number of histograms goes up, so does memory usage. I think only those bins that have something in them should be allocated.

Is any work in progress on this subject?

How much memory do you expect to gain this way? A factor of 2? It is unlikely you will gain more!
Do you know about automatic binning (xmin >= xmax)?
My guess is that if you have many empty bins, your original limits are not good. If this is the case let the automatic binning do its job.
In any case, it would be a trade-off between a potentially small gain in memory and a possibly large loss in time.
Just my two cents.

Rene

As a relatively simple memory-management improvement, I could suggest enabling the same flexible-range mechanism regardless of whether the user sets limits or not. Memory allocation would be on an as-needed basis (already implemented), but the user would see the requested range.
One more integer variable would be needed to remember how to offset the smaller allocated range against the user-requested one. That’s it. It could be made static inside one of the functions so that no changes to header files are needed.
Perhaps a compile-time flag could allow switching between the original and the flexible memory allocation.

Does anyone else want histograms to occupy less memory? (That is, if anybody reads this except Rene.)

Some speculations about “does size matter”:

I agree that users can manually set histogram limits as tightly as they can, but in many cases one wants a large range with only a small peak somewhere on it. For this kind of histogram, memory optimization would help.

Right now, the Fill() function is already not as simple as a[i] = a[i] + add.
It has buffering, bounds checks, etc. I could probably argue that filling plain arrays every event, then creating the histograms and setting their bins from the array values at the end, would be considerably faster for a large number of histograms. On the other hand, adding some memory optimization would not make Fill() significantly slower than it is right now. Of course, all of this matters mainly for people who strain their PC memory to the point where the paging file gets involved; there, the gain from memory optimization would really help.

I am having problems because I am running in parallel on many machines, and some of them have fast CPUs but not much memory. Smaller memory usage for histograms would help here.