Histogram merging limitations

Dear all,

Using PROOF with many histograms per job (of order 10K), I noticed that,
with about 100 workers, the merging of histograms
uses too much memory and the process fails at some point.

I’m wondering why. The final ROOT file with those 10K histograms
is not that big (a few MB), and I don’t think the compression factor is of order 1000
when all the histograms contained in the file are loaded in memory. So why does the master
use so much memory when merging them?

Is it keeping in memory all the histograms that are to be added, instead of deleting each
one after it has been added?

Many thanks in advance!

best,
Max

Hi,

Could you specify your ROOT version?
The handling of histogram merging was changed (and improved) starting from 5-30-01 and 5-28-00f.

G. Ganis

Hi,

I’m using 5.30.01 (64bit).

best,
Max

Can you measure how much memory your histograms use?
You can compare the result of TSystem::GetProcInfo for a bare new ROOT session and after having loaded all your histograms.
This way we can estimate the memory consumption on PROOF and compare with what you observe.

G. Ganis

Do you know an easy way to load all the histograms from a file into memory?

Max.

There is no direct call.
Please run the attached macro on your file: you should get something like

root [0] .L loadFileContent.C+
Info in <TUnixSystem::ACLiC>: creating shared library /home/ganis/local/root/trunk/trunk/./loadFileContent_C.so
root [1] TFile *f =  loadFileContent("SimpleFile.root")
Footprint of file content (32 objects loaded): 6532 kB (res), 11276 KB (virt)
root [2]

G. Ganis
loadFileContent.C (1.24 KB)

To read in memory all objects from a TFile *f, do f->ReadAll();

Rene

Dear all,

Thanks for posting the macro. I tried it on a merged file with all the histograms I’m making.
The output is:

Footprint of file content (26152 objects loaded): 219708 kB (res), 219036 KB (virt)

As you can see, the memory footprint is not that big, but when the PROOF server job is merging files
the memory usage grows a lot. Here is the top output during one of the merges:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
30048 mbellomo  39  19 2994m 717m  24m R 95.4  1.8  8:28.38 proofserv.exe

And sometimes I get errors like this one:

HandleSocketInput: unknown command 1011! Protocol error?ding) ====>| 100.00 % [44596.1 evts/s, 277.4 MB/s]
HandleSocketInput: unknown command 1011! Protocol error?ding) ====>| 100.00 % [44282.3 evts/s, 275.5 MB/s]
HandleSocketInput: unknown command 1011! Protocol error?ding) ====>| 100.00 % [43978.8 evts/s, 273.6 MB/s]
HandleSocketInput: unknown command 1011! Protocol error?ding) ====>| 100.00 % [43595.0 evts/s, 271.2 MB/s]
[TProof:] Total 24720966 events |====================| 100.00 % [43585.5 evts/s, 271.2 MB/s]
HandleSocketInput: unknown command 1011! Protocol error?ding)
HandleSocketInput: unknown command 1011! Protocol error?ding)

I’m wondering if these errors are related to the memory usage, because they disappear as soon as I reduce the number of histograms per job,
and also when I reduce the number of workers. This second observation makes me think that ALL histograms from all workers are kept in memory until
they are summed together, whereas I would have expected that, as soon as the set of histograms from one worker has been summed, it is
removed from memory before the set from the next worker is loaded. Does this make any sense to you?
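
The two scenarios can be compared with a back-of-the-envelope estimate. The sketch below assumes that each worker returns one full set of histograms and that a set costs about as much as the 219708 kB measured above for the merged file. Both are simplifying assumptions, and the function names are purely illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Peak footprint (kB) if the master holds every worker's set until the end.
std::int64_t peakAllInMemoryKB(std::int64_t setKB, int nWorkers) {
    assert(setKB >= 0 && nWorkers >= 0);
    return setKB * nWorkers;
}

// Peak footprint (kB) if only the running sum and one incoming set are alive.
std::int64_t peakIncrementalKB(std::int64_t setKB, int nWorkers) {
    assert(setKB >= 0 && nWorkers >= 0);
    if (nWorkers == 0) return 0;
    return nWorkers == 1 ? setKB : 2 * setKB;
}
```

Calling these with 219708 kB and 100 workers gives roughly 21 GB for the first case and roughly 0.4 GB for the second. The 717 MB resident size in the top output lies between the two, at least at the moment of that snapshot, so the estimate alone does not settle the question.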

Many thanks in advance!

best,
Max

Dear experts,

Can you please comment on this issue? It is actually preventing me from making full use of PROOF, and I find
it a big limitation of this nice framework. Maybe it is just an issue with the way I use PROOF; I don’t know. It would be nice to have some feedback from other people: is this a general issue?

And most importantly, how does the merging of histograms work? Does it keep in memory all histograms
from all workers, or only two sets: the summed one and the next to be summed at each iteration?

At the moment I’m using ROOT 5.28.00e (64bit). I’ve also tried 5.30.06 and had the same issues.
Are (big) improvements expected in 5.32?

Many thanks in advance!

best,
Max

Hi,

Starting from versions 5.28/00f and 5.30/01 there should be at most two copies of each histogram in memory.
So you should see a difference moving from 5.28/00e to 5.30/06. The fact that you do not see one is already something to understand.
(In earlier versions the merging was done in one go, because merging histograms one by one was very inefficient.)
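
To make the difference between the two strategies concrete, here is a minimal sketch in plain C++. It is not the PROOF implementation: `Histo` is just a vector of bin contents standing in for a real histogram, and `fetch` is a hypothetical callback that delivers one worker's copy at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Histo = std::vector<double>;  // bin contents only; stand-in for a real histogram

// Add the bins of `part` into `sum` (both must have the same binning).
void addInto(Histo& sum, const Histo& part) {
    assert(sum.size() == part.size());
    for (std::size_t i = 0; i < sum.size(); ++i) sum[i] += part[i];
}

// One-go merging: every worker's copy must already be in memory.
Histo mergeInOneGo(const std::vector<Histo>& all) {
    Histo sum(all.empty() ? 0 : all.front().size(), 0.0);
    for (const Histo& h : all) addInto(sum, h);
    return sum;
}

// Incremental merging: fetch one copy, add it, drop it before fetching the next.
Histo mergeIncrementally(std::size_t nWorkers, std::size_t nBins,
                         const std::function<Histo(std::size_t)>& fetch) {
    Histo sum(nBins, 0.0);
    for (std::size_t w = 0; w < nWorkers; ++w) {
        Histo part = fetch(w);  // the second, temporary copy
        addInto(sum, part);
    }                           // `part` is destroyed at the end of each iteration
    return sum;
}
```

Both functions return the same result. The difference is only in how many copies are alive at once: all of them in the first case, two in the second.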

In your last-but-one post you mention that the PROOF server job is "merging files".

Does it mean that you merge via files?
In such a case PROOF uses TFileMerger, which by default merges histograms in one go (I believe; I have to check).

You could also try to use submergers, which divide the merging job in parallel between a certain number of mergers:

   // Enable submerging; 0 means guess the optimal number
   proof->SetParameter("PROOF_UseMergers", 0);

This may help in your case.
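
The idea behind submerging can be sketched as a two-level sum. The code below is only an illustration in plain C++ under simplifying assumptions (contiguous groups of workers, histograms reduced to vectors of bin contents); it says nothing about how PROOF actually assigns workers to mergers, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Histo = std::vector<double>;  // bin contents only; stand-in for a real histogram

// Sum a contiguous range [first, last) of histograms into one.
Histo sumRange(const std::vector<Histo>& parts, std::size_t first, std::size_t last) {
    assert(first < last && last <= parts.size());
    Histo sum(parts[first].size(), 0.0);
    for (std::size_t w = first; w < last; ++w) {
        assert(parts[w].size() == sum.size());
        for (std::size_t i = 0; i < sum.size(); ++i) sum[i] += parts[w][i];
    }
    return sum;
}

// Two-level merge: each (hypothetical) submerger sums one group of workers,
// then the master only has to add the partial results.
Histo mergeWithSubmergers(const std::vector<Histo>& parts, std::size_t nMergers) {
    assert(!parts.empty() && nMergers > 0);
    nMergers = std::min(nMergers, parts.size());
    const std::size_t perGroup = (parts.size() + nMergers - 1) / nMergers;  // ceiling division
    std::vector<Histo> partial;
    for (std::size_t first = 0; first < parts.size(); first += perGroup)
        partial.push_back(sumRange(parts, first, std::min(first + perGroup, parts.size())));
    return sumRange(partial, 0, partial.size());
}
```

The result is the same as a flat sum, but the last step handles only as many inputs as there are submergers rather than one per worker.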

G. Ganis

Hi,

Thanks for the suggestions. I will first try the most recent versions again to see if I get the expected improvement.
I would also like to try the merging option you suggested. Since I run in a framework where I don’t have
direct access to the TProof instance, I’m wondering if there is a PROOF configuration file where I could set the
option, and where this file should be located (home directory?).

thanks!

best,
Max

To activate submerging you do not need any setting or file on the server side. Just do

 root [] TProof *proof = TProof::Open("master")
 root [] proof->SetParameter("PROOF_UseMergers", 0)

and the next proof->Process(…) will merge using submergers.

However, the worker machines require inter-access, i.e. the ability to open network connections to each other. You may need to check this with your cluster administrator.

G. Ganis

Hi,

I tried with 5.28.00f and the merging seems to work better: it gets to the end, but I still get errors like:

HandleSocketInput: unknown command 1011! Protocol error?ding)

The job has this stable memory footprint while merging is ongoing: 3.6 GB virtual memory, 1.9 GB resident memory. Are these errors something I should be worried about?

One comment on merging files: I put it that way because I thought this was what was happening under the hood. But I didn’t change anything in the PROOF parameters to activate it, so I should be using the default option.

Could it be a problem if the set of histograms is not the same on each worker? Some histograms are created only in some worker jobs, since they depend on the files read by that worker.
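
To spell out what merging differing sets would involve, here is a minimal sketch in plain C++ that matches histograms by name. It is an illustration only and makes no claim about how the PROOF merger treats this case; `HistoSet` and `mergeByName` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Histo = std::vector<double>;              // bin contents only
using HistoSet = std::map<std::string, Histo>;  // histogram name -> bin contents

// Merge one worker's set into the running total, matching histograms by name.
// Names seen for the first time are copied; known names are added bin by bin.
void mergeByName(HistoSet& total, const HistoSet& fromWorker) {
    for (const auto& entry : fromWorker) {
        auto it = total.find(entry.first);
        if (it == total.end()) {
            total.insert(entry);
            continue;
        }
        assert(it->second.size() == entry.second.size());
        for (std::size_t i = 0; i < entry.second.size(); ++i)
            it->second[i] += entry.second[i];
    }
}
```

With this kind of matching, a histogram that exists on only some workers simply ends up in the total with the contributions of those workers.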

best,
Max

Hi,

Sorry, I still do not understand whether you are using files for merging or not. PROOF does not do that under the hood; by default it keeps objects in memory.
Merging via file is enabled by defining TProofOutputFile objects in your selector (see root.cern.ch/drupal/content/hand … root-files). File merging uses TFileMerger, which by default does histogram merging in one go, with all histos in memory. If you are using this technique, this may explain your problem.
I have changed this default today in the trunk and in the patch branches of 5.28, 5.30 and 5.32, but you have to rebuild to get the change, or wait for the next tags on the branches.

Since your histos total about 300 MB, it is worth trying in-memory merging (no TProofOutputFile), and also the submergers.

G. Ganis

Hi,

Yes, sorry, I was not completely aware of how this is handled internally by the framework we use… I checked and we do indeed use TProofOutputFile. Now, how can I tell TFileMerger to use the step-by-step merging in memory, which I guess is what you set as the default for the next tag?

best,
Max

Dear Both,

I have to post a clarification. SFrame (what Max uses) doesn’t use TProofOutputFile for everything. It uses this feature when the user declares that (s)he wants to produce an output ntuple in his/her analysis. Even in this case, SFrame only places the output TTree(s) of the analysis cycles into the local temporary files.

All other object types (histograms, graphs, whatnot) are kept in memory, just by adding them (in a fancy way) to the fOutput list of TSelector. The histograms and the output TTree(s) are only merged on the client machine at the very end. So for the histogram merging we should be able to use the full might of the merging code.

Thought that it would be important to point this out…

Cheers,
Attila

Hi Attila,

Thanks for clarifying this point about SFrame. Now I’m a bit puzzled, though. If we are able to use
in-memory merging (so that only two copies of the same histo are kept in memory at a time), then I’m wondering why I still get the errors mentioned (which tend to go away if I reduce the number of histos/workers).
Are we sure that we don’t keep all histograms in memory while merging?

best,
Max

With the ROOT versions that I mentioned, yes. I cross-checked this explicitly yesterday with a large number of TH3F …
I agree that there is something weird going on here.

G

Hi Gerri,

Max asked me to elaborate a bit on what SFrame does exactly. The basic thing is done exactly as I wrote before. The histograms are just put into the fOutput list. But…

In order to be able to easily put histograms into sub-directories in the output file, I wrap them into SCycleOutput objects.

sframe.svn.sourceforge.net/viewv … iew=markup

This is for instance one place where I do this wrapping:

sframe.svn.sourceforge.net/viewv … iew=markup

(The top-most function.)

SCycleOutput is a tricky little thing: it can take care of the merging of the object that it wraps, and it can also write out the wrapped object in a nice way.

What I’m wondering about now is whether PROOF has some special treatment for histograms that it can’t use for all the other object types. I can imagine that since PROOF only sees instances of these SCycleOutput objects, it might not try to use some optimization.
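
The wrapping pattern in question can be sketched in plain C++ as follows. This is not the SCycleOutput code: `Mergeable`, `Histo` and `OutputWrapper` are hypothetical stand-ins, meant only to show that generic merging code sees the wrapper type while the wrapper forwards the actual work to the object inside.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Generic interface that the merging code sees (hypothetical).
struct Mergeable {
    virtual ~Mergeable() = default;
    virtual void mergeFrom(const Mergeable& other) = 0;
};

// Stand-in for a histogram: bin contents only.
struct Histo : Mergeable {
    std::vector<double> bins;
    explicit Histo(std::vector<double> b) : bins(std::move(b)) {}
    void mergeFrom(const Mergeable& other) override {
        // Throws std::bad_cast if `other` is not a Histo.
        const Histo& h = dynamic_cast<const Histo&>(other);
        assert(h.bins.size() == bins.size());
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += h.bins[i];
    }
};

// Wrapper that records an output directory and forwards merging to the wrapped object.
struct OutputWrapper : Mergeable {
    std::string directory;
    std::unique_ptr<Mergeable> wrapped;
    OutputWrapper(std::string dir, std::unique_ptr<Mergeable> obj)
        : directory(std::move(dir)), wrapped(std::move(obj)) {}
    void mergeFrom(const Mergeable& other) override {
        const OutputWrapper& w = dynamic_cast<const OutputWrapper&>(other);
        wrapped->mergeFrom(*w.wrapped);
    }
};
```

Any code that decides how to merge by inspecting the type of the objects in the output list would see only `OutputWrapper` here, never `Histo`, which is the situation described above.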

Well, if nothing else, I hope that this is further input to the discussion.

Cheers,
Attila

Hi Attila, Gerardo,

Thanks for the clarification. I’m then wondering, as you did, whether the fact that we use SCycleOutput makes a difference in the way histograms are merged. Gerardo, can you please comment on this? Thanks!

Attila, is there any way to test this in SFrame, e.g. by bypassing SCycleOutput? I volunteer to code what is needed and do the tests.

best,
Max