Merging very slow, even frozen

Hello all,

We set up a PROOF cluster at our research center almost a year and a half ago, and it has been used very intensively ever since. The first setup used ROOT 5.26; for the past month we have been using ROOT 5.28.

Lately we have been experiencing serious problems with the merging of the output files. Merging is currently so slow that the total execution time ends up longer than when PROOF is not used. Often the execution simply freezes.

We found other threads on the forum related to this issue, but the problem there seemed to be a large output, which is not our case.

These were the conditions the last time one of our users experienced the problem:

  • number of workers: between 8 and 15
  • number of readable input files: 12 with 8e6 events
  • size of the output file: ~ 50 KB (obtained running in sequential mode)

Could you help with this issue?

Best,
Ana.

Hi Ana,

Can you set

proof->SetLogLevel(2, TProofDebug::kOutput)

and post the master log for a job showing the problem?

Also could you describe which kind of objects the output is made of?

Cheers,
Gerri

Hi Gerri,

All the PROOF users at our institute, myself included, have set the LogLevel option. Unfortunately (or fortunately), the problem has not shown up again.

I will let you know if we manage to get the log.

Best,
Ana.

Hi Ana,

Ok, thanks.
Merging is a hot and difficult issue.
Any information/feedback will be very welcome.

Cheers,
Gerri

Hi Gerri,

Just in case it rings a bell: I noticed one particular case in which merging becomes very slow, namely running over a small number of events (~1K) in PROOF Lite. In this case a single slave, or at most two, gets all the events.

Cheers,

Isidro

Hi Isidro,

The work distribution for a small number of events is probably not (always) well optimized. You say 1K events … do you know the number of files and their size?
The packetizer tries to match the packet size to the TTreeCache size (default 30 MB) if it makes sense. In general this optimizes data fetching, but it may be that this needs to be reviewed for small numbers of files and in PROOF-Lite.

Cheers, Gerri

Hi Gerri,

I think you should not waste time on this particular use case unless it is a symptom of something else.

The sample is in a single 18 MB file with 83879 events in total. If I run over the whole sample, it takes about 5.9 s to initialize and 29 s (we have seen higher rates some other times) to run a very simple analysis, and the merging happens within a couple of seconds or so. Our samples are usually much bigger (~ several GB).
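
For what it is worth, here is a back-of-the-envelope sketch of why so few workers may end up with all the events. It assumes, purely for illustration, that packets are sized by bytes alone and target the 30 MB TTreeCache default mentioned earlier in the thread; the real packetizer logic is more involved. The function names are hypothetical and are not part of ROOT, and MB is taken as 10^6 bytes.

```cpp
#include <cmath>
#include <cstdint>

// Average on-disk size of one event, in bytes.
double bytesPerEvent(double fileBytes, std::int64_t nEvents) {
    return fileBytes / static_cast<double>(nEvents);
}

// Rough number of packets needed to cover `eventsToProcess` events,
// ASSUMING each packet simply targets `packetBytes` bytes of data.
// This is an illustrative simplification, not the PROOF packetizer.
std::int64_t estimatePackets(double fileBytes, std::int64_t nEvents,
                             std::int64_t eventsToProcess, double packetBytes) {
    const double bytes = bytesPerEvent(fileBytes, nEvents) *
                         static_cast<double>(eventsToProcess);
    const auto n = static_cast<std::int64_t>(std::ceil(bytes / packetBytes));
    return n < 1 ? 1 : n;  // at least one packet is always handed out
}
```

With the numbers above, estimatePackets(18e6, 83879, 83879, 30e6) gives a single packet for the whole file, and the same holds for a 1K-event test. Under that assumption only one worker would have anything to process, which would at least be consistent with the distribution I see.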

Oddly, when I run over just a few thousand events (for a test), merging takes longer (tens of seconds). As a further data point, if I run over 40K events, merging takes about 10 s. This is with PROOF Lite on an 8-core machine with 4 GB of memory and nothing else demanding running on it.

I tried to set proof->SetLogLevel(2, TProofDebug::kOutput) as you suggested, but I cannot find the master log afterwards. Where should I look for it?

[fanae101] > ls ~/.proof/mnt_pool-fanae105-user-iglez-PROOF-ProofAnalysisFramework/last-lite-session/
session-fanae101.geol.uniovi.es-1301595915-23966.log     worker-0.4
worker-0.0                                               worker-0.4.env
worker-0.0.env                                           worker-0.4-fanae101.geol.uniovi.es-1301595916-23991.log
worker-0.0-fanae101.geol.uniovi.es-1301595916-23982.log  worker-0.4.log
worker-0.0.log                                           worker-0.4.rootrc
worker-0.0.rootrc                                        worker-0.5
worker-0.1                                               worker-0.5.env
worker-0.1.env                                           worker-0.5-fanae101.geol.uniovi.es-1301595916-23993.log
worker-0.1-fanae101.geol.uniovi.es-1301595916-23984.log  worker-0.5.log
worker-0.1.log                                           worker-0.5.rootrc
worker-0.1.rootrc                                        worker-0.6
worker-0.2                                               worker-0.6.env
worker-0.2.env                                           worker-0.6-fanae101.geol.uniovi.es-1301595916-23995.log
worker-0.2-fanae101.geol.uniovi.es-1301595916-23986.log  worker-0.6.log
worker-0.2.log                                           worker-0.6.rootrc
worker-0.2.rootrc                                        worker-0.7
worker-0.3                                               worker-0.7.env
worker-0.3.env                                           worker-0.7-fanae101.geol.uniovi.es-1301595916-23997.log
worker-0.3-fanae101.geol.uniovi.es-1301595916-23988.log  worker-0.7.log
worker-0.3.log                                           worker-0.7.rootrc
worker-0.3.rootrc

By the way, we are using ROOT 5.28.00a on SLC5.

Cheers,

Isidro