We set up a PROOF cluster at our research center almost a year and a half ago, and since then the facility has been used very intensively. The first setup used ROOT 5.26; for the past month we have been using ROOT 5.28.
Lately we have been experiencing serious problems with the merging of the output files. The merging is currently very slow, so the total execution time ends up longer than when PROOF is not used. Many times the execution simply freezes.
We found other threads on the forum related to this issue, but the problem there seemed to be a large output, which is not our case.
These were the conditions the last time one of our users experienced these problems:
number of workers: between 8 and 15
number of readable input files: 12 with 8e6 events
size of the output file: ~ 50 KB (obtained running in sequential mode)
Just in case it rings a bell: I noticed one particular case where merging becomes very slow, namely running over a small number of events (~1K) in PROOF Lite. In this case a single slave, or at most two, gets all the events.
The work distribution for a small number of events is probably not (always) well optimized. You say 1K events … do you know the number of files and their size?
The packetizer tries to match the packet size to the TTreeCache (default 30 MB) if it makes sense. In general this optimizes data fetching, but it may be that for small numbers of files and in PROOF-lite this needs to be reviewed.
I think you should not waste time on this particular use case unless it is a symptom of something else.
The sample is in a single 18 MB file with 83879 events in total. If I run over the whole sample, it takes about 5.9 secs to initialize and 29 secs to run a very simple analysis (we have seen higher rates at other times), and the merging happens within a couple of seconds or so. Our samples are usually much bigger (~ several GB).
It is funny to see that when I decide to run over just a few K events (for a test), merging takes longer (tens of seconds). To add a bit more, if I run over 40K events, merging takes about 10 s. This is using PROOF Lite on an 8-core machine with 4 GB of memory and nothing else demanding running on it.
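To put rough numbers on the packet-size remark made earlier in the thread, here is a back-of-the-envelope sketch in C++. It is a deliberately simplified model and not PROOF's actual packetizer: it assumes packets are sized purely by a byte target (the 30 MB TTreeCache default quoted above) and that the average event size is the file size divided by the number of events. The names `PacketPlan` and `planPackets` are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <cassert>

// Result of the simplified packet-size estimate (hypothetical type).
struct PacketPlan {
    long long eventsPerPacket;  // events that fit into one packet of the target size
    long long nPackets;         // packets needed to cover the requested events
    long long busyWorkers;      // workers that can receive at least one packet
};

// Simplified model (an assumption, not the real PROOF packetizer):
// one packet holds as many events as fit into targetPacketBytes,
// and each packet goes to a different worker until workers run out.
PacketPlan planPackets(long long nEvents, double bytesPerEvent,
                       double targetPacketBytes, long long nWorkers) {
    assert(nEvents > 0 && nWorkers > 0);
    assert(bytesPerEvent > 0.0 && targetPacketBytes > 0.0);

    // At least one event per packet, even if a single event exceeds the target.
    long long eventsPerPacket =
        std::max(1LL, static_cast<long long>(targetPacketBytes / bytesPerEvent));

    // Ceiling division: number of packets needed for nEvents.
    long long nPackets = (nEvents + eventsPerPacket - 1) / eventsPerPacket;

    // A worker is busy only if it gets a packet.
    long long busyWorkers = std::min(nPackets, nWorkers);

    return {eventsPerPacket, nPackets, busyWorkers};
}
```

With the figures from this thread (18 MB for 83879 events, so roughly 0.2 kB per event), a call such as `planPackets(1000, 18.0e6 / 83879, 30.0e6, 8)` gives a single packet and therefore a single busy worker under this model, which is at least consistent with the observation that one or two workers get all the events. The real packetizer applies further heuristics, so this should be read only as an order-of-magnitude illustration.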
I tried setting proof->SetLogLevel(2, TProofDebug::kOutput) as you suggested, but I cannot find the master log afterwards. Where should I look for it?