PROOF memory

iglez · November 29, 2016, 2:34pm

Hi,
In our PROOF Analysis Framework (PAF), memory usage grows with every new sample (a collection of files we process) that we run in sequence. So:

1) Is any significant memory allocated (or not released) at the beginning (or at the end) of a call to TProof::Process()? Maybe related to opening (or closing) the files?
2) What is the recommended way to monitor memory usage?

Thanks a lot,

Isidro

ganis · November 29, 2016, 5:13pm

Hi Isidro,

[quote=“iglez”]1) Is there any significant memory that is allocated (or not released) at the beginning (or at the end) of a call to TProof::Process()? Maybe related to the opening (or closing) of the files?
[/quote]
In principle, no: I once debugged this carefully. I'll cross-check and let you know.
Could you specify the ROOT version and the size of the leak you observe?

The PROOF processes regularly measure their memory usage and write it to the log file. If the problem is reproducible in a simple setup, I would try valgrind for memory checking.

Gerri

iglez · November 29, 2016, 6:09pm

I tested with the latest ROOT, 6.08.00. If it is quick to check, that would be great; otherwise I will keep looking for sources of memory leaks in our framework. What confuses me is the pattern: in PROOF Lite (8 cores used), the total memory grows by almost a GB with each new sample, and the increase happens exactly at the beginning of the sample. This is for a realistic supersymmetry analysis on flat TTrees with ~700 variables per event. While a sample is being processed, the memory is stable.

Can you remind me of the best way to get and process the log files? I would also consider writing out the memory usage at several steps in the slaves. I guess the best way is using TSystem::GetProcInfo(), right?

Cheers,

Isidro

ganis · November 30, 2016, 3:34pm

Hi Isidro,

I have checked 6.08 with a TProofBench run (i.e. many PROOF queries / Process() calls) on a 24-core machine and I do not see any long-term increase.

[quote=“iglez”]Can you remind me of the best way to get and process the log file? I would also consider the possibility of writing the memory at several steps that happen in the slaves. I guess the best way is using TSystem::GetProcInfo(), right?
[/quote]
You can get all the log files with TProofMgr::GetSessionLogs. For example, this is what we have in test/stressProof.cxx to collect the workers' logs into a single file:

      TString logfiles(glogfile);
      // Save also the logs from the workers
      TProofMgr *mgr = gProof ? gProof->GetManager() : 0;
      if (mgr) {
         gSystem->RedirectOutput(glogfile, "a", &gRH);
         TProofLog *pl = mgr->GetSessionLogs();
         if (pl) {
            logfiles += ".nodes";
            pl->Retrieve("*",  TProofLog::kAll, logfiles);
            gSystem->RedirectOutput(0, 0, &gRH);
            SafeDelete(pl);
         } else {
            gSystem->RedirectOutput(0, 0, &gRH);
            printf("+++ Warning: could not get the session logs\n");
         }
      } else {
         printf("+++ Warning: could not attach to manager to get the session logs\n");
      }

or, instead of retrieving everything with TProofLog::kAll, use

      pl->Retrieve("*", TProofLog::kGrep, logfiles, "CheckMemUsage");

to grep only the lines with memory information.
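If the full logs were already saved with TProofLog::kAll, the memory lines can also be extracted afterwards. The sketch below is plain C++, not a ROOT API; the helper name GrepLines is hypothetical, and it simply keeps every line containing the "CheckMemUsage" tag, i.e. the same pattern passed to kGrep. It makes no assumption about the rest of the line format.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical post-processing helper (not part of ROOT): return the lines
// of a merged PROOF log that contain `tag`, in their original order.
std::vector<std::string> GrepLines(std::istream &in,
                                   const std::string &tag = "CheckMemUsage") {
   std::vector<std::string> hits;
   std::string line;
   while (std::getline(in, line)) {
      // Plain substring match, like the kGrep pattern above.
      if (line.find(tag) != std::string::npos)
         hits.push_back(line);
   }
   return hits;
}
```

For example, `std::ifstream in(logfiles.Data()); auto memLines = GrepLines(in);` (with `<fstream>` included) would read the file written by the stressProof snippet and keep only the memory lines.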
By default, the memory lines are written only at the beginning and at the end of the query, but you can control how often the check is done with the PROOF parameter “PROOF_MemLogFreq” or the environment variable PROOF_MEMLOGFREQ.
If you set either of them to 1, for example, memory information is written after every processed event.
Setting it to 0 makes PROOF choose the frequency so that there are about 100 dumps during the run.
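To make the semantics of the PROOF_MemLogFreq setting concrete, here is a small sketch of the arithmetic it implies. The helper names MemLogInterval and NumDumps are hypothetical and this is not ROOT's actual implementation; it only encodes "N > 0 means one dump every N events, 0 means roughly 100 dumps over the run". In a session, the parameter itself would presumably be set with something like `gProof->SetParameter("PROOF_MemLogFreq", 1000)` before calling Process(), since TProof::SetParameter is the generic way to pass input parameters to PROOF.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration of the PROOF_MemLogFreq semantics (not ROOT code):
//   freq > 0  -> one memory dump every `freq` events
//   freq == 0 -> pick the interval so there are about 100 dumps in total
std::int64_t MemLogInterval(std::int64_t nEvents, std::int64_t freq) {
   if (freq > 0)
      return freq;
   std::int64_t interval = nEvents / 100;  // ~100 dumps over the run
   return interval > 0 ? interval : 1;     // tiny runs: dump after every event
}

// Number of dumps produced over nEvents events with a given interval.
std::int64_t NumDumps(std::int64_t nEvents, std::int64_t interval) {
   return nEvents / interval;
}
```

With one million events and the setting at 0, this gives an interval of 10000 events and hence 100 dumps; with the setting at 1, every event produces a dump.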

And yes, the information is obtained with GetProcInfo. See TProofPlayer::GetMemUsage.
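Inside ROOT, the native route is `gSystem->GetProcInfo(&info)`, which fills a `ProcInfo_t` whose `fMemResident` and `fMemVirtual` fields are in kB. For a quick check outside ROOT, or to cross-check those numbers, the sketch below swaps in a different, Linux-only technique: it reads the `VmRSS` and `VmSize` fields of `/proc/self/status`. The helper names `ReadProcStatusKb` and `LogMem` are hypothetical. Calling `LogMem` at several steps (for instance at the start and end of each sample, or in the selector's `SlaveBegin` and `SlaveTerminate`) would show where the per-sample jump happens.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

// Linux-only sketch: return a /proc/self/status field such as "VmRSS" or
// "VmSize" in kB, or -1 if the field is missing (e.g. on non-Linux systems).
long ReadProcStatusKb(const std::string &field) {
   std::ifstream status("/proc/self/status");
   const std::string key = field + ":";
   std::string line;
   while (std::getline(status, line)) {
      if (line.compare(0, key.size(), key) == 0) {
         // The value follows the key and is followed by the unit "kB".
         std::istringstream rest(line.substr(key.size()));
         long kb = -1;
         rest >> kb;
         return kb;
      }
   }
   return -1;
}

// Print resident and virtual memory with a label marking the current step.
void LogMem(const char *where) {
   std::printf("[mem] %s: RSS %ld kB, VM %ld kB\n", where,
               ReadProcStatusKb("VmRSS"), ReadProcStatusKb("VmSize"));
}
```

Comparing consecutive `LogMem` lines gives the growth between steps directly, in the same kB units used by `ProcInfo_t`.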

Cheers,
Gerri