Memory usage by worker nodes

Akira · June 26, 2009, 6:28pm

Hello,

We have been observing that the PROOF workers accumulate a large amount of memory after running for a while. We have studied the memory leaks in our own code and are quite sure we have sealed most of them. What we see in the PROOF workers is a dramatic increase in memory usage after running for half an hour or so: it almost reaches the single-process memory limit, and the workers then keep running without ever releasing it.

Our observation is that this is not proportional to the number of events we process but rather to the number of files we open. Are there any known issues regarding the memory usage of PROOF? We have tried both PROOF-lite and a full PROOF installation and see the same behavior.

Cheers
Akira

krasznaa · June 27, 2009, 7:00am

Hi,

Just one addition: in my tests I concluded that it’s probably not even the number of files we open, but the number of separate queries that we run. At least I saw much more rapid memory growth when processing a single file per query than when processing >500 files per query.

I should also note that we’re trying to run >1000 queries per PROOF connection. I guess if we set up the job to recreate the PROOF connection every once in a while, the problem could be “handled”, but it would be nice if the jobs could run while making the connection only once.

Cheers,
Attila

ganis · June 29, 2009, 11:52am

Hi,

We were not aware of memory issues of the type you describe. However, we do not have much data from >100 queries in the same session.
I will investigate.

In principle the workers should get cleaned after each query, while on masters some information about the past queries is kept.
Do you see any similar effect on the master?

Gerri

krasznaa · June 29, 2009, 11:58am

Hi Gerri,

We actually see this in two situations, which are the only two ways we use PROOF at the moment: PROOF-lite sessions and one-machine PROOF “servers”. In the latter case a single computer is set up to act as the master and to run as many workers as it has processor cores. I’m not exactly sure whether the master or the workers are leaking, but something is definitely collecting a lot of memory.

I’m actually looking at the issue right now, so I could even give you some example code/job that you could test.

Cheers,
Attila

ganis · June 29, 2009, 12:14pm

Hi Attila,

That would surely help a lot with debugging. So, if you have something that I can test/run, please send it to me.

Cheers, Gerri

krasznaa · August 18, 2009, 12:54pm

Dear All,

I’d just like to share the “solution” with the list. Gerri let me know that I should delete the query results from PROOF before every new query, with something like this:

    if( m_proof->GetQueryResults() ) {
       m_proof->GetQueryResults()->SetOwner( kTRUE );
       m_proof->GetQueryResults()->Clear();
    }

This eliminated roughly 99% of the observed memory increase, which is good enough for us so far.
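To illustrate why the SetOwner( kTRUE ) call matters before Clear(), here is a small toy model in plain C++. It does not use ROOT at all: `ToyList`, `Result` and `liveAfterQueries` are hypothetical stand-ins invented for this sketch. It assumes the behaviour that, as far as I understand, ROOT collections have, namely that Clear() only deletes the contained objects when the collection owns them.

```cpp
#include <cstddef>
#include <vector>

// Counts live Result objects so the effect of each cleanup is visible.
static int g_live = 0;

// Hypothetical stand-in for one stored query result.
struct Result {
    Result() { ++g_live; }
    ~Result() { --g_live; }
};

// Hypothetical stand-in for a ROOT collection; this is NOT the real TList.
class ToyList {
public:
    ToyList() = default;
    ToyList(const ToyList&) = delete;
    ToyList& operator=(const ToyList&) = delete;
    ~ToyList() { Clear(); }

    void Add(Result* r) { items_.push_back(r); }
    void SetOwner(bool owner) { owner_ = owner; }

    // Drops all entries; deletes the objects only if the list owns them.
    void Clear() {
        if (owner_) {
            for (Result* r : items_) delete r;
        }
        items_.clear();
    }

    std::size_t Size() const { return items_.size(); }

private:
    std::vector<Result*> items_;
    bool owner_ = false;
};

// Runs n toy "queries". Before each one the result list is cleared,
// optionally after taking ownership, and then one new result is stored.
// Returns how many Result objects are still alive after the last query.
// Without ownership the old results are leaked on purpose, to show the effect.
int liveAfterQueries(int n, bool takeOwnership) {
    g_live = 0;
    ToyList results;
    for (int i = 0; i < n; ++i) {
        if (takeOwnership) results.SetOwner(true);
        results.Clear();
        results.Add(new Result());
    }
    return g_live;
}
```

In this model liveAfterQueries( 1000, false ) leaves 1000 objects alive, while liveAfterQueries( 1000, true ) leaves only the most recent one, which matches the pattern of growth per query rather than per event.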

Cheers,
Attila