Tracing Long PROOF Pauses to TTreeCache

afarbin · November 24, 2010, 4:03pm

Hi,

I have a small cluster of Mac Pros with disks aggregated via XRD. Running PROOF-Lite (ROOT version 5.26d) on just one machine (12 cores/24 threads), I noticed that when processing large samples (tens of GB), mostly via the TDSet::Draw command, I get long pauses in event processing with no CPU load. I thought it was a problem with XRD, so I moved the data to one machine and am now reading it directly, but the problem persists.

A more careful look showed that the machine is doing lots of disk reads and writes during these pauses, which I tracked down to the proofserv.exe instances growing past my 26 GB of memory and getting swapped out. Note that it’s not a memory leak, because the memory footprint decreases once I move on to processing a smaller sample.

Suspecting TTreeCache (though not understanding why, because the advertised default size is small enough to fit in memory), I reduced the TTreeCache size… that didn’t help. Then I disabled TTreeCache… the problem disappeared and my processing rate improved significantly.

Seems like something is going wrong with TTreeCache. I’m not sure how to dig further… but it would be nice to benefit from TTreeCache without the memory footprint. Any suggestions?

Other issues:

It would be really nice if TDSet also had a Project method in addition to Draw…
Is there a way in ROOT to stop a new canvas from taking my focus? I can’t force the Mac’s X11 client to stop the change of focus.

Thanks,
Amir

afarbin · November 24, 2010, 4:20pm

Hi Again,

This is a bit embarrassing, but after trying different samples, I see that turning off TTreeCache didn’t really help… I’m still running out of memory when I really shouldn’t. So I guess my question now is: why do the proofserv.exe processes have to grow so big in memory (close to 2 GB each), and is there a way to constrain them?

Thanks,
Amir
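
The numbers quoted in the two posts above already suggest why swapping starts. The sketch below is plain C++ (no ROOT needed) that multiplies the per-worker footprint by the worker count and compares it with physical RAM. The 2 GB per worker and 26 GB of RAM come from the posts; the worker count of 24 is an assumption (one worker per hardware thread), and the struct and function names are made up for illustration.

```cpp
#include <cassert>

// Rough memory budget for a multi-worker session.
// All names here are hypothetical; this is not part of ROOT or PROOF.
struct MemoryBudget {
    double totalGB;       // summed footprint of all workers
    double overcommitGB;  // amount exceeding physical RAM (0 if it fits)
    int    maxWorkers;    // largest worker count that still fits in RAM
};

MemoryBudget estimateBudget(int nWorkers, double perWorkerGB, double ramGB) {
    // Guard against meaningless inputs.
    assert(nWorkers > 0 && perWorkerGB > 0.0 && ramGB > 0.0);

    MemoryBudget b;
    b.totalGB      = nWorkers * perWorkerGB;
    b.overcommitGB = (b.totalGB > ramGB) ? (b.totalGB - ramGB) : 0.0;
    // Truncation is intended: a fractional worker does not fit.
    b.maxWorkers   = static_cast<int>(ramGB / perWorkerGB);
    return b;
}
```

Under these assumptions, `estimateBudget(24, 2.0, 26.0)` gives a total of 48 GB, which is 22 GB more than the machine has, and at most 13 workers fit. If the session actually starts 12 workers (one per physical core), the total is 24 GB, which leaves very little headroom for the OS and the client session. Reducing the number of workers is therefore one thing worth trying; the PROOF-Lite documentation describes a `workers=N` option for `TProof::Open`, but check that your ROOT version supports it.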

pcanal · December 7, 2010, 9:03pm

Hi,

Did you try to run the same analysis either via ProofLite or simply in a single process (without Proof)? It is plausible that there is a memory leak in the handling of your objects.

Philippe.
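
One way to follow up on the single-process suggestion is to record the process's peak memory before and after the event loop and see how much it grew. The sketch below uses only the POSIX `getrusage` call, so it does not depend on ROOT; the helper names `peakRssMB` and `grewBeyond` are hypothetical. Note that `ru_maxrss` is a high-water mark, so it shows growth but never shrinkage.

```cpp
#include <cassert>
#include <sys/resource.h>

// Peak resident set size of the current process, in megabytes.
// Returns -1.0 if the query fails.
// getrusage() reports ru_maxrss in bytes on macOS and in kilobytes on Linux.
double peakRssMB() {
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1.0;
    }
#ifdef __APPLE__
    return usage.ru_maxrss / (1024.0 * 1024.0);
#else
    return usage.ru_maxrss / 1024.0;
#endif
}

// True if the peak grew by more than toleranceMB between two readings.
bool grewBeyond(double beforeMB, double afterMB, double toleranceMB) {
    assert(toleranceMB >= 0.0);
    return (afterMB - beforeMB) > toleranceMB;
}
```

A possible use is to call `peakRssMB()` once before the loop over entries and once after it (or every N entries) in a single-process run. Steady growth that scales with the number of entries processed would point toward objects being accumulated in the analysis code rather than toward PROOF itself.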