Controlling PROOF's output messages

Hi,

So after a break I returned to trying to use PROOF, with ROOT 5.21/06. (I got interested again because of PROOF-Lite.) I could get my old code running pretty easily, but now I’d like to add some more functionality, so I think I’ll have a few questions in the next few weeks…

Right now I’m trying to get my analysis code to print the messages produced on the individual worker nodes to the screen after the event processing had finished. I think I managed to do this the “correct way”, by looping over the TProofLogElem list returned by TProofMgr.

But I see a quite large number of info messages from PROOF itself reporting about the memory consumption of my job, all in the format of the following line:

These messages make it very difficult to see the output from my analysis code. So I tried turning them off, or at least reducing their frequency. Unfortunately the online documentation didn’t get me too far, so I started browsing the PROOF source code. I think I managed to find the source of these messages in the TProofPlayer::Process(…) function. I see in that function that it queries a parameter, “PROOF_MemLogFreq”, to decide how often it should print these lines.

So I tried setting this parameter to a number of values using TProof::SetParameter(…), but the result is always the same. I always see the same number of messages on the worker nodes. I even tried setting the parameter “manually” using something like:

to no avail…

Can someone help me reducing the number of these messages?

Cheers,
Attila

Hi Again,

In the meantime I started doing something else that’s also relevant here I guess. Since my analysis framework uses its own message logging facility, I thought I would redirect all ROOT messages to my own logger.

This seems to work in principle. I defined a global function of type ErrorHandlerFunc_t, and gave it to SetErrorHandler(…). (Of course all of these are defined in TError.h.) After these changes almost all the messages are channeled through my message logger, which makes the output of my analysis code a bit more consistent in my mind. But a few things are still printed using the DefaultErrorHandler function:

  1. The PROOF code prints messages directly to stdout in many places as far as I can see. Some of these have good reason of doing this, but some not. For instance the messages printed when I use TProof::Open(…), could just as well be printed using TObject::Info(…). Could the developers comment on whether they would consider using the standard ROOT logging facilities for these messages?

  2. When I print all the log lines from all the nodes, I see a lot of such lines on the master:

Even though I call SetErrorHandler(…) in various places in my derived TSelector class (such that the error handler would be updated for the worker nodes as well), these lines are always printed using DefaultErrorHandler.

  1. My input ntuples contain objects which are unknown to ROOT. It doesn’t really matter, since I don’t read these. But when such a file is opened on a worker node, I get this message:

Unfortunately the file is opened before my TSelector object would be created, so this first warning always goes through the default error handler. So here I understand “the problem”, I would just be interested if any of you guys would know a way of putting these messages through my message logger as well.

If you have any comments/suggestions, I’d be interested to hear it.

Cheers,
Attila

Hi Attila,

  1. Memory messages

Currently setting

p->SetParameter("PROOF_MemLogFreq", 0)

(or to a negative number) switches off the messages on the workers, but it does not affect the messages from the master. I will add a global flag to switch those messages off as a whole.
Note that you should be able to grep off those messages from your log passing “-v Svc” in the Grep box of the log window or using

 pl = TProof::Mgr("localhost")->GetSessionLogs(0,0,"-v Svc") 

on the command line.

  1. ErrorHandler issues

2.1 Messages in TProof::Open or alike
The problem here is that the counter messages are need a ‘\r’ instead of ‘\n’ and currently the default error handler always add a ‘\n’. So for some messages we could switch to Printf(…), but not for all of them; and this, I guess, will not help you much. We will discuss it internally and see what we can do.

2.2 Object merging memory message in PROOFLite
In PROOFLite we need to redirect those messages to the local log file and for that we change the ErrorHandler just before printing these messages. This makes your setting
ineffective. We will add a flag to control that.

2.3 Warnings from TClass
The best would be to set your error handler once for all at the beginning of the session, instead of doing it in the selector. I will try to figure out how to do that.

Cheers, Gerri

Hi Gerri,

Thanks for the answer! However your first comments are not in sync with what I observed. I tried setting this parameter using

(because I was not sure which overloaded function it would call by not specifying the exact type of the number…), but it didn’t change anything in my results. I always saw the very same number of messages.

Besides, I actually like the idea of these messages. I could imagine having about 10 such messages per job on the worker nodes. I just have a problem with seeing so many of them.

Well, I’ll try again today. Cheers,

        Attila

Hi Gerri,

Setting this parameter still doesn’t change much for me, but I gave up on it for the time being. (Although I should note that I also got the same messages when running on a “normal” PROOF server, not just when running PROOF-lite.)

Now I would just like to ask some other questions in the message logging area:

When I enable a package on the PROOF node(s) I see the compilation messages in real-time in my job. How are these messages printed? I couldn’t find the code piece with a quick read-through that is responsible for showing these messages. But in any case, I think the responsible function could just as well print them using Info(…) instead of using Printf(…). By the way, where is Printf(…) (with a capital P) defined?

I always get a number of

output lines when running a job. What do these mean, and where in the code are they coming from? They’re not really harmful, but I don’t know what they’re good for…

Cheers,
Attila

Hi Attila,

  1. Mem log messages
    The parameter for the mem log frequency is a Long64_t, as it refers to a number of events; unfortunately for me doing this
p->SetParameter( "PROOF_MemLogFreq", (Long64_t)0)

works both for standard PROOF and PROOF-Lite.
I am using the trunk, which however should behave as 5.21/06 with respect to this. I will try now with 5.21/06 and let you know.

Having said this, we have decided to filter out by default these messages when you get the logs. I hope to be able to implement this for 5.22.

  1. Logs from package building
    Realtime logging is done via an event handler: as soon as something is written to the logfile this is sent over to the step up (worker to master, master to client).
    The class is TProofServLogHandler defined in TProofServ.h and TProfoServ.cxx . The Notify() method is the one that sends off the messages.

  2. Logs control
    In ROOT there is the possibility to redirect stderr and stdout to a file with TSystem:RedirectOutput .
    You can have a look at test/stressProof.cxx for an example of usage.
    Could this be useful in your case?

Otherwise we can implement something that detects special ending characters in Printf(…), Info(…), … and skip the addition of ‘\n’ in such cases. Then every non-standard error handler should do that …

Btw, Printf(…) is defined in core/base/src/TString.cxx .

Cheers, Gerri

Hi Attila,

I have tried the memlog frequency control on 5.21/06 and it seems to work both for standard and lite.
So, I am a bit puzzled by your findings.

Going step by step, could you confirm that if you do the following in a ROOT shell:

root [0] p = TProof::Open("")
 +++ Starting PROOF-Lite with 4 workers +++
Opening connections to workers: OK (4 workers)
Setting up worker servers: OK (4 workers)
PROOF set to parallel mode (4 workers)
(class TProof*)0xa615a0
root [1] p->SetParameter( "PROOF_MemLogFreq", ( Long64_t ) 0 );
root [2] p->Process("tutorials/proof/ProofSimple.C+",100000)

Info in <TProofLite::SetQueryRunning>: starting query: 1
Info in <TUnixSystem::ACLiC>: creating shared library /home/ganis/local/root/5-21-06/root/./tutorials/proof/ProofSimple_C.so
(Long64_t)0
root [3]

you still get the messages on the workers?

Cheers, Gerri

Hi Gerri,

Well, I get the following:

[quote]root [0] p = TProof::Open( “” );
+++ Starting PROOF-Lite with 2 workers +++
Opening connections to workers: OK (2 workers)
Setting up worker servers: OK (2 workers)
PROOF set to parallel mode (2 workers)
root [1] p->Process( “/usr/local/root/tutorials/proof/ProofSimple.C+”, 100000 );

Info in TProofLite::SetQueryRunning: starting query: 1
Warning in : /usr/local/root/tutorials/proof is not writeable!
Warning in : Output will be written to /var/folders/PP/PP90N35oG00YWC5Jys3ckk+++TI/-Tmp-/attila
Info in TUnixSystem::ACLiC: creating shared library /var/folders/PP/PP90N35oG00YWC5Jys3ckk+++TI/-Tmp-/attila//usr/local/root/tutorials/proof/ProofSimple_C.so
root [2][/quote]

(I have 5.21/06 under /usr/local/root.)

So as you can see, even without touching the PROOF_MemLogFreq property I see no output on the ROOT prompt. I see the memory messages when I retrieve the logs using TProofMgr::GetSessionLogs() and print their contents. I could send you the exact piece of code that does this for me, but setting PROOF_MemLogFreq in any shape or form doesn’t influence the log messages I get using GetSessionLogs()

Cheers,
Attila

Ohh, okay…

I have to change a bit what I wrote. So I’m playing with the following code now:

[code]void proof_test() {

p = TProof::Open( “” );
p->SetParameter( “PROOF_MemLogFreq”, ( Long64_t ) 10000 );

p->Process( “/usr/local/root/tutorials/proof/ProofSimple.C+”, 100000 );

logList = p->GetManager()->GetSessionLogs()->GetListOfLogs();
logElem = ( TProofLogElem* ) logList->At( 1 );
logElem->GetMacro()->Print();

return;

}[/code]

This seems to behave as I would expect actually. Changing the parameter in the second line seems to have the expected effect on the number of lines printed by my first worker node. I’ll have to see if I can get my full-blown code to behave this way…

Cheers,
Attila

Hi Gerri,

Oh, boy… I finally found the culprit… I was actually responsible for the problem…

Turns out that my code had an m_proof->ClearInput() call at the beginning of the job submission. (But after this property has already been set.) Now that I reorganised the calls a bit, I seem to be able to control the memory messages from my compiled application as well.

Thanks for bearing with me…

Cheers,
Attila

Hi Attila,

Ok, good that you found it.

Btw, it would be nice if you could work with the trunk, because I am fixing/improving things there and you could give me immediate feedback.

Cheers, Gerri