TChain not linked to Proof

meyerma · February 9, 2017, 1:01am

Hello guys !

I am facing an issue with a TChain that I created by casting a TTree to a TChain.
PROOF does not run on it, even if I use TChain::SetProof().

Here is an example :

void test()
{
        TTree * t = new TTree("mlk","mlk");

        Int_t i = 0;
        t->Branch("i", &i, "i/I");
        t->Fill();
        i=1;
        t->Fill();
        i=2;
        t->Fill();

        TChain *ch = (TChain*) t;

        TProof *p = TProof::Open("");
        ch->SetProof();
        ch->MakeSelector("mlk");
        ch->Process("mlk.C+");
}

On the other hand, when I simply loaded a ROOT file using TChain::Add(filename), I was able to get PROOF running. Does anyone know how to process my TTree? I suspect the conversion is not done properly, which is why I cannot run PROOF as I would like. Is that correct?

ganis · February 9, 2017, 8:44am

Dear meyerma,

PROOF was designed to process large datasets residing in files. The workers are separate processes that do not share memory with the parent process, which is why PROOF does not provide an interface to process a TTree in memory. Also, a TChain holds additional information with respect to a TTree, which PROOF uses for dispatching and processing, so simply casting to TChain will not work.
You need to save the TTree to a file and create a TChain or another dataset recognised by PROOF.
I agree that we could have provided a way to automate this, but there was no real request for it and it would have worked only with PROOF-Lite.

If you are using v6.08 you can try the new TProcessExecutor (ROOT/TProcessExecutor.hxx). This does multi-processing with a fork model and provides an interface to process trees in memory. See, for example, tutorials/multicore/mp103_processSelector.C .

G Ganis

NB: if you are using the master, be aware that the TTree processing part of TProcessExecutor has been split out to ROOT/TTreeProcessorMP.hxx .

meyerma · February 9, 2017, 9:15am

Thank you very much Ganis.

In that case, maybe I should rephrase my question.
I have a large set of data that I process first. I then get around 240k histograms as PROOF output, based on a small sample of the total amount of data to process.
My first problem is that I have to wait for TSelector::Terminate() before all the histogram data are merged. Second, I have to renormalize (the renormalization can only be performed at the end, when the data from all runs are available) and generate a 2D histogram from those 1D histograms.

That is why I thought of creating a TTree, putting my 1D histograms in it, and renormalizing in a second PROOF execution.
Does it make sense to save it to a file? Maybe there is a better way?

ganis · February 9, 2017, 10:59am

Yes, I would save the 240k 1D histos to a file for a second pass. Depending on how big the histos are and how much memory they take, I would also use merge-by-file in PROOF (see root.cern.ch/handling-outputs#clientside).

G Ganis

meyerma · February 22, 2017, 3:44pm

Hi Ganis,

Thanks a lot for your input. Indeed, I ran into memory trouble due to overly large histograms.
I used the “stf” option and it works much better.

I just have one question, which I think is related to the proof-on-demand server that I use on lxplus. I am getting error messages like these:

[quote]TProofOutputFile::AddFile: error from TFileMerger::AddFile(rootd://meyerma@p06109780s36914.cern.ch … 14.q1.root)
TProofOutputFile::AddFile: error from TFileMerger::AddFile(rootd://meyerma@p06109780s36914.cern.ch … 14.q1.root)
TProofOutputFile::AddFile: error from TFileMerger::AddFile(rootd://meyerma@p06109780s36914.cern.ch … 14.q1.root)
[/quote]

It looks like the xrootd protocol is not available when I am working with LSF. I tried adding the option "xpd.rootd allow" to the xpd.cf file located at ~/.PoD/etc/xpd.cf, but it brought no improvement.

Do you have any idea?

ganis · February 22, 2017, 5:32pm

Dear meyerma,

Which version of ROOT are you using?
Can you check if .PoD/etc/xpd.cf includes your addition at the end?
Can you post (compressed, if big) the file .PoD/log/PodServer/xpd.log ?

G Ganis

meyerma · February 23, 2017, 9:55am

Hi Ganis,

I am using ROOT 5.34.36.
Yes, it does include my modification at the end.
But according to the error messages, it seems I am not allowed to send files back from the batch machines.

And here are the files :
xpd-log.txt (5.9 KB)
xpd-cf.txt (1.8 KB)

ganis · February 24, 2017, 8:47am

Hi,

What happens if, once the PoD cluster has started, you do

gSystem->AccessPathName("rootd://meyerma@p06109780s36914.cern.ch//tmp")

?

(Replace p06109780s36914 with the hostname of one of the machines you got; you should be able to find it using gProof->Print("a").)

G Ganis

meyerma · February 24, 2017, 8:56am

Hi Ganis,

I started an execution running a slave:

JOBID     USER    STAT  QUEUE  FROM_HOST          EXEC_HOST        JOB_NAME  SUBMIT_TIME
855424087 meyerma RUN   1nh    lxplus104.cern.ch  p06108707n83665  17053[1]  Feb 24 09:53

Here is the result:

root [3] gSystem->AccessPathName("rootd://meyerma@p06108707n83665.cern.ch//tmp")
SysError in <TUnixSystem::UnixTcpConnect>: connect (p06108707n83665.cern.ch:1094) (Connection refused)
Error in <TFTP::TFTP>: can't open connection to rootd on host p06108707n83665.cern.ch at port 1094
(Bool_t)1

It is quite interesting to see that rootd:// is using TFTP. Am I correct?

meyerma · February 24, 2017, 8:59am

I just noticed that if I remove the cern.ch, it seems to work.
What do you think?

root [7] gSystem->AccessPathName("rootd://meyerma@p06108707n83665//tmp")
(Bool_t)1
root [8] gSystem->AccessPathName("rootd://meyerma@dummy_host//tmp")
Error in <TFTP::TFTP>: can't open connection to rootd on host dummy_host at port 1094
(Bool_t)1

ganis · February 24, 2017, 11:42am

Hi,

I think I understand: it is a problem with the port. If you do gProof->Print("a"), it should show connection ports of 22001 or similar.

Could you try adding the following to your .PoD/user_xpd.cf0 (this is the right way to add something to the xpd.cf):

### Set the local data server for the temporary output files accordingly
xpd.putenv LOCALDATASERVER=rootd://<host>:<port>

G Ganis

ganis · February 24, 2017, 12:07pm

I forgot to mention that you should add that line exactly as written; <host> and <port> will be replaced automatically with the correct values on each worker.

G Ganis

meyerma · February 24, 2017, 1:24pm

I updated my user_xpd.cf0 and restarted the PoD server.
According to the log, your modification has been taken into account.

Nevertheless, the problem still occurs.

*** Worker 0.0  (valid)
    Host name:               p06109780u59045.cern.ch
    Port number:             21001
    Worker session tag:      
    ROOT version|rev|tag:    5.34/36|r49361|5.34/36
    Architecture-Compiler:   linuxx8664gcc-gcc493
    User/Group:              meyerma/default
    Proofd protocol version: 36
    Image name:              p06109780u59045.cern.ch:/pool/lsf/meyerma/855509218.1/PoDWorker_AiohSb6ONK/proof/meyerma
    Working directory:       /pool/lsf/meyerma/855509218.1/PoDWorker_AiohSb6ONK/proof/meyerma/session-lxplus090-1487942562-26155/worker-0.0-p06109780u59045-1487942563-29161
    Performance index:       100
    MB's processed:          0.00
    MB's sent:               0.00
    MB's received:           0.00
    Real time used (s):      0.000
    CPU time used (s):       0.000

It just seems that the connection is denied:

root [0] 
Attaching file rootd://p06109780u59045.cern.ch:21001//pool/lsf/meyerma/855509218.1/PoDWorker_AiohSb6ONK/proof/meyerma/data/0.0/p06109780u59045-1487942261-26272//output-lxplus090-1487942261-14407.q1.root as _file0...
SysError in <TUnixSystem::UnixRecv>: recv (Connection reset by peer)
Error in <TNetFile::TNetFile>: can't open connection to rootd on host p06109780u59045.cern.ch at port 21001
Error in <TNetFile::Create>: server does not accept connection from this host: contact server administrator
Error in <TNetFile::Create>: failing on file rootd://p06109780u59045.cern.ch:21001//pool/lsf/meyerma/855509218.1/PoDWorker_AiohSb6ONK/proof/meyerma/data/0.0/p06109780u59045-1487942261-26272/output-lxplus090-1487942261-14407.q1.root

ganis · February 24, 2017, 2:07pm

Yes, there is still a problem, but it is different from the previous ones.
Could you post the xpd.log of one worker machine? They should be available somewhere under .PoD/log …

In the meantime I will try to reproduce …

G Ganis

meyerma · February 24, 2017, 3:04pm

Of course, here it is

I started the PoD server, requested 2 workers, and then ran my PROOF execution over “pod://”.
The procedure ended at 15:57:54.

xpd-log.txt (8.9 KB)

ganis · February 24, 2017, 4:12pm

Hi,

I understand the problem: there is an issue with the latest v5.34 releases (starting from 5.34.20). These were built with CMake, and an essential part of this file retrieval mechanism was not included in the distribution.
Is there any chance for you to move to v6?
There is no workaround other than providing the executable.

G Ganis

meyerma · February 26, 2017, 5:34pm

Thank you very much, Ganis, for the support.
I would definitely not have found it by myself!

It might be possible for me to switch to v6. I will check that later.
Cheers,
Marco