Windows <-> Linux with 5.27

Hi,
I always get a crash on the client side if I try to send back more than 6 objects total from my PROOF server.

I’m using 5.27d off the web site. I’ve got a build (gcc 4.1.2) on Linux which is running the “xproofd” server (no other setup), and the tarball from the website downloaded on my W7 machine. I have a very simple ROOT file which I process both locally and on the PROOF server. I can run locally with the following TSelector file:

[code]///
/// SimpleSelection.cpp
///

#include "SimpleSelection.h"
#include <TH1.h>
#include <iostream>

using std::endl;
using std::cout;

ClassImp(SimpleSelection);

#ifdef __MAKECINT__
#pragma link C++ class SimpleSelection+;
#endif

Bool_t SimpleSelection::Process(Long64_t entry)
{
   cout << "Process..." << entry << endl;
   return true;
}

void SimpleSelection::SlaveBegin(TTree *)
{
   fOutput->Add(new TH1F("hi", "ther", 10, 0.0, 10.0));
   fOutput->Add(new TH1F("hi1", "ther", 10, 0.0, 10.0));
   fOutput->Add(new TH1F("hi2", "ther", 10, 0.0, 10.0));
   fOutput->Add(new TH1F("hi3", "ther", 10, 0.0, 10.0));
   // Uncomment this line and I get the crash
   //fOutput->Add(new TH1F("hi4", "ther", 10, 0.0, 10.0));
}

void SimpleSelection::Terminate()
{
   cout << "See " << fOutput->GetEntries() << " items" << endl;
}
[/code]

And here is the code I was using to actually run the tests:

[code]///
/// SimpleRun.C
///

void SimpleRun()
{
   string fname = "output.root";

   //gSystem->CompileMacro("BTagNtupleBuilder/MuonInBJet.cpp");
   //gSystem->CompileMacro("BTagNtupleBuilder/BTagJet.cpp");
   gSystem->CompileMacro("SimpleSelection.cpp");

   // Run the selector locally first
   TFile *input = new TFile(fname.c_str(), "READ");
   TTree *t = static_cast<TTree*>(input->Get("btag"));
   SimpleSelection *s = new SimpleSelection();
   t->Process(s);

   ///
   /// Now, do PROOF
   ///

   cout << "NOW Doing PROOF" << endl;
   TProof *p = TProof::Open("xxx.phys.washington.edu");
   //p->Load("BTagNtupleBuilder/MuonInBJet.cpp+");
   //p->Load("BTagNtupleBuilder/BTagJet.cpp+");
   p->Load("SimpleSelection.cpp+");

   p->Process("user.Gordon.BTagComp.mc09_7TeV.105011.J2_pythia_jetjet.merge.NTUP_BTAG.e468_s766_s767_r1303_r1306_p245.v1", "SimpleSelection");
}
[/code]

When I run the client on the same machine as the PROOF server (using the same TProof code), it works just fine. So this looks like something about moving the data from Linux to W7.

Cheers, Gordon.

Hi,
Some more quick testing - this also happens when I drop the client down to 5.26 (leave the server up at 5.27). Here is what the terminal window looks like:

[code]Validating files: OK (6 files)
[TProof:] Total 1394273 events |>…| 1.15 % [8700.4 ev
[TProof:] Total 1394273 events |>…| 1.15 % [6834.7 ev
[TProof:] Total 1394273 events |>…| 1.15 % [5631.8 ev
[TProof:] Total 1394273 events |>…| 1.15 % [4789.0 ev
[TProof:] Total 1394273 events |>…| 1.15 % [4165.6 ev
[TProof:] Total 1394273 events |>…| 1.15 % [3685.8 ev
[TProof:] Total 1394273 events |>…| 1.15 % [3305.1 ev
[TProof:] Total 1394273 events |>…| 1.15 % [2995.7 ev
[TProof:] Total 1394273 events |==>…| 11.66 % [27840.1
[TProof:] Total 1394273 events |==>…| 11.66 % [25644.9
[TProof:] Total 1394273 events |==>…| 11.66 % [23728.9
[TProof:] Total 1394273 events |==>…| 11.66 % [22115.3
[TProof:] Total 1394273 events |====================| 100.00 % [185952.
7 evts/s]
Mst-0: merging output objects … done
Warning in TClass::TClass: no dictionary for class TOutputListSelectorDataMap
is available
Mst-0: grand total: sent 7 objects, size: 2508 bytes

C:\Users\gwatts\Documents\ATLAS\Projects\BasicNtupleBuilder>[/code]

If I log into the server I can see the query response root file, and that looks as I would expect (it has the PROOF response object, and if I do a GetOutputList() I get back a list that contains some PROOF stats stuff along with the expected TH1F’s).

And, running in 5.26, if I turn on gDebug=10, the end of the dump looks like the following (let me know if anyone wants a more complete dump):

[code]WriteBuffer, class:TNamed, name=fName, fType[1]=65, TStreamerString, bufpos=100,
arr=04169EC0, offset=12
WriteBuffer, class:TNamed, name=fTitle, fType[2]=65, TStreamerString, bufpos=115
, arr=04169EC0, offset=20
WriteBuffer for class: TNamed version 1 has written 59 bytes
ReadBuffer, class:TNamed, name=TObject, fType[0]=66, TStreamerBase, bufpos=90, a
rr=043366C8, offset=0
ReadBuffer, class:TNamed, name=fName, fType[1]=65, TStreamerString, bufpos=100,
arr=043366C8, offset=12
ReadBuffer, class:TNamed, name=fTitle, fType[2]=65, TStreamerString, bufpos=115,
arr=043366C8, offset=20
ReadBuffer for class: TNamed has read 59 bytes
TXSockPipe::DumpReadySock: client: list content: 5 | 040DE8E8 040DE8E8 040DE8E8
040DE8E8 040DE8E8
Info in TXSocketHandler::Notify: ready socket 040DE8E8 (0) (input socket: 0000
0000)
Info in TXSlave::HandleInput: 02A19EC8: 0: proof: 041A9C98, mon: 04076EC8
Info in TXSlave::HandleInput: 02A19EC8: 0: posting monitor 04076EC8
Info in TXSocket::PickUpReady: 040DE8E8: 0: going to sleep
Info in TXSocket::PickUpReady: 040DE8E8: 0: waken up
Info in TXSocket::PickUpReady: 040DE8E8: 0: got message (69 bytes)
TXSockPipe::Clean: client: 040DE8E8: pipe cleaned (pending 4)
Info in TXSocket::PushBackSpare: release buf 040703A0, sz: 136 (BuffMem: 32253
)
Mst-0: grand total: sent 7 objects, size: 2508 bytes
TXSockPipe::DumpReadySock: client: list content: 4 | 040DE8E8 040DE8E8 040DE8E8
040DE8E8
Info in TXSocketHandler::Notify: ready socket 040DE8E8 (0) (input socket: 0000
0000)
Info in TXSlave::HandleInput: 02A19EC8: 0: proof: 041A9C98, mon: 04076EC8
Info in TXSlave::HandleInput: 02A19EC8: 0: posting monitor 04076EC8
Info in TXSocket::PickUpReady: 040DE8E8: 0: going to sleep
Info in TXSocket::PickUpReady: 040DE8E8: 0: waken up
Info in TXSocket::PickUpReady: 040DE8E8: 0: got message (9 bytes)
TXSockPipe::Clean: client: 040DE8E8: pipe cleaned (pending 3)
Info in TXSocket::PushBackSpare: release buf 040703C0, sz: 43 (BuffMem: 32253)

TXSockPipe::DumpReadySock: client: list content: 3 | 040DE8E8 040DE8E8 040DE8E8
Info in TXSocketHandler::Notify: ready socket 040DE8E8 (0) (input socket: 0000
0000)
Info in TXSlave::HandleInput: 02A19EC8: 0: proof: 041A9C98, mon: 04076EC8
Info in TXSlave::HandleInput: 02A19EC8: 0: posting monitor 04076EC8
Info in TXSocket::PickUpReady: 040DE8E8: 0: going to sleep
Info in TXSocket::PickUpReady: 040DE8E8: 0: waken up
Info in TXSocket::PickUpReady: 040DE8E8: 0: got message (12 bytes)
TXSockPipe::Clean: client: 040DE8E8: pipe cleaned (pending 2)
Info in TXSocket::PushBackSpare: release buf 040703E0, sz: 43 (BuffMem: 32253)

Info in TXSocket::PickUpReady: 040DE8E8: 0: going to sleep
Info in TXSocket::PickUpReady: 040DE8E8: 0: waken up
Info in TXSocket::PickUpReady: 040DE8E8: 0: got message (17419 bytes)
TXSockPipe::Clean: client: 040DE8E8: pipe cleaned (pending 1)
Info in TXSocket::PushBackSpare: release buf 04070400, sz: 17419 (BuffMem: 322
53)[/code]

This also happens from other Windows machines. Two other observations:

  • I can rename SimpleSelection.cpp to .cxx, and then I can upload it as a script. It runs just fine if I remove the cout printout.
  • If I don’t remove that cout printout, then PROOF seems to hang.
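
The thread does not establish why the per-entry printout makes PROOF hang. One way to keep some progress output while avoiding a line for each of the 1394273 events is to print only every N-th entry. The sketch below is plain C++ with no ROOT dependency; ShouldLog and ProgressMessage are hypothetical helper names, and the interval is an arbitrary choice.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper, not part of ROOT: returns true only for every
// `interval`-th entry, so a per-entry printout becomes a periodic one.
bool ShouldLog(long long entry, long long interval)
{
   assert(interval > 0);
   return entry % interval == 0;
}

// Builds the progress message for entries that should be logged and
// returns an empty string for all other entries.
std::string ProgressMessage(long long entry, long long interval)
{
   if (!ShouldLog(entry, interval)) return std::string();
   std::ostringstream msg;
   msg << "Process... " << entry;
   return msg.str();
}
```

In SimpleSelection::Process, the cout line could then be guarded with something like if (ShouldLog(entry, 100000)). Whether this actually avoids the hang has not been tested here.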

These were done with 5.26 on the client (prebuilt) and 5.27 on Linux as described above.

And a final bit of testing… I dropped the server down to 5.26d and now it works just fine. Even with the 5.27 client. So the bug exists somewhere in the server code, I suspect.

If someone wants further testing, more info, or a more formal bug report, feel free to ask. The various bits of software will probably remain on my machines for a while.

-Gordon.

Hi Gordon,

Could you give me access to your file(s), please? (then I can try to reproduce the problem)

Cheers, Bertrand.

cid-3ca7d6dd59e1d914.office.live … OOTBomb.7z should be it

Thanks. I’ll investigate and keep you posted.

Cheers, Bertrand.

I think Bertrand found an “n-1” buffer overrun bug. At any rate, with that fix the client no longer crashes. I think this bug is also present in 5.26, etc., and it was just luck that it wasn’t crashing on other machines… :-) [I have this luck…].
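
The actual fix is not shown in this thread, so the following is only a generic illustration of what an off-by-one buffer overrun looks like, not the ROOT code in question. CopyWithTerminator is a hypothetical name. Writing one byte past a buffer is undefined behavior: depending on the platform and allocator, that byte may land in unused padding (no visible effect) or in something that matters (a crash), which would fit the "it was just luck" observation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Copies `len` characters from `src` into a freshly sized buffer and
// NUL-terminates it. The buffer needs len + 1 bytes: len for the
// characters plus one for the terminator.
std::vector<char> CopyWithTerminator(const char *src, std::size_t len)
{
   assert(src != nullptr);
   // An off-by-one version would allocate only `len` bytes here and
   // then write buf[len] = '\0', one byte past the end of the buffer.
   std::vector<char> buf(len + 1);
   std::memcpy(buf.data(), src, len);
   buf[len] = '\0';
   return buf;
}
```

Calling CopyWithTerminator("hi4", 3) yields a 4-byte buffer holding the three characters plus the terminator; the buggy variant described in the comment would write that terminator outside the allocation.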