Error sending kROOTD_GET command

Hi,
if I launch the following commands

ss.Draw(">>list", ("tw_cpre&&"+pigg_eff_cut).c_str());
TEventList *list = (TEventList*)gDirectory->Get("list");
list->Print();
ss.SetEventList(list);
ss.Draw("mgg");

in the interactive ROOT shell everything is fine, but if I put the same commands in a macro I get

[code]Error in TCastorFile::GetBuffers: error receiving buffer of length 262144, got 0
Error in TCastorFile::ReadBuffer: error receiving buffer of length 31998, got 0
Error in TBranch::GetBasket: File: castor:///castor/cern.ch/user/b/bifani/pigg/scmp/SS3-09/1.root at byte:0, branch:pi_p, entry:3938, badread=0
Error in TCastorFile::ReadBuffer: error receiving buffer of length 31999, got 0
Error in TCastorFile::ReadBuffer: error sending kROOTD_GET command
Error: Symbol G__exception is not defined in current scope /afs/cern.ch/na48/user/bifani/na48_2/macro/eff.C:330:
Error: type G__exception not defined FILE:/afs/cern.ch/na48/user/bifani/na48_2/macro/eff.C LINE:330

*** Break *** segmentation violation
terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc
file probably overwritten: stopping reporting error messages[/code]

This happens both when I run the macro from the interactive ROOT shell and from a bash shell (root -l -b -q 'macro.C(…)').

I experience this problem only on CASTOR2, after setting:

setenv RFIO_USE_CASTOR_V2 YES
setenv STAGE_HOST castorpublic
setenv STAGE_SVCCLASS na48

What is going wrong?

Simone

Hi Simone,

I could correctly open and read the file, so perhaps it is related to the way the file is used in your macro.
Can you post the macro (eff.C or a reduced version of it) reproducing the problem?

Also, which ROOT version are you using?

G. Ganis

I’m using ROOT 5.14 from /afs/cern.ch/sw/lcg/external/root/5.14.00/slc4_ia32_gcc34/root

This is a reduced version of the eff.C macro:

void eff(string run, string ssn, string var, Int_t nbin, Double_t vmin, Double_t vmax) {

    // Selection used to build the event list
    string list_eff_cut = "tw_cpre&&pi_p>10.&&g_emin>5.&&pi_rdch1>15.&&pi_rdch1<90.&&pk_dist>10.&&cls_dist>40.&&min(g0_pi_dist,g1_pi_dist)>20.&&vtx_z>-1000.&&vtx_z<8000.";

    TH1D *hnum = new TH1D("hnum", "hnum", nbin, vmin, vmax);
    TH1D *hden = new TH1D("hden", "hden", nbin, vmin, vmax);
    TH1D *heff = new TH1D("heff", "#epsilon_{L1}", nbin, vmin, vmax);

    // load_ss() is defined in another macro; the chain ss is used right after it
    load_ss("pigg", run, ssn);

    // Keep only the entries passing the selection
    ss.Draw(">>list", list_eff_cut.c_str());
    TEventList *list = (TEventList*)gDirectory->Get("list");
    ss.SetEventList(list);

    // Numerator (events also passing num_cut) and denominator (all events in the list)
    string num_plot = var + ">>hnum";
    string den_plot = var + ">>hden";
    string num_cut = "pu_l1b0&&pu_strobe";
    ss.Draw(num_plot.c_str(), num_cut.c_str());
    ss.Draw(den_plot.c_str());
    hnum->Sumw2();
    hden->Sumw2();

    // Option "b" requests binomial errors for the ratio
    heff->Divide(hnum, hden, 1., 1., "b");
    heff->SetMinimum(0.);
    heff->SetMaximum(1.2);
    heff->Draw();

}

The error appears while executing the ss.Draw(num_plot.c_str(), num_cut.c_str()) command.

It looks as if, within the CASTOR2 environment, some ROOT files appear corrupted (bad TTree entries):

Error in TBranch::GetBasket: File: castor:///castor/cern.ch/user/b/bifani/pigg/scmp/SS2-09/4.root at byte:0, branch:pi_p, entry:-1, badread=-1

After checking my macros I can reasonably state that this problem is not related to them, because the same CASTOR file is sometimes read correctly and sometimes not… but I have no idea what is going on.

s.

Hi,

I could reproduce the problem with 5.14.00. However, the problem disappears with 5.15.02 and higher. Although we introduced a new version of TCastorFile with 5.15.02, I do not think that the fix comes from there: indeed, if I use 5.14.00d or 5.14.00e, which contain the backport of the current TCastorFile, the problem is still there. I suspect that one of the many changes and fixes made in the I/O code solved it.

So, for the time being, I suggest that you either move to a more recent version of ROOT or use TRFIOFile instead of TCastorFile; you get the latter by prefixing your CASTOR paths with “rfio:” instead of “castor:”.
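
As a minimal sketch of the prefix swap, a small shell function can rewrite the URLs before they are passed to ROOT. The helper name to_rfio is made up for this illustration (it is not part of ROOT or CASTOR), and the path is the one from the error log earlier in the thread:

```shell
#!/bin/bash
# to_rfio: hypothetical helper, not part of ROOT or CASTOR.
# Rewrites a "castor:" URL into the same URL with the "rfio:" prefix;
# anything else is passed through unchanged.
to_rfio() {
    local url="$1"
    case "$url" in
        castor:*) printf 'rfio:%s\n' "${url#castor:}" ;;
        *)        printf '%s\n' "$url" ;;
    esac
}

# Prints rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS3-09/1.root
to_rfio "castor:///castor/cern.ch/user/b/bifani/pigg/scmp/SS3-09/1.root"
```

Only the protocol prefix changes; the rest of the path stays exactly as it was.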

G. Ganis

I switched to ROOT 5.15.06 and rfio protocol:

TChain ss("Tree");
ss.Add("rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS3-09/0.root");
ss.Add("rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS3-09/1.root");
ss.Add("rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS3-09/2.root");
ss.Add("rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS3-09/3.root");
ss.Add("rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS3-09/4.root");
ss.Add("rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS3-09/5.root");

Now everything works when executing the macro in a ROOT shell:

root [0]
Processing eff.C("09","ss3")...
**** : trace level set to 3
stager: stage_get Usertag=NULL Protocol=rfio File=/castor/cern.ch/user/b/bifani/pigg/scmp/SS3-09/0.root
stager: Opt SVCCLASS=na48
stager: Setting euid: 16520
stager: Setting egid: 1338
stager: Creating socket for stager callback
stager: Will wait for stager callback on port 39943
stager: Apr 28 10:07:42 (1177747662) Sending request
stager: Waiting for acknowledgement
stager: 463300ce-0000-1000-b994-d2ee8bb30000 SND 0.03 s to send the request
stager: Request sent to RH - Request ID: 463300ce-0000-1000-b994-d2ee8bb30000
stager: Waiting for callback from stager
stager: 463300ce-0000-1000-b994-d2ee8bb30000 CBK 0.49 s before callback was received

but when I submit it to LSF, it loops forever waiting for the same file:

Processing eff.C("09","ss3")...
**** : trace level set to 3
stager: stage_get Usertag=NULL Protocol=rfio File=/castor/cern.ch/user/b/bifani/pigg/scmp/SS3-09/0.root
stager: Opt SVCCLASS=na48
stager: Setting euid: 16520
stager: Setting egid: 1338
stager: Creating socket for stager callback
stager: Will wait for stager callback on port 36074
stager: Apr 28 01:38:54 (1177717134) Sending request
stager: stage_get Usertag=NULL Protocol=rfio File=/castor/cern.ch/user/b/bifani/pigg/scmp/SS3-09/0.root
stager: Opt SVCCLASS=na48
stager: Setting euid: 16520
stager: Setting egid: 1338
stager: Creating socket for stager callback
stager: Will wait for stager callback on port 32414
stager: Apr 28 01:39:02 (1177717142) Sending request

The stager sends a request:

stager: Apr 28 01:39:02 (1177717142) Sending request

but there is no answer… the job never gets to the acknowledgement and callback steps that show up in the working session:

stager: Waiting for acknowledgement
stager: 463300ce-0000-1000-b994-d2ee8bb30000 SND 0.03 s to send the request
stager: Request sent to RH - Request ID: 463300ce-0000-1000-b994-d2ee8bb30000
stager: Waiting for callback from stager
stager: 463300ce-0000-1000-b994-d2ee8bb30000 CBK 0.49 s before callback was received

I’m sorry… the previous problem was due to a wrong stager definition in the submitted job.
I fixed it, but I get the same error as before: the TTrees look corrupted although they are not (checked within the old CASTOR environment).
It happens without any pattern… random files and random branches:

SysError in TRFIOFile::ReadBuffer: error reading from file rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root (Timed out)
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:g_emin, entry:1584028, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:tw_cpre, entry:1644085, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:tw_m1tp, entry:1644085, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:tw_cpre, entry:1644085, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:tw_m1tp, entry:1644085, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:tw_cpre, entry:1644085, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:tw_m1tp, entry:1644085, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:tw_cpre, entry:1644085, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:tw_m1tp, entry:1644085, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:tw_cpre, entry:1644085, badread=0
Error: Symbol G__exception is not defined in current scope /afs/cern.ch/na48/user/bifani/na48_2/macro/eff.C:411:
Error: type G__exception not defined FILE:/afs/cern.ch/na48/user/bifani/na48_2/macro/eff.C LINE:411
*** Interpreter error recovered ***

SysError in TRFIOFile::ReadBuffer: error reading from file rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root (Timed out)
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:g_emin, entry:1584028, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:pi_p, entry:1648282, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:pi_p, entry:1648282, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:pi_p, entry:1648282, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:pi_p, entry:1648282, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:pi_p, entry:1648282, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:pi_p, entry:1648282, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:pi_p, entry:1648282, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:pi_p, entry:1648282, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:pi_p, entry:1648282, badread=0
Error: Symbol G__exception is not defined in current scope /afs/cern.ch/na48/user/bifani/na48_2/macro/eff.C:411:
Error: type G__exception not defined FILE:/afs/cern.ch/na48/user/bifani/na48_2/macro/eff.C LINE:411
*** Interpreter error recovered ***

SysError in TRFIOFile::ReadBuffer: error reading from file rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/12_1.root (Timed out)
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/12_1.root at byte:0, branch:g_emin, entry:267329, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/12_1.root at byte:0, branch:pi_rdch1, entry:267329, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/12_1.root at byte:0, branch:pi_rdch1, entry:267329, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/12_1.root at byte:0, branch:pk_dist, entry:267329, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/12_1.root at byte:0, branch:cls_dist, entry:267329, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/12_1.root at byte:0, branch:g0_pi_dist, entry:267329, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/12_1.root at byte:0, branch:g1_pi_dist, entry:267329, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/12_1.root at byte:0, branch:vtx_z, entry:267329, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/12_1.root at byte:0, branch:vtx_z, entry:267329, badread=0
Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/12_1.root at byte:0, branch:g_emin, entry:267329, badread=0
Error: Symbol G__exception is not defined in current scope /afs/cern.ch/na48/user/bifani/na48_2/macro/eff.C:411:
Error: type G__exception not defined FILE:/afs/cern.ch/na48/user/bifani/na48_2/macro/eff.C LINE:411
*** Interpreter error recovered ***

Hi,

I do not have any problem in doing simple draw operations with the files that you quote.

Can you send me something that I can run to reproduce the problem in your context? (The macro that you have posted is not complete.)

Also, how do you set the environment to run a specific ROOT? Which CASTOR version is available on the machines? Are you using standard lxplus/lxbatch machines?

G. Ganis

I guess I’m working with a standard lxplus machine (ssh -l bifani lxplus.cern.ch) and LSF (bsub -q $QUEUE -N -u simone.bifani@cern.ch -o out -e err $SCRIPT)

This is a typical submitted script:

#!/bin/bash
export RFIO_USE_CASTOR_V2=YES
export STAGE_HOST=castorpublic
export STAGE_SVCCLASS=na48
cd $WORKDIR
root -l -b -q 'ana.C("09","ss3")'

The ana.C macro is also placed in $WORKDIR:

void ana(string run, string ssn) {

    pigg_eff(run, ssn);
    pigg_acc("pigg", run, ssn);
    pigg_acc("pipi0", run, ssn);
    pigg_acc("pipi0pi0", run, ssn);
    pigg_acc("pipi0g_ib", run, ssn);
    pigg_acc("pipi0g_de", run, ssn);
    pigg_acc("pipi0g_int", run, ssn);

}

pigg_eff and pigg_acc are other macros; they are not defined in ana.C but in separate files loaded by root_logon.C:

gROOT->ProcessLine(".L comp.C");
gROOT->ProcessLine(".L eff.C");
gROOT->ProcessLine(".L load.C");
gROOT->ProcessLine(".L mc.C");
gROOT->ProcessLine(".L misc.C");

In /afs/cern.ch/user/b/bifani/public/for_ganis/ there is an “old” dir in which you can find all the scripts in the configuration I’m working with (ana.c + root_logon.C).

There is also a “new” dir in which I put all the needed macros in ana.C so that you can run it wherever you want (no gROOT->ProcessLine(".L .C")).

Today I found out that when I submit the latter ana.C, no errors (Error in TBranch::GetBasket: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/13.root at byte:0, branch:g_emin, entry:1584028, badread=0) show up.
Do you know why? Is there any problem with LSF and the way I’m using the root_logon.C file?

PS. To submit jobs you need launch.sh and ana.c files and execute ./launch.sh N, N = 00, 01, 02…

Hi,

Ok, thanks for the macros, the rootlogon.C is not there but in the ‘new’ part there are versions of the macros used by ana.C . I hope that I will manage to have a detailed look later today.

However, one thing that I can say is that you do not seem to have any control over the ROOT version you use (unless you have some setting in your .bashrc or in the one of your group/experiment).

Just to make sure, could you add a

which root

command in your LSF script (for example just after cd $WORKDIR) and see what it gives in your case?

G. Ganis

I copied my root_logon.C in /afs/cern.ch/user/b/bifani/public/for_ganis

About the ROOT version I’m using… I have a ROOT setenv in my .tcshrc

setenv ROOTSYS /afs/cern.ch/sw/lcg/external/root/5.15.06/slc4_ia32_gcc34/root

and if I run “which root” in a submitted job I get

/afs/cern.ch/sw/lcg/external/root/5.15.06/slc4_ia32_gcc34/root/bin/root

I guess that is OK, isn’t it?

About my problems with the TTree access… last night I submitted more than 100 “old” jobs (ana.C contains only the ana() macro, root_logon.C is needed to load the other macros), and all of them completed successfully. I’m really surprised!!!

Now I’ll submit other jobs as a check… Hope the problem is solved :slight_smile:

No good news from the last submitted job… same TTree error from random files.

What about you?

Hi,

I have looked again at the problem and, apart from some memory leaks, I did not find anything special in the code.

I still believe that the problem has something to do with the environment, and with the fact that wrong or mixed versions are used for some reason.

Moreover, according to the LXBATCH page, not all the machines run the same OS or have the same architecture: the chances of getting SLC4/AMD64 or SLC4/IA32 are similar, and there is a small chance of getting SLC3. The ROOTSYS setting should reflect that. This is a possible way to detect the platform and set ROOTSYS accordingly:

#!/bin/bash

# Set here the ROOT version
rver="5.15.06"

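# Get the machine type from the env MACHTYPE, if defined
arch="$MACHTYPE"
if test "x$arch" = "x" ; then
   # Guess it from the `uname -m` output
   arch=`uname -m`
fi
# bash sets MACHTYPE to a full triplet (e.g. x86_64-redhat-linux-gnu):
# keep only the CPU part so that the comparison below can match
arch="${arch%%-*}"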

# Build the binary identification string (default SLC4/AMD64)
binary="slc4_amd64_gcc34"
# Assume IA32 if not AMD64
if test ! "x$arch" = "xx86_64" ; then
   binary="slc4_ia32_gcc34"
fi
# Some machines are still running SLC3
slc3=`grep "release 3" /etc/redhat-release`
if test ! "x$slc3" = "x" ; then
   binary="slc3_ia32_gcc323"
fi
echo "binary=$binary"

# Path to the ROOT version to use
rsys="/afs/cern.ch/sw/lcg/external/root/$rver/$binary/root/"
echo "ROOTSYS=$rsys"

# Set the related environment 
export ROOTSYS=$rsys
export LD_LIBRARY_PATH=$ROOTSYS/lib:$LD_LIBRARY_PATH
export PATH=$ROOTSYS/bin:$PATH
which root

I suggest that you add something like this in your launch scripts and try again.

G. Ganis

Hi,
I had some problems with the code you sent me, so I modified it this way:

#!/bin/bash
VER="5.15.06"
BIN="slc4_amd64_gcc34"
if [ `arch` == "x86_64" ]; then
    BIN="slc4_ia32_gcc34"
fi
if [ `arch` == "i686" ]; then
    BIN="slc3_ia32_gcc323"
fi
export ROOTSYS=/afs/cern.ch/sw/lcg/external/root/$VER/$BIN/root
export LD_LIBRARY_PATH=$ROOTSYS/lib:$LD_LIBRARY_PATH
export PATH=$ROOTSYS/bin:$PATH

Is it ok?
What is the difference between amd64 and ia32? If ia32 means x86_64, it should be a 64-bit CPU just like amd64, shouldn’t it?
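
On the naming: ia32 is the 32-bit Intel architecture (uname -m reports i386 or i686 there), while amd64 is another name for the 64-bit x86_64. A minimal sketch of the mapping, following the logic of the detection script earlier in the thread, could look like this. The function name pick_binary is hypothetical, and the three platform tags are the ones quoted in that script:

```shell
#!/bin/bash
# pick_binary: hypothetical helper, written only for this illustration.
# $1 = CPU name as printed by `uname -m`
# $2 = contents of /etc/redhat-release (may be empty)
pick_binary() {
    local cpu="$1" release="$2"
    # SLC3 machines get the SLC3 build regardless of the CPU
    case "$release" in
        *"release 3"*) echo "slc3_ia32_gcc323"; return ;;
    esac
    # x86_64 is the 64-bit (amd64) case; anything else is treated as 32-bit
    case "$cpu" in
        x86_64) echo "slc4_amd64_gcc34" ;;
        *)      echo "slc4_ia32_gcc34" ;;
    esac
}

pick_binary "$(uname -m)" "$(cat /etc/redhat-release 2>/dev/null)"
```

With this mapping a node reporting x86_64 gets the amd64 build and a node reporting i686 gets the ia32 build.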

Using the new definition of the ROOT environment each submitted job has the right ROOTSYS definition (checked with the “which root” command: /afs/cern.ch/sw/lcg/external/root/5.15.06/slc4_ia32_gcc34/root/bin/root) but they keep on crashing:

SysError in <TRFIOFile::ReadBuffer>: error reading from file rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/4.root (Timed out)
Error in <TBranch::GetBasket>: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/4.root at byte:0, branch:g_emin, entry:2242379, badread=0
Error in <TBranch::GetBasket>: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/4.root at byte:0, branch:g_emin, entry:2290259, badread=0
Error in <TBranch::GetBasket>: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/4.root at byte:0, branch:pk_dist, entry:2290258, badread=0
Error in <TBranch::GetBasket>: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/4.root at byte:0, branch:vtx_z, entry:2290258, badread=0
Error in <TBranch::GetBasket>: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/4.root at byte:0, branch:vtx_z, entry:2290258, badread=0
Error in <TBranch::GetBasket>: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/4.root at byte:0, branch:g_emin, entry:2290259, badread=0
Error in <TBranch::GetBasket>: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/4.root at byte:0, branch:pk_dist, entry:2290258, badread=0
Error in <TBranch::GetBasket>: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/4.root at byte:0, branch:vtx_z, entry:2290258, badread=0
Error in <TBranch::GetBasket>: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/4.root at byte:0, branch:vtx_z, entry:2290258, badread=0
Error in <TBranch::GetBasket>: File: rfio:///castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/4.root at byte:0, branch:g_emin, entry:2290259, badread=0
Error: Symbol G__exception is not defined in current scope  /afs/cern.ch/na48/user/bifani/na48_2/macro/eff.C:411:
Error: type G__exception not defined FILE:/afs/cern.ch/na48/user/bifani/na48_2/macro/eff.C LINE:411
*** Interpreter error recovered ***

I noticed that every crashed job reports a timeout error (the “Timed out” SysError in TRFIOFile::ReadBuffer), so I checked the stager status:

stager: stage_get Usertag=NULL Protocol=rfio File=/castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/4.root
stager: Opt SVCCLASS=na48
stager: Setting euid: 16520
stager: Setting egid: 1338
stager: Creating socket for stager callback
stager: Will wait for stager callback on port 32242
stager: May  4 13:00:56 (1178276456) Sending request
stager: Waiting for acknowledgement
stager: 463b1268-0000-1000-9250-92a07ee30000 SND 0.02 s to send the request
stager: Request sent to RH - Request ID: 463b1268-0000-1000-9250-92a07ee30000
stager: Waiting for callback from stager
stager: 463b1268-0000-1000-9250-92a07ee30000 CBK 1279.91 s before callback was received

file probably overwritten: stopping reporting error messages

I’m surprised that the CBK time of the file that makes the job crash is 1279.91 s, while it is usually less than 20 s.
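
To spot such requests without reading the whole trace, the CBK lines can be filtered with a few lines of shell. This is only a sketch: slow_callbacks is a made-up helper name, the 20 s default is simply the "usual" upper bound mentioned above, and the line layout is assumed to be exactly the one shown in the stager traces in this thread:

```shell
#!/bin/bash
# slow_callbacks: hypothetical helper. Reads a stager trace on stdin and prints
# "<request ID> <callback time in s>" for every request slower than the limit.
slow_callbacks() {
    local limit="${1:-20}"   # threshold in seconds (assumed default: 20)
    awk -v limit="$limit" '
        # Expected layout: stager: <request ID> CBK <seconds> s before callback was received
        $1 == "stager:" && $3 == "CBK" && ($4 + 0) > limit { print $2, $4 }
    '
}

# Example with the two CBK lines quoted in this thread
slow_callbacks 20 <<'EOF'
stager: 463b1268-0000-1000-9250-92a07ee30000 CBK 1279.91 s before callback was received
stager: 463b2f3d-0000-1000-a761-fbec426c0000 CBK 6.58 s before callback was received
EOF
```

For a batch job the same function can be fed the job output, e.g. slow_callbacks 20 < out, where out is the file given to bsub -o.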

Running the same script in an interactive shell I get no errors. Checking the stager:

stager: stage_get Usertag=NULL Protocol=rfio File=/castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/4.root
stager: Opt SVCCLASS=na48
stager: Setting euid: 16520
stager: Setting egid: 1338
stager: Creating socket for stager callback
stager: Will wait for stager callback on port 35468
stager: May  4 15:03:57 (1178283837) Sending request
stager: Waiting for acknowledgement
stager: 463b2f3d-0000-1000-a761-fbec426c0000 SND 0.02 s to send the request
stager: Request sent to RH - Request ID: 463b2f3d-0000-1000-a761-fbec426c0000
stager: Waiting for callback from stager
stager: 463b2f3d-0000-1000-a761-fbec426c0000 CBK 6.58 s before callback was received

Now the callback time is ok and the whole file is processed showing that it is not overwritten or corrupted.

Could a large CBK time be due to multiple accesses? Could they be the source of my problem?

Hi,
today I got the SysError in TRFIOFile::ReadBuffer (due to a very long callback time) in an interactive ROOT shell for the very first time.

Network problems (or something related to disks, the stager, etc.) now look more probable than a wrong environment definition.

Hi,

The fact that the problem persists through different ROOT versions and configurations may indeed indicate that it comes from problems on the server side. The fact that it worked with Castor 1 also points in this direction, as there was almost no change in the way ROOT uses a Castor file between versions 1 and 2, while there were several changes on the Castor side.

I suggest that you contact castor.support@cern.ch asking if there was anything in the server logs in correspondence to your attempts; send them the file paths and the rough time when the attempts were made.
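
If the stager trace is available in the job output, the file paths and request times can be pulled out of it rather than collected by hand. The following is a sketch under the assumption that the trace lines look exactly like the ones posted in this thread; request_times is a hypothetical helper name:

```shell
#!/bin/bash
# request_times: hypothetical helper. Reads a stager trace on stdin and prints
# "<file path><TAB><month> <day> <time>" for every request that was sent.
request_times() {
    awk '
        # Remember the file named in the most recent stage_get line
        $1 == "stager:" && $2 == "stage_get" {
            file = ""
            for (i = 3; i <= NF; i++)
                if ($i ~ /^File=/) file = substr($i, 6)
        }
        # Pair it with the timestamp printed when the request is sent
        $1 == "stager:" && /Sending request/ && file != "" {
            printf "%s\t%s %s %s\n", file, $2, $3, $4
            file = ""
        }
    '
}

request_times <<'EOF'
stager: stage_get Usertag=NULL Protocol=rfio File=/castor/cern.ch/user/b/bifani/pigg/scmp/SS0-09/4.root
stager: Opt SVCCLASS=na48
stager: May  4 13:00:56 (1178276456) Sending request
EOF
```

The resulting list of paths and times is the kind of information the support team needs to search the server logs.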

Let me know the reply.

G. Ganis

Hi,
I wrote an email to na48-support@cern.ch this morning:

They told me that:

Actually, I pre-staged all the needed files, but for some reason I keep getting large callback times.
Can ROOT handle very large CBK times?

Hi,

The ROOT client classes involved in handling Castor files do not apply time-outs: the observed timeout must be internal to the functions used to read the file, e.g. rfio_read or stage_open.

So, I am afraid that there is not much we can do inside ROOT for that.

Give castor.support@cern.ch a try: they may give you a hint.

G. Ganis