Python script hangs because of root's module

Hello,

I have created a python script which just imports the ROOT.py module like this:

[atpilot002@td425 tmp]$ cat myscript.py
#!/usr/bin/python
from ROOT import *

Then I run my script passing a file as an argument. Nothing should happen because the python script just includes the ROOT module, but:

I define the variable file first:

[atpilot002@td425 tmp]$ file="dcap://dcap.pic.es:22125/pnfs/pic.es/data/atlas/atlasmcdisk/mc09_7TeV/ESD/e530_s765_s767_r1302/mc09_7TeV.113161.AlpgenJimmyBBjetsNp0_J3.recon.ESD.e530_s765_s767_r1302_tid137248_00/ESD.137248._000001.pool.root.6"

Run the script

[atpilot002@td425 tmp]$ python myscript.py $file
Warning in TEnvRec::ChangeValue: duplicate entry <Library.vector=vector.dll> for level 0; ignored
Warning in TEnvRec::ChangeValue: duplicate entry <Library.vector=vector.dll> for level 0; ignored
Warning in TEnvRec::ChangeValue: duplicate entry <Library.vector=vector.dll> for level 0; ignored
Warning in TEnvRec::ChangeValue: duplicate entry <Library.vector=vector.dll> for level 0; ignored
Warning in TEnvRec::ChangeValue: duplicate entry <Library.vector=vector.dll> for level 0; ignored
Warning in TEnvRec::ChangeValue: duplicate entry <Library.vector=vector.dll> for level 0; ignored
Warning in TEnvRec::ChangeValue: duplicate entry <Library.vector=vector.dll> for level 0; ignored
Warning in TEnvRec::ChangeValue: duplicate entry <Library.vector=vector.dll> for level 0; ignored
Warning in TEnvRec::ChangeValue: duplicate entry <Library.vector=vector.dll> for level 0; ignored
Warning in TEnvRec::ChangeValue: duplicate entry <Library.vector=vector.dll> for level 0; ignored

All those warnings are code from the ROOT.py module.

Now, if I run the same script but with the arguments “$file,$file,$file,$file,$file” it hangs.

[atpilot002@td425 tmp]$ python myscript.py $file,$file,$file,$file,$file
*** glibc detected *** python: free(): invalid next size (normal): 0x0a04bf40 ***

If I do it with just 4 files instead, it doesn’t hang, but the ROOT module still does plenty of things (remember that myscript.py just imports the ROOT module, nothing more):

[atpilot002@td425 tmp]$ python myscript.py $file,$file,$file,$file
Command failed!
Server error message for [1]: “path /pnfs/fs/usr/data/atlas/atlasmcdisk/mc09_7TeV/ESD/e530_s765_s767_r1302/mc09_7TeV.113161.AlpgenJimmyBBjetsNp0_J3.recon.ESD.e530_s765_s767_r1302_tid137248_00/ESD.137248._000001.pool.root.6,dcap:/dcap.pic.es:22125/pnfs/pic.es/data/atlas/atlasmcdisk/mc09_7TeV/ESD/e530_s765_s767_r1302/mc09_7TeV.113161.AlpgenJimmyBBjetsNp0_J3.recon.ESD.e530_s765_s767_r1302_tid137248_00/ESD.137248._000001.pool.root.6,dcap:/dcap.pic.es:22125/pnfs/pic.es/data/atlas/atlasmcdisk/mc09_7TeV/ESD/e530_s765_s767_r1302/mc09_7TeV.113161.AlpgenJimmyBBjetsNp0_J3.recon.ESD.e530_s765_s767_r1302_tid137248_00/ESD.137248._000001.pool.root.6,dcap:/dcap.pic.es:22125/pnfs/pic.es/data/atlas/atlasmcdisk/mc09_7TeV/ESD/e530_s765_s767_r1302/mc09_7TeV.113161.AlpgenJimmyBBjetsNp0_J3.recon.ESD.e530_s765_s767_r1302_tid137248_00/ESD.137248._000001.pool.root.6 not found” (errno 10001).
Warning in TEnvRec::ChangeValue: duplicate entry <Library.vector=vector.dll> for level 0; ignored
Warning in TEnvRec::ChangeValue: duplicate entry <Library.vector=vector.dll> for level 0; ignored
Warning in TEnvRec::ChangeValue: duplicate entry <Library.vector=vector.dll> for level 0; ignored
Warning in TEnvRec::ChangeValue: duplicate entry <Library.vector=vector.dll> for level 0; ignored
Warning in TEnvRec::ChangeValue: duplicate entry <Library.vector=vector.dll> for level 0; ignored
Warning in TEnvRec::ChangeValue: duplicate entry <Library.vector=vector.dll> for level 0; ignored
Warning in TEnvRec::ChangeValue: duplicate entry <Library.vector=vector.dll> for level 0; ignored
Warning in TEnvRec::ChangeValue: duplicate entry <Library.vector=vector.dll> for level 0; ignored
Warning in TEnvRec::ChangeValue: duplicate entry <Library.vector=vector.dll> for level 0; ignored
Warning in TEnvRec::ChangeValue: duplicate entry <Library.vector=vector.dll> for level 0; ignored

This is what is happening with some ATLAS user analysis scripts at PIC. If the user passes an argument with a long list of files separated by commas, the ROOT module makes the script hang, and it stays like that forever. I have retrieved the list of files the user was passing to the script and run the command manually. This is what I get:

[atpilot002@td425 tmp]$ python slim.py dcap://dcap.pic.es:22125/pnfs/pic.es/da … ool.root.6
*** glibc detected *** python: free(): invalid next size (normal): 0x0aff9aa8 ***

And of course it hangs forever.

Head of the user script:

[atpilot002@td425 tmp]$ head -3 slim.py
#!/usr/bin/python
from ROOT import *

A user should be able to decide that a script takes an argument containing a list of files separated by commas, but I have the impression that the ROOT module interprets it as a file or something else, causing some kind of buffer overflow. In my opinion the ROOT module should not execute anything unless explicitly asked, and it does.

Version numbers:

[atpilot002@td425 tmp]$ for a in `echo $PYTHONPATH |sed -e 's/:/ /g'`; do echo $a; done|grep -i root
/software/atlas/ifae/prod/releases/rel_15-19/sw/lcg/app/releases/ROOT/5.22.00h/i686-slc5-gcc43-opt/root/lib

[atpilot002@td133 tmp]$ root-config --version
5.22/00h

[atpilot002@td425 tmp]$ python --version
Python 2.5.4

Thanks so much!
Carlos

Hi,

apart from that you should also fix the warnings. They stem from contradicting rootmap files in your LD_LIBRARY_PATH: there are >= 2 *.rootmap files which claim to have dictionaries for vector etc. Maybe once this is fixed also your other problem will vanish. I don’t understand what you are trying to say with “All those warnings are code from the ROOT.py module.”
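
One way to look for such clashes is sketched below. It is only a rough aid, not a ROOT tool: the function name find_duplicate_rootmap_keys is made up for this illustration, and the assumed entry layout ("Library.<class>: <libraries>" on one line) should be checked against the rootmap files of the installation at hand.

```python
import glob
import os
from collections import defaultdict


def find_duplicate_rootmap_keys(search_path):
    """Return {key: [rootmap files]} for 'Library.*' keys declared in 2+ files."""
    seen = defaultdict(set)
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue
        pattern = os.path.join(directory, "*.rootmap")
        for rootmap in sorted(glob.glob(pattern)):
            with open(rootmap) as handle:
                for line in handle:
                    line = line.strip()
                    # Assumed entry layout: "Library.<class>: <libs...>"
                    if line.startswith("Library.") and ":" in line:
                        key = line.split(":", 1)[0].strip()
                        seen[key].add(rootmap)
    # Keep only the keys that more than one file claims.
    return dict((k, sorted(v)) for k, v in seen.items() if len(v) > 1)


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    for key, files in sorted(find_duplicate_rootmap_keys(path).items()):
        print("%s declared in: %s" % (key, ", ".join(files)))
```

Run with the same environment as the failing job, it prints each key that more than one rootmap file claims, which should point at the installations that are being mixed.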

Cheers, Axel.

Hi,

[quote=“cborrego”]I have created a python script which just imports the ROOT.py module like this:

[atpilot002@td425 tmp]$ cat myscript.py
#!/usr/bin/python
from ROOT import *[/quote]
For reference: doing “from ROOT import *” isn’t “just”. Just importing would be “import ROOT”. The former requires the copying of all entries in the module’s __all__ list, as per the Python language, and thus also requires their setup. The latter, however, is far lazier and doesn’t set up any globals until they are actually used.

The file argument is passed to ROOT, which recognizes it as a ROOT file and hence will open it. As per above, “from ROOT import *” requires the import of gApplication and hence its creation, and the constructor of TApplication takes argc/argv, which are filled from sys.argv. To tell the module not to hand those over, you can set that option on the ROOT module (“ROOT.PyConfig.IgnoreCommandLineOptions = True”), and/or separate out the user options using a ‘-’. But again, that only works after “import ROOT”, not “from ROOT import *”.
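
A minimal sketch of that setup, assuming PyROOT is on the PYTHONPATH. The helper names split_file_list and import_root_quietly are invented for this example; only ROOT.PyConfig.IgnoreCommandLineOptions comes from the explanation above.

```python
import sys


def split_file_list(arg):
    """Split one comma-separated argument into individual file names."""
    return [name for name in arg.split(",") if name]


def import_root_quietly():
    """Import ROOT lazily and ask it not to parse sys.argv.

    Returns the ROOT module, or None if PyROOT is not available.
    """
    try:
        import ROOT  # plain import: no globals are set up yet
    except ImportError:
        return None
    # Must be set before the first real use of ROOT in the script.
    ROOT.PyConfig.IgnoreCommandLineOptions = True
    return ROOT


if __name__ == "__main__":
    files = split_file_list(sys.argv[1]) if len(sys.argv) > 1 else []
    ROOT = import_root_quietly()
    print("files given: %d" % len(files))
```

With this layout the script, not TApplication, decides what the comma-separated argument means.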

Cheers,
Wim

Thanks Wim and Axel for your answers,
I understand that if the user who sends the Python script to our site passes the arguments:

python userscript.py $file1,$file2,$file3,$file4

the root module is trying to open a non existing file called “$file1,$file2,$file3,$file4”. But the script is actually hanging just after its execution with the error:

*** glibc detected *** python: free(): invalid next size (normal): 0x0ac01480 ***

This should not happen in any case, right? Axel is saying it could be due to a problem in which two rootmap files claim to have dictionaries for vector. Could this hang a script? Shouldn’t it be considered a bug if the script hangs in any case?

Thanks so much
Carlos

Carlos,

yes, it should not crash. But it doesn’t in my setup, to be sure … So yes, I’d like to see a cleanup of individual .rootmap files first (which could be due to an installation problem or a mix of installations) before giving any further diagnosis.

As for opening a file “$file1,$file2,$file3,$file4”, I guess that depends on how the shell hands over the arguments. Irrespective of ROOT, what does sys.argv look like in your case?

Cheers,
Wim

P.S. I’ll be offline until June 3rd due to family matters.

Thanks Wim,
My argv looks as expected: two entries, the name of the script and the non-existing filename. I have commented out the “from ROOT import *” line so that the script does not hang:

[atpilot002@td425 tmp]$ cat myscript.py
#!/usr/bin/python
#from ROOT import *
import sys
print "sys.argv = ", sys.argv

[i]
[atpilot002@td425 tmp]$ file="dcap://dcap.pic.es:22125/pnfs/pic.es/data/atlas/atlasmcdisk/mc09_7TeV/ESD/e530_s765_s767_r1302/mc09_7TeV.113161.AlpgenJimmyBBjetsNp0_J3.recon.ESD.e530_s765_s767_r1302_tid137248_00/ESD.137248._000001.pool.root.6"

[atpilot002@td425 tmp]$ python myscript.py $file,$file,$file,$file,$file
sys.argv = [‘myscript.py’, ‘dcap://dcap.pic.es:22125/pnfs/pic.es/data/atlas/atlasmcdisk/mc09_7TeV/ESD/e530_s765_s767_r1302/mc09_7TeV.113161.AlpgenJimmyBBjetsNp0_J3.recon.ESD.e530_s765_s767_r1302_tid137248_00/ESD.137248._000001.pool.root.6,dcap://dcap.pic.es:22125/pnfs/pic.es/data/atlas/atlasmcdisk/mc09_7TeV/ESD/e530_s765_s767_r1302/mc09_7TeV.113161.AlpgenJimmyBBjetsNp0_J3.recon.ESD.e530_s765_s767_r1302_tid137248_00/ESD.137248._000001.pool.root.6,dcap://dcap.pic.es:22125/pnfs/pic.es/data/atlas/atlasmcdisk/mc09_7TeV/ESD/e530_s765_s767_r1302/mc09_7TeV.113161.AlpgenJimmyBBjetsNp0_J3.recon.ESD.e530_s765_s767_r1302_tid137248_00/ESD.137248._000001.pool.root.6,dcap://dcap.pic.es:22125/pnfs/pic.es/data/atlas/atlasmcdisk/mc09_7TeV/ESD/e530_s765_s767_r1302/mc09_7TeV.113161.AlpgenJimmyBBjetsNp0_J3.recon.ESD.e530_s765_s767_r1302_tid137248_00/ESD.137248._000001.pool.root.6,dcap://dcap.pic.es:22125/pnfs/pic.es/data/atlas/atlasmcdisk/mc09_7TeV/ESD/e530_s765_s767_r1302/mc09_7TeV.113161.AlpgenJimmyBBjetsNp0_J3.recon.ESD.e530_s765_s767_r1302_tid137248_00/ESD.137248._000001.pool.root.6’]
[/i]

On the other hand, I have tried to open the file “$file,$file,$file,$file,$file” by hand using root directly. When I paste the name of the file at the ROOT prompt I get a buffer overflow:

[atpilot002@td425 tmp]$ root -l
*** DISPLAY not set, setting it to ui02.pic.es:0.0
root [0] $30_s765_s767_r1302/mc09_7TeV.113161.AlpgenJimmyBBjetsNp0_J3.recon.ESD.e530_s765
*** Error: Getline(): input buffer overflow

I think there is something wrong here.

Thanks so much
Carlos

Hi Carlos,

It looks like the name of your file is too long for the command line of the version of ROOT that you are using. I recommend that instead of passing the file names directly, you pass them indirectly via a text file containing one file name per line (which you can read using cstdio or iostream).
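
Since the user scripts in this thread are Python rather than C++, the same idea could look roughly like this. The function name read_file_list is hypothetical, and skipping lines that start with “#” is an extra convenience, not part of the suggestion above.

```python
import sys


def read_file_list(list_path):
    """Return the file names stored one per line in a text file."""
    names = []
    with open(list_path) as handle:
        for line in handle:
            name = line.strip()
            # Ignore blank lines and (as a convenience) comment lines.
            if name and not name.startswith("#"):
                names.append(name)
    return names


if __name__ == "__main__" and len(sys.argv) > 1:
    for name in read_file_list(sys.argv[1]):
        print(name)
```

The script would then be called as “python userscript.py files.txt”, so the command line stays short however many files are listed. Setting ROOT.PyConfig.IgnoreCommandLineOptions, as mentioned earlier in the thread, should additionally keep ROOT from inspecting that argument.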

Cheers,
Philippe.

Thanks Philippe,
I know it’s a size issue. But I think that the Python ROOT module should not hang when it gets such long arguments; it should just report an error and exit cleanly. Don’t you think so?
Thanks so much!
Carlos

Hi,

Indeed … However, with which version of ROOT do you reproduce this problem? (We are only patching v5.22, which is now quite old, for complete show stoppers; otherwise we recommend moving to v5.26/00c or v5.27/02.)

Cheers,
Philippe.

Thanks Philippe,
The version is actually 5.22.00
Thanks again
Carlos

Hello again,
I have tried with root version 5.26 and it fails as well:

[atpilot002@td425 tmp]$ source /nfs/pic.es/tier2/scratch/AtlasSoftware/etc/profile.d/gcc4.3_setup.sh
[atpilot002@td425 tmp]$ source /nfs/pic.es/tier2/scratch/AtlasSoftware/root/root-5.26_slc5_gcc4.3_x86-64/bin/thisroot.sh
[atpilot002@td425 tmp]$ python myscript.py $file,$file,$file,$file,$file
*** glibc detected *** python: free(): invalid next size (normal): 0x0000000013f3ad50 ***

Thanks so much
Carlos

Hi,

I can reproduce it, valgrind only complains about python / pyroot but I might be using the wrong suppression file. Wim?

$ cat t.py
#!/usr/bin/python
from ROOT import *

python t.py dcap://dcap.pic.es:22125/pnfs/pic.es/data/atlas/atlasmcdisk/mc09_7TeV/ESD/e530_s765_s767_r1302/mc09_7TeV.113161.AlpgenJimmyBBjetsNp0_J3.recon.ESD.e530_s765_s767_r1302_tid137248_00/ESD.137248._000001.pool.root.6,dcap://dcap.pic.es:22125/pnfs/pic.es/data/atlas/atlasmcdisk/mc09_7TeV/ESD/e530_s765_s767_r1302/mc09_7TeV.113161.AlpgenJimmyBBjetsNp0_J3.recon.ESD.e530_s765_s767_r1302_tid137248_00/ESD.137248._000002.pool.root.6,dcap://dcap.pic.es:22125/pnfs/pic.es/data/atlas/atlasmcdisk/mc09_7TeV/ESD/e530_s765_s767_r1302/mc09_7TeV.113161.AlpgenJimmyBBjetsNp0_J3.recon.ESD.e530_s765_s767_r1302_tid137248_00/ESD.137248._000003.pool.root.6,dcap://dcap.pic.es:22125/pnfs/pic.es/data/atlas/atlasmcdisk/mc09_7TeV/ESD/e530_s765_s767_r1302/mc09_7TeV.113161.AlpgenJimmyBBjetsNp0_J3.recon.ESD.e530_s765_s767_r1302_tid137248_00/ESD.137248._000004.pool.root.6,dcap://dcap.pic.es:22125/pnfs/pic.es/data/atlas/atlasmcdisk/mc09_7TeV/ESD/e530_s765_s767_r1302/mc09_7TeV.113161.AlpgenJimmyBBjetsNp0_J3.recon.ESD.e530_s765_s767_r1302_tid137248_00/ESD.137248._000005.pool.root.6
(gdb) bt
#0  0x000000000046bc27 in ?? ()
#1  0x00007ffff5e3e17f in pt_getattro (pyclass=0xe88600, pyname=0x7ffff7f8eae0)
    at bindings/pyroot/src/PyRootType.cxx:78
#2  0x00007ffff5e40512 in PyROOT::Pythonize (pyclass=0xe88600, name=@0x7fffffff9a80)
    at bindings/pyroot/src/Pythonize.cxx:1588
#3  0x00007ffff5e545f1 in PyROOT::MakeRootClassFromString<PyROOT::TScopeAdapter, PyROOT::TBaseAdapter, PyROOT::TMemberAdapter> (fullname=@0x7fffffff9b70, scope=0x7ffff698b9b8) at bindings/pyroot/src/RootWrapper.cxx:642
#4  0x00007ffff5e4a258 in PyROOT::MakeRootClass (args=0xdf8d10) at bindings/pyroot/src/RootWrapper.cxx:450
#5  0x00000000004a7c5e in PyEval_EvalFrameEx ()
#6  0x00000000004a9671 in PyEval_EvalCodeEx ()
#7  0x00000000004a7809 in PyEval_EvalFrameEx ()
#8  0x00000000004a9671 in PyEval_EvalCodeEx ()
#9  0x00000000004a9742 in PyEval_EvalCode ()
#10 0x00000000004bbfee in PyImport_ExecCodeModuleEx ()
#11 0x00000000004bdf5e in ?? ()
#12 0x00000000004bed93 in ?? ()
#13 0x00000000004bf00f in ?? ()
#14 0x00000000004bf6cb in ?? ()
#15 0x00000000004bfc14 in PyImport_ImportModuleLevel ()
#16 0x00000000004a16ab in ?? ()
#17 0x000000000041f0c7 in PyObject_Call ()
#18 0x00000000004a226f in ?? ()
#19 0x00000000004a55fc in PyEval_EvalFrameEx ()
#20 0x00000000004a9671 in PyEval_EvalCodeEx ()
#21 0x00000000004a9742 in PyEval_EvalCode ()
#22 0x00000000004c9a0e in PyRun_FileExFlags ()
#23 0x00000000004c9c24 in PyRun_SimpleFileExFlags ()
#24 0x000000000041a7ff in Py_Main ()
#25 0x00007ffff69dac4d in __libc_start_main () from /lib/libc.so.6
#26 0x00000000004199f9 in _start ()

Cheers, Axel.

Axel,

that looks like it’s in the middle of the creation of TApplication and friends (not sure why the stack trace stops where it does).

Can’t reproduce the problem though, as I don’t have dcache installed, so I get:

Error in <TUnixSystem::DynamicPathName>: DCache[.so | .sl | .dl | .a | .dll] does not exist in /home/wlav/rootdev/root/lib:/usr/lib/mpi/gcc/openmpi/lib:.:/home/wlav/rootdev/dev/lib::/home/wlav/rootdev/dev/cint/cint/stl

rather than a crash.

Cheers,
Wim

Hi Wim,

you should be able to use a ROOT debug build from lxplus.

Axel.

Axel,

well, like I said: looks like in the middle of the TApplication creation. Okay, I’ll have another look, but handing the whole dcap… etc. argument list to root.exe crashes just as hard:

*** glibc detected *** /home/wlav/rootdev/dev/bin/root.exe: malloc(): memory corruption: 0x0814e838 ***
======= Backtrace: =========
/lib/libc.so.6[0xb6dec50b]
/lib/libc.so.6[0xb6def400]
/lib/libc.so.6(__libc_malloc+0x6a)[0xb6df10ba]
/lib/libc.so.6(strndup+0x3b)[0xb6df472b]
/opt/d-cache/lib/libdcap.so.1(xstrndup+0x24)[0xb6b49cb4]
/opt/d-cache/lib/libdcap.so.1(sendControlMessage+0x67)[0xb6b3bb97]
/opt/d-cache/lib/libdcap.so.1(ascii_open_conversation+0x2e5)[0xb6b3bf15]
/opt/d-cache/lib/libdcap.so.1(cache_open+0x17d)[0xb6b3d8ad]
/opt/d-cache/lib/libdcap.so.1(dc_stat64+0x7e)[0xb6b4494e]
/home/wlav/rootdev/dev/lib/libDCache.so(_ZN13TDCacheSystem11GetPathInfoEPKcR10FileStat_t+0x4b)[0xb6b5db9b]
/home/wlav/rootdev/dev/lib/libCore.so(_ZN11TUnixSystem11GetPathInfoEPKcR10FileStat_t+0x4e)[0xb7a8408e]
/home/wlav/rootdev/dev/lib/libCore.so(_ZN7TSystem11GetPathInfoEPKcPlPxS2_S2_+0x66)[0xb79fa726]
/home/wlav/rootdev/dev/lib/libCore.so(_ZN12TApplication10GetOptionsEPiPPc+0x591)[0xb7995eb1]
/home/wlav/rootdev/dev/lib/libCore.so(_ZN12TApplicationC2EPKcPiPPcPvi+0x1c8)[0xb7997a08]
/home/wlav/rootdev/dev/lib/libRint.so(_ZN5TRintC1EPKcPiPPcPvib+0x4f)[0xb706b7cf]
/home/wlav/rootdev/dev/bin/root.exe(main+0x5c)[0x8048ecc]

Later,
Wim

Hi,

here’s the cause:

The CLI string is handed en bloc to TDCacheSystem, which passes it on en bloc to dcap, which copies it into openStr, which is calloc’ed with a hard size limit of 1024 and without checking the actual size, and then goes plink.
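
Until the copy is bounded on the dcap side, a user script could refuse over-long names up front. The sketch below only illustrates that idea. The constant and function names are invented, and treating 1024 as the threshold is an assumption: the real buffer presumably also holds protocol text around the path, so a smaller limit would be safer.

```python
# Size quoted in this thread; how much of it the path may use is an assumption.
DCAP_OPEN_STRING_LIMIT = 1024


def oversized_names(names, limit=DCAP_OPEN_STRING_LIMIT):
    """Return the names whose encoded length reaches or exceeds the limit."""
    return [name for name in names if len(name.encode("utf-8")) >= limit]


def check_names(names, limit=DCAP_OPEN_STRING_LIMIT):
    """Raise ValueError instead of handing an over-long name to dcap."""
    too_long = oversized_names(names, limit)
    if too_long:
        raise ValueError("%d file name(s) exceed %d bytes" % (len(too_long), limit))
    return names
```

Called on the whole comma-joined argument, check_names would turn the memory corruption into an ordinary Python exception; called on the individual names after splitting, it would let the long list through.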

Cheers,
Wim