Executing a PyROOT script in a subprocess leads to a segmentation fault

anna98 · January 16, 2025, 1:38pm

Hi,
I’m using PyROOT on lxplus and wrote a framework that runs the analysis I need in an automated way. However, when I run my script via a subprocess, I get a segmentation fault:

The lines below might hint at the cause of the crash. If you see question
marks as part of the stack trace, try to recompile with debugging information
enabled and export CLING_DEBUG=1 environment variable before running.
You may get help by asking at the ROOT forum https://root.cern/forum
preferably using the command (.forum bug) in the ROOT prompt.
Only if you are really convinced it is a bug in ROOT then please submit a
report at https://root.cern/bugs or (preferably) using the command (.gh bug) in
the ROOT prompt. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#6  0x0000557b5bac2340 in ?? ()
#7  0x00007f47f657775d in CPyCppyy::op_dealloc_nofree(CPyCppyy::CPPInstance*) () from /usr/lib64/python3.9/site-packages/libcppyy.so
#8  0x00007f47f65777a5 in CPyCppyy::op_dealloc(CPyCppyy::CPPInstance*) [clone .lto_priv.0] () from /usr/lib64/python3.9/site-packages/libcppyy.so
#9  0x00007f4808753b46 in subtype_dealloc () from /lib64/libpython3.9.so.1.0
#10 0x00007f480872246c in list_dealloc () from /lib64/libpython3.9.so.1.0
#11 0x00007f480872be68 in dict_dealloc () from /lib64/libpython3.9.so.1.0
#12 0x00007f480872be68 in dict_dealloc () from /lib64/libpython3.9.so.1.0
#13 0x00007f4808811bc6 in module_dealloc () from /lib64/libpython3.9.so.1.0
#14 0x00007f4808720b81 in insertdict () from /lib64/libpython3.9.so.1.0
#15 0x00007f4808811650 in _PyImport_Cleanup () from /lib64/libpython3.9.so.1.0
#16 0x00007f48088102ab in Py_FinalizeEx () from /lib64/libpython3.9.so.1.0
#17 0x00007f4808802c1d in Py_RunMain () from /lib64/libpython3.9.so.1.0
#18 0x00007f48087d502d in Py_BytesMain () from /lib64/libpython3.9.so.1.0
#19 0x00007f48082295d0 in __libc_start_call_main () from /lib64/libc.so.6
#20 0x00007f4808229680 in __libc_start_main_impl () from /lib64/libc.so.6
#21 0x0000557b0d195095 in _start ()

However, my ROOT script runs correctly until the end and generates my plots. If I run the script on its own, without a subprocess, I do not see the segmentation fault. Hence I suppose it has something to do with the subprocess itself.

This is how I call my subprocess:

result = subprocess.run(commandMHT, input = self.input_to_provide, text = True, check = True, env = self.env)

and my environment for the subprocess looks like this:

        env = os.environ.copy()
        env['CLING_DEBUG'] = '1'
        env['OMP_NUM_THREADS'] = '1'
        env['OPENBLAS_NUM_THREADS'] = '1'

If you need more detailed information, I can also point you to my GitLab repository and the exact files in question. I hope my request is not missing important information; if it is, please let me know and I’ll happily provide more.
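
For readers who want to see how the parent process perceives a crash like this, here is a minimal sketch. It is an illustration under assumptions, not the code from the framework: CHILD_CODE is a hypothetical stand-in that prints its final message and then kills itself with SIGSEGV, and run_child is a hypothetical helper. On POSIX systems, subprocess.run reports a child killed by a signal as a negative return code, and check = True turns any non-zero return code into a CalledProcessError, so the lines after the call would be skipped even though the child finished its real work.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for the real script: it prints its last message
# and then dies with SIGSEGV, mimicking a crash at interpreter exit.
CHILD_CODE = (
    "import os, signal; "
    "print('All done!', flush=True); "
    "os.kill(os.getpid(), signal.SIGSEGV)"
)


def run_child(command, env=None):
    """Run a command and describe how it ended (hypothetical helper)."""
    # No check=True here, so a crash does not raise CalledProcessError.
    result = subprocess.run(command, capture_output=True, text=True, env=env)
    if result.returncode < 0:
        # POSIX: a negative return code means "terminated by signal N".
        status = "killed by " + signal.Signals(-result.returncode).name
    elif result.returncode == 0:
        status = "exited normally"
    else:
        status = "exited with code " + str(result.returncode)
    return result.returncode, result.stdout, status


env = os.environ.copy()
env["OMP_NUM_THREADS"] = "1"

returncode, stdout, status = run_child([sys.executable, "-c", CHILD_CODE], env=env)
print("child output:", stdout.strip())
print("child status:", status)
```

The child’s output is complete, but the status reports the signal, which is the same combination described above: the work is done and the crash comes afterwards.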

ROOT Version: 6.34.02
Platform: lxplus
Compiler: Not Provided

Danilo · January 16, 2025, 8:04pm

Hi Anna,

Thanks for the post and welcome to the ROOT Community!
I am sorry to read the analysis did not work out of the box for you. Could you help us reproduce the problem, e.g. with a minimal, stripped-down version of your program?

Cheers,
Danilo

anna98 · January 17, 2025, 9:56am

Hi Danilo, yes of course.
So my runAnalysis.py calls the script YARRscan_comparison.py. This basically creates a set of ROOT plots in a loop (I compare different testing stages) and then outputs a summary.
The last lines of my script look like this:

for scan in scans():
    canvas[-1].Update()

    ### CSV output

    utils.output_CSV(graphs_failed_pixels[-1]['failed'], scan_name, hlabel, outputbase, serial_numbers, analysed_data)


if not os.path.exists('plots'):
    os.mkdir('plots')

if args.png:

    if not os.path.exists(outputbase):
        os.mkdir(outputbase)

    csummary.Print(outputbase + '/csummary.png')
    for ic, c in enumerate(canvas):
        name = c.GetName()
        c.Print(outputbase + '/' + name + '.png')
else:

    csummary.Print(outputbase + '.pdf(')
    for ic, c in enumerate(canvas):
        if ic == len(canvas) - 1:
            c.Print(outputbase + '.pdf)')
        else:
            c.Print(outputbase + '.pdf')

csummary.Update()

print("All done!")

So I see the "All done!" print, and only after that does the segmentation fault occur. When I run YARRscan_comparison.py on its own, it runs without any segmentation fault. As soon as I start it via the subprocess, the error occurs.
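
A crash after the last print statement is consistent with the stack trace in the first post, which passes through Py_FinalizeEx, _PyImport_Cleanup and module_dealloc: objects held in module-level variables are destroyed while the interpreter shuts down, after the script’s last line has run. The sketch below illustrates only that ordering in plain Python. The class Noisy and the list holder are hypothetical stand-ins for a module-level list of ROOT objects, and no ROOT is involved.

```python
import subprocess
import sys

# Hypothetical child script: a module-level list holds an object whose
# destructor reports when it runs.
CHILD_CODE = '''
class Noisy:
    def __del__(self):
        print("destructor called during shutdown")

holder = [Noisy()]  # stays alive until the interpreter tears down the module
print("All done!")
'''

result = subprocess.run(
    [sys.executable, "-c", CHILD_CODE],
    capture_output=True,
    text=True,
)
lines = result.stdout.splitlines()
for line in lines:
    print(line)
```

In CPython the destructor message appears after "All done!", so a problem inside a destructor only shows up once the script itself has finished.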

Hence I suspect the error is connected to the subprocess that Python starts. This is the function that executes the subprocess in my script:

    def generateMHT(self):
        pathMHT = os.path.join(self.basePath, 'plotConfigs/MHT.yml')
        resultPath = 'plots/MHT'
        resultFileNoise = 'plots/MHT_NoiseDist.csv'
        resultFileThresh = 'plots/MHT_ThresholdDist.csv'
        finalDest = os.path.join(self.finalDest, 'MHT')
        resultFileAnalog = 'plots/MHT_AnalogPixelAna.csv'
        resultFileDigital = 'plots/MHT_DigitalPixelAna.csv'
        commandMHTpdf = [
            "python3",
            "analysisHandler/YARRscan_comparison.py",
            pathMHT
        ]

        result = subprocess.run(commandMHTpdf, input = self.input_to_provide, text = True, check = True)
        os.makedirs(os.path.dirname(finalDest), exist_ok=True)
        shutil.copy(resultFileNoise, finalDest)
        shutil.copy(resultFileThresh, finalDest)
        shutil.copy(resultFileAnalog, finalDest)
        shutil.copy(resultFileDigital, finalDest)

So it really only calls the script that generates my plots and copies them to the final destination.

Is this enough information for you? I am not sure whether I’m able to create a minimal, stripped-down version of the script, but if it helps, I could also send you the whole repo and point you to the scripts there. I hope this is OK for you! Thanks for your efforts!

Danilo · January 18, 2025, 3:23pm

Hi Anna,

How can I run this code and see the error on a test machine?

Best,
D

anna98 · January 22, 2025, 1:52pm

Hi Danilo,

I created a minimum example and pushed it to gitlab:
gitlab.cern.ch/aswoboda/plottingtool-minimum-working-example

You should be able to clone it, source the setup script and run the runAnalysis.py script.
Please note that I included some modules which the YARRscan_comparison.py script uses. I decided to leave them in, since the script runs smoothly when it is not started via a subprocess, and I think this gives the most realistic picture of how I run it.
Running this minimal version with its single subprocess also causes the process to crash, at least on my PC.
One more word about my environment: I ran this script on lxplus with Python 3.9.21.

I hope the way I’m sharing this with you is ok, and I hope that I gave you all the information you need.
Thanks again,
Anna

jonas · January 23, 2025, 6:06am

Dear @anna98,

thank you so much for sharing the full reproducer!

The problem is a known regression in ROOT 6.34.00 and 6.34.02. We have already seen this in a different forum post, and it will be fixed in the next patch release ROOT 6.34.04.

Until that release, you can work around this crash by manually setting the ownership of the cloned histograms:

diff --git a/python/pixel_analysis.py b/python/pixel_analysis.py
index 40affa2..9e10c40 100644
--- a/python/pixel_analysis.py
+++ b/python/pixel_analysis.py
@@ -103,6 +103,8 @@ def pixel_analysis_threshold(hists, hlabel, canvas):
     hists_without_zero_bin = {}
     for stage in hists.keys():
         hists_without_zero_bin[stage] = [hists[stage][ichip].Clone(hists[stage][ichip].GetName() + '_wo_zerobin') for ichip in range(4)]
+        for h in hists_without_zero_bin[stage]:
+            ROOT.SetOwnership(h, False)

         for ichip in range(4):
             hists_without_zero_bin[stage][ichip].SetBinContent(0, 0)
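
The same workaround can be wrapped in a small helper, sketched here under assumptions: the function name clone_without_python_ownership and its suffix parameter are hypothetical, and only Clone, GetName and ROOT.SetOwnership are taken from the diff above. With ownership set to False, Python no longer deletes the underlying C++ object when the Python proxy goes away, so this is meant as a stopgap for short-lived scripts rather than a general pattern.

```python
def clone_without_python_ownership(hists, suffix="_wo_zerobin"):
    """Clone each histogram and tell PyROOT not to delete the clone.

    Hypothetical helper; `hists` is any iterable of ROOT histograms.
    """
    # Imported lazily so this module can be loaded without ROOT installed.
    import ROOT

    clones = []
    for hist in hists:
        clone = hist.Clone(hist.GetName() + suffix)
        # Python gives up ownership: the proxy's deallocation will not
        # delete the C++ histogram.
        ROOT.SetOwnership(clone, False)
        clones.append(clone)
    return clones
```

In the function from the diff, this would be called once per stage, e.g. with the list hists[stage].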

Sorry for the inconvenience!

Cheers,
Jonas

anna98 · January 23, 2025, 7:23am

Hi Jonas,

Alright, thank you so much for looking into this and sharing a solution!

So if I’m not mistaken, it looks like ROOT was updated on the 26th of December:

[aswoboda@lxplus933 public]$ root-config --version
6.34.02
[aswoboda@lxplus933 public]$ ls -ltr /usr/local/bin/root* /usr/bin/root* /opt/*/root* 2>/dev/null
-rwxr-xr-x. 1 root root   5445 Dec 26 11:16  /usr/bin/rootssh
-rwxr-xr-x. 1 root root    821 Dec 26 11:16  /usr/bin/roots
-rwxr-xr-x. 1 root root  31523 Dec 26 13:45  /usr/bin/root-config
-rwxr-xr-x. 1 root root  15832 Dec 26 14:01  /usr/bin/root.exe
-rwxr-xr-x. 1 root root  24152 Dec 26 14:01  /usr/bin/roots.exe
-rwxr-xr-x. 1 root root  15840 Dec 26 14:01  /usr/bin/rootn.exe
-rwxr-xr-x. 3 root root  15696 Dec 26 14:01  /usr/bin/rootcling
-rwxr-xr-x. 3 root root  15696 Dec 26 14:01  /usr/bin/rootcint
-rwxr-xr-x. 1 root root 209416 Dec 26 14:01  /usr/bin/rootreadspeed
-rwxr-xr-x. 1 root root  40816 Dec 26 14:01  /usr/bin/root

This leads me to a naive question: are the ROOT updates on lxplus done automatically? If yes, how long is the usual delay between a new ROOT release and the update being available on lxplus?

Cheers,
Anna

jonas · January 23, 2025, 7:39am

Hi @anna98, lxplus is using the Fedora Linux distribution, and the packagers are following the ROOT releases very closely. You can expect only a few days of delay.

By the way, by sourcing the right thisroot.sh file, you can activate any version of ROOT on lxplus, independent of what the system provides.

E.g., ROOT 6.32.08 can be activated with this line:

source /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.32.08/x86_64-almalinux9.4-gcc114-opt/bin/thisroot.sh

You can find the relevant installation paths on the website.

If you don’t need ROOT 6.34 specific features, using the previous ROOT version like this can also be a solution until 6.34.04 is released.
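
To decide whether a given installation needs the workaround or a different ROOT version, a script could compare the version string against the two releases named earlier in this thread (6.34.00 and 6.34.02). The following is a minimal sketch under assumptions: the helper names are hypothetical, and the input is assumed to be a dotted string such as the one printed by root-config --version.

```python
# Releases reported in this thread as showing the crash.
AFFECTED_VERSIONS = {(6, 34, 0), (6, 34, 2)}


def parse_root_version(text):
    """Turn a dotted version string like '6.34.02' into a tuple (6, 34, 2)."""
    return tuple(int(part) for part in text.strip().split(".")[:3])


def is_affected(text):
    """Return True if the version is one of the releases listed above."""
    return parse_root_version(text) in AFFECTED_VERSIONS


for version in ("6.32.08", "6.34.02", "6.34.04"):
    print(version, "affected" if is_affected(version) else "not affected")
```

The set only encodes what was stated in this thread, so it would need to be updated if other releases turn out to behave the same way.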

Cheers,
Jonas