Crash when accessing column in RDataFrame

Hello, I am getting a segmentation violation when I try to interact with columns I have defined in an RDataFrame in PyROOT. In the minimal reproducer below the error happens on hist.Write(), but it also happens when I try to create a PDF of the histogram.

import ROOT

fullname = "root:path_to_a_root_file"

tname = "my_tree's_name"

dFrame = ROOT.ROOT.RDataFrame(tname, fullname).Define("nTracks", "Tracks.size()") \
    .Define("Momenta",
            "vector<vector<double>> p; for (int i=0; i<nTracks; i++) {p[i].push_back(Tracks[i].x()); p[i].push_back(Tracks[i].y()); p[i].push_back(Tracks[i].z());} return p;") \
    .Define("Val", "return (Momenta[0][0])")

model = ROOT.RDF.TH1DModel("Val", "Val", 50, 0., 1.)
hist = dFrame.Histo1D(model, "Val")

can = ROOT.TCanvas("canName", "canTitle")
file = ROOT.TFile('reproducerHists', 'RECREATE')
hist.Write()

The full error puts me above the character limit, so here is just the beginning so that you can see what it looks like:

===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================

Thread 7 (Thread 0x7f6fbf7fe700 (LWP 29255)):
#0  0x00007f6fe4563b3b in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007f6fe4563bcf in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x00007f6fe4563c6b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3  0x00007f6fc5f74816 in XrdCl::JobManager::RunJobs() () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre3/external/slc7_amd64_gcc900/lib/libXrdCl.so.2
#4  0x00007f6fc5f748c9 in RunRunnerThread () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre3/external/slc7_amd64_gcc900/lib/libXrdCl.so.2
#5  0x00007f6fe455dea5 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f6fe3b7d9fd in clone () from /lib64/libc.so.6

Here is the only part of the error I sort of understood, which is buried in the middle:

Traceback (most recent call last):
  File "reproducer.py", line 19, in <module>
    hist.Write()
cppyy.ll.SegmentationViolation: TH1D& ROOT::RDF::RResultPtr<TH1D>::operator*() =>
    SegmentationViolation: segfault in C++; program state was reset
 *** Break *** segmentation violation

I can post more of the error upon request.

What is causing this, and/or how do I prevent it?
Thanks in advance for your efforts!

ROOT Version: 6.22
Platform: Not Provided
Compiler: Not Provided


Hi @bthornbe ,
and welcome to the ROOT forum!

The segfault happens when you do hist.Write because that’s when the event loop is actually triggered – but it is probably due to the logic in your Defines. Most likely culprit: sometimes Momenta[0][0] accesses invalid memory because you don’t have even one element in that vector of vectors.

You can try adding a Filter("nTracks > 0") before Define("Momenta", ...), or using Momenta.at(0).at(0) instead of Momenta[0][0] (at throws an exception on out-of-range access, while [] does the usual unsafe C++ thing and performs no bounds check).
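To illustrate the difference between at and [] outside of ROOT, here is a minimal plain C++ sketch. The helper name firstElementThrows is made up for this example and is not part of ROOT; it only shows that at() reports an out-of-range access as a std::out_of_range exception, which is much easier to diagnose than a segfault.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical helper (not a ROOT function): returns true if reading
// momenta.at(0).at(0) throws std::out_of_range, i.e. if either the outer
// vector or its first inner vector is empty.
bool firstElementThrows(const std::vector<std::vector<double>>& momenta) {
    try {
        // at() checks bounds and throws instead of reading invalid memory.
        (void)momenta.at(0).at(0);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With momenta[0][0] the same out-of-range access would be undefined behavior, which is what typically shows up as a segmentation violation once the event loop runs.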

I hope this helps!
Enrico


I tried adding the filter and using Momenta.at().at(), but I’m still getting the same type of error. I’m also pretty sure that nTracks shouldn’t be 0 (and thus Momenta shouldn’t be empty) for any of the events since I’m using generated data of events with high track multiplicity.

I’m running this on a remote cluster; could that be relevant?

Uhm ok, taking a second look at your code I can see the bug:

You are accessing p[i] but p is empty, there is no p[0].
You need e.g. to do a p.emplace_back() to create one new inner vector<double> at every iteration of the for loop.
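As a rough sketch of that fix in plain C++ (outside of ROOT): the Track struct below is a hypothetical stand-in for whatever type the Tracks column really holds, assumed only to expose x(), y() and z(), and buildMomenta is an illustrative name for the body of the Momenta Define.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the element type of the "Tracks" column.
// Only the x()/y()/z() accessors used in the reproducer are assumed.
struct Track {
    double px, py, pz;
    double x() const { return px; }
    double y() const { return py; }
    double z() const { return pz; }
};

// Builds one {x, y, z} inner vector per track.
std::vector<std::vector<double>> buildMomenta(const std::vector<Track>& tracks) {
    std::vector<std::vector<double>> p;
    p.reserve(tracks.size());
    for (std::size_t i = 0; i < tracks.size(); ++i) {
        p.emplace_back();  // create the inner vector before indexing p[i]
        p[i].push_back(tracks[i].x());
        p[i].push_back(tracks[i].y());
        p[i].push_back(tracks[i].z());
    }
    return p;
}
```

In the Define string from the reproducer, the corresponding change would be a single p.emplace_back(); at the start of the loop body.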

Cheers,
Enrico


Thanks so much! That solved the error!