
RDataFrame snapshot python seg fault in ROOT v6.20.04 (working in v6.14.08)

Hello,

I’ve found some unexpected behavior in RDataFrame in PyROOT when upgrading from ROOT v6.14.08 to v6.20.04. I’ve been using RDataFrame to process NanoAOD format data and simulation.

As a simple example, the following python code will produce a seg fault crash in v6.20.04 but not v6.14.08.

import ROOT
r = ROOT.RDataFrame("Events","file.root")
r2 = r.Filter("boolBranch")
r2.Snapshot('test','test.root','intBranch')  # seg faults in v6.20.04

I’ve narrowed the problem down to branches that store bools: if I swap in branches of other types (ints, vectors, etc.), there is no issue. What’s perhaps strangest is that the following works fine in v6.20.04:

import ROOT
r = ROOT.RDataFrame("Events","file.root")
r2 = r.Filter("boolBranch")
r2.Snapshot('test','test.root','')  # empty column regex: write all columns

The only difference is that I haven’t specified a branch to snapshot. If I swap intBranch and boolBranch in the first code block (filtering on intBranch and snapshotting boolBranch), it also seg faults.

Are there any changes between v6.14 and v6.20 that could cause such behavior? I’m happy to send an example file privately if that would be useful, but from what I’ve tested so far, I expect the same issue to occur with any NanoAODv5 or v6 sample.

Thanks!

ROOT Version: v6.20.04 and v6.14.08
Platform: Ubuntu 18.04 LTS
Compiler: Not Provided


Hi @lcorcodilos,
thank you for the report! v6.14 is ages ago in terms of RDataFrame development; a lot of the logic at various layers has changed since then (mostly for the better, but a regression such as this is indeed possible).

I can’t reproduce the crash with the following test:

import ROOT
ROOT.RDataFrame(1).Define("x", "true").Snapshot("t", "f.root")
ROOT.RDataFrame("t", "f.root").Snapshot("t", "f2.root", "x")

so the problem does not seem to affect every Snapshot of boolean values. It would be useful if you could provide the dataset that triggers it.

Cheers,
Enrico

Hi @eguiraud,

Actually, your example does not work for me… but if I swap “true” with “5”, it works. So maybe this is something with my ROOT build? I can try building from source again. Is there a version you’d recommend, or specific build options for RDataFrame that people commonly miss?

Thanks again,
Lucas

EDIT: To clarify, it seg faults at the first line of your example, i.e. when defining the data frame and snapshotting it to f.root.
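A segfault takes down the whole Python session, so one way to see which step of the test crashes is to run each step in its own child interpreter and inspect how it exited. Below is a minimal sketch using only the standard library; the helper names `run_step` and `describe_exit` are hypothetical, and `STEPS` simply holds the two lines of the test above. How a crash is reported is an assumption: on POSIX, `subprocess` reports a process killed by signal N with return code -N, but ROOT installs its own crash handler that prints a stack trace, so the child may instead exit normally with a non-zero status.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Hypothetical helper: turn a child-process return code into a readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # POSIX convention used by subprocess: -N means "killed by signal N".
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return "killed by signal %d" % -returncode
    # ROOT's own crash handler may print a stack trace and then exit
    # with a non-zero status instead of dying from the signal.
    return "exited with status %d" % returncode


def run_step(code, timeout=600):
    """Hypothetical helper: run one snippet in a fresh interpreter so that a
    segfault cannot take down the calling session."""
    proc = subprocess.run([sys.executable, "-c", code],
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True, timeout=timeout)
    return describe_exit(proc.returncode), proc.stderr


# The two steps of the test above, run separately to see which one crashes.
STEPS = [
    'import ROOT; ROOT.RDataFrame(1).Define("x", "true").Snapshot("t", "f.root")',
    'import ROOT; ROOT.RDataFrame("t", "f.root").Snapshot("t", "f2.root", "x")',
]

if __name__ == "__main__":
    for i, code in enumerate(STEPS, start=1):
        status, err = run_step(code)
        print("step %d: %s" % (i, status))
        if status != "ok" and err.strip():
            print("  last stderr line:", err.strip().splitlines()[-1])
```

Running the same script against two builds (for example, one configured for C++11 and one for C++17) and comparing the per-step status is one way to narrow down a build-dependent crash like this one.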

I think I found the issue. I rebuilt v6.20 and was able to run your example successfully. I then remembered that I had built the problematic version of v6.20 with the cmake option -DCMAKE_CXX_STANDARD="17" for something else I was debugging. So I think building with C++17 is what triggers the crash.

It turns out that I don’t need C++17, so I’ll switch to the v6.20 build with the default C++11 standard, since that works. However, it would be interesting to know whether this behavior with C++17 is expected.
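Because the crash appeared to follow the C++ standard the build was configured with rather than the ROOT version, it can help to confirm which standard a given installation was compiled with before comparing results. Below is a minimal sketch, assuming `root-config` is on `PATH` and that its `--cflags` output contains a `-std=c++NN` (or `-std=gnu++NN`) flag, as is typical for ROOT installations; `parse_cxx_standard` and `root_cxx_standard` are hypothetical helper names.

```python
import re
import shutil
import subprocess

# Matches e.g. "-std=c++17", "-std=gnu++14", or the draft spelling "-std=c++1z".
_STD_FLAG = re.compile(r"-std=(?:c|gnu)\+\+(\w+)")

# Draft spellings that compilers accept for standards before they were final.
_ALIASES = {"0x": 11, "1y": 14, "1z": 17, "2a": 20}


def parse_cxx_standard(cflags):
    """Return the C++ standard named in a compiler-flag string, or None."""
    match = _STD_FLAG.search(cflags)
    if match is None:
        return None
    tag = match.group(1)
    if tag in _ALIASES:
        return _ALIASES[tag]
    return int(tag) if tag.isdigit() else None


def root_cxx_standard():
    """Ask root-config for its compile flags; None if root-config is not found."""
    exe = shutil.which("root-config")
    if exe is None:
        return None
    out = subprocess.run([exe, "--cflags"], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True).stdout
    return parse_cxx_standard(out)


if __name__ == "__main__":
    std = root_cxx_standard()
    print("C++ standard of this ROOT build:", std if std is not None else "unknown")
```

Parsing is kept separate from calling `root-config` so the flag-matching logic can be checked on any string, including flags copied from another machine's build.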

Thanks for your help!


Well – spoiler alert: segfaults depending on the C++ standard are not expected :sweat_smile: Thank you very much for the report, I will look into it.

Cheers,
Enrico

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

(bumping this up just to say that ROOT.RDataFrame(1).Define("x", "true").Snapshot("t", "f.root") works fine with the ROOT conda package (which is a C++17 build) or with my own C++17 builds)

This topic was automatically closed after 13 days. New replies are no longer allowed.