I’ve found some unexpected behavior in RDataFrame in PyROOT when upgrading from ROOT v6.14.08 to v6.20.04. I’ve been using RDataFrame to process NanoAOD format data and simulation.
As a simple example, the following Python code crashes with a segmentation fault in v6.20.04 but not in v6.14.08.
import ROOT
# Filter on a bool branch, then snapshot one explicitly named branch
r = ROOT.RDataFrame("Events","file.root")
r2 = r.Filter("boolBranch")
r2.Snapshot('test','test.root','intBranch')
I’ve isolated branches storing bools to be problematic since I can swap in branches of other types (ints, vectors, etc) and there will be no issue. What’s perhaps most strange is that the following works fine in v6.20.04:
r = ROOT.RDataFrame("Events","file.root")
r2 = r.Filter("boolBranch")
r2.Snapshot('test','test.root','')
The only difference is that I haven’t specified a branch to snapshot. If I swap intBranch and boolBranch in the first code block (filtering on the int branch and snapshotting the bool branch), it also seg faults.
Are there any changes between v6.14 and v6.20 that could cause such behavior? I’m happy to send an example file privately if that would be useful but from what I’ve tested so far, I have no reason to believe the same issue won’t occur for any NanoAODv5 or v6 sample.
Thanks!
ROOT Version: v6.20.04 and v6.14.08
Platform: Ubuntu 18.04 LTS
Compiler: Not Provided
Hi @lcorcodilos,
thank you for the report! v6.14 is ages ago in terms of RDataFrame development; a lot of the logic at various layers has changed since then (mostly for the better, but a regression such as this is indeed possible).
I can’t reproduce the crash with the following test:
Actually your example does not work for me… and if I swap “true” with “5”, it works. So maybe this is something with my ROOT build? I can try building from source again. Is there a version you’d recommend or specific build options that people commonly miss for RDataFrame?
Thanks again,
Lucas
EDIT: To clarify, it seg faults at the first step of filling the data frame.
I think I found the issue. I rebuilt v6.20 and was able to run your example successfully. I then remembered that I had built the problematic version of v6.20 with the cmake option -DCMAKE_CXX_STANDARD="17" for something else I was debugging. So I think using cxx17 is the issue.
It turned out that I don’t need cxx17 so I’ll switch to the v6.20 build with the default cxx11 since that works. However, it would be interesting to know if this sort of behavior with cxx17 is expected or unexpected.
(bumping this up just to say that ROOT.RDataFrame(1).Define("x", "true").Snapshot("t", "f.root") works fine with the ROOT conda package (which is a C++17 build) or with my own C++17 builds)
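For anyone comparing builds as in the posts above, one quick check is which C++ standard a given ROOT installation was compiled with. The output of root-config --cflags normally includes a -std=c++NN flag. Below is a minimal sketch, assuming root-config is on your PATH; the helper names are hypothetical and not part of ROOT.

```python
import re
import shutil
import subprocess


def cxx_standard_from_cflags(cflags):
    """Return the C++ standard named by a -std= flag (e.g. '17'), or None.

    Hypothetical helper: it only parses a flags string and does not touch ROOT.
    """
    match = re.search(r"-std=(?:c|gnu)\+\+(\w+)", cflags)
    return match.group(1) if match else None


def installed_root_cxx_standard():
    """Query root-config, if present, and parse its compiler flags."""
    if shutil.which("root-config") is None:
        return None  # no ROOT installation visible in this environment
    result = subprocess.run(
        ["root-config", "--cflags"],
        capture_output=True,
        text=True,
        check=True,
    )
    return cxx_standard_from_cflags(result.stdout)


if __name__ == "__main__":
    # Prints the standard as a string, or None if root-config is not found
    print(installed_root_cxx_standard())
```

Running this in each environment (the default build, the -DCMAKE_CXX_STANDARD="17" build, the conda package) would show whether the C++ standard really is the only difference between a working and a crashing setup.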