
RDataFrame from python3: 6.18 vs. 6.16

Hi,

I noticed a few differences between ROOT 6.16.00 and above (6.18.00 and master(?)), set up respectively with

source /cvmfs/sft.cern.ch/lcg/views/LCG_95apython3/x86_64-centos7-gcc8-opt/setup.sh

,

source /cvmfs/sft.cern.ch/lcg/views/LCG_96python3/x86_64-centos7-gcc8-opt/setup.sh

and

source /cvmfs/sft.cern.ch/lcg/views/dev3python3/latest/x86_64-centos7-gcc8-opt/setup.sh

First, in the latter two, retrieving RDataFrame from cppyy.gbl gives a segmentation fault (full stack trace: cppyyrdataframe_trace.txt (8.6 KB)):

>>> from cppyy import gbl
>>> gbl.RDataFrame
AttributeError: AsNumpy

The above exception was the direct cause of the following exception:

SystemError: <built-in method __subclasscheck__ of ROOT.PyRootType object at 0x7f4b3dfb59c0> returned a result with an error set

The above exception was the direct cause of the following exception:

SystemError: <built-in method mro of ROOT.PyRootType object at 0x4d07ad8> returned a result with an error set

 *** Break *** segmentation violation
===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007f4b4500bbbc in waitpid () from /lib64/libc.so.6
#1  0x00007f4b44f89ea2 in do_system () from /lib64/libc.so.6
#2  0x00007f4b3bfbeef3 in TUnixSystem::StackTrace() () from /cvmfs/sft-nightlies.cern.ch/lcg/views/dev3python3/Fri/x86_64-centos7-gcc8-opt/lib/libCore.so
#3  0x00007f4b3bfc17c4 in TUnixSystem::DispatchSignals(ESignals) () from /cvmfs/sft-nightlies.cern.ch/lcg/views/dev3python3/Fri/x86_64-centos7-gcc8-opt/lib/libCore.so
#4  <signal handler called>
#5  0x00007f4b45d0444c in PyObject_SetAttr (v=0x0, name=0x7f4b3e635a70, value=0x7f4b3e6c14b0) at /mnt/build/jenkins/workspace/lcg_release_latest/BUILDTYPE/Release/COMPILER/gcc8binutils/LABEL/centos7/build/externals/Python-3.6.5/src/Python/3.6.5/Objects/object.c:924

whereas accessing RDataFrame after import ROOT works (aside: which of import ROOT and from cppyy import gbl is recommended?).

Next, passing a string to RDataFrame.Define or RDataFrame.Filter does not work in 6.18 and dev3python3:

>>> df = df.Define("nLeptons", "_nEle + _nMu")                        
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can not resolve method template call for 'Define'

Explicitly constructing the std::string_view in python gets around this, but with branch names that start with an underscore I then get

>>> df = df.Define("nLeptons", getattr(ROOT, "std::string_view")("_nEle + _nMu"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
Exception: ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Define(basic_string_view<char,char_traits<char> > name, basic_string_view<char,char_traits<char> > expression) =>
    Failed to tokenize expression:
_nEle + _nMu

Make sure it is valid C++. (C++ exception of type runtime_error)

(another script that runs on a tree without branch names with a leading underscore works, so I suppose it’s related to that).
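To check quickly which names in an expression might be affected, something like the following could help. It is only a rough sketch in plain Python: the helper name leading_underscore_names is made up for illustration, it is not part of ROOT, and it naively treats every identifier-like token as a candidate column name (it does not skip string literals or function names).

```python
import re

# Matches C++-style identifiers: a letter or underscore followed by word characters.
_IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def leading_underscore_names(expression):
    """Return the distinct identifiers in `expression` that start with '_',
    in order of first appearance."""
    found = []
    for name in _IDENTIFIER.findall(expression):
        if name.startswith("_") and name not in found:
            found.append(name)
    return found


print(leading_underscore_names("_nEle + _nMu"))
```

For the expression from my Define call this lists both _nEle and _nMu, which fits the suspicion that the leading underscore is what the tokenizer trips over.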

I do not know how many of these are bugs (and, if so, whether in the cvmfs installation or in ROOT itself), but all three work in 6.16.00.

Thanks in advance for your help,
Pieter


@etejedor could you have a look at Pieter’s issues?

Hi @Axel, I think that this is a problem in the code that does parsing for branch names + jitting (the problem with underscores, that is). I don’t think it’s related to Python. The same error probably happens in C++. The issue with string_view is strange indeed. Maybe a missing pythonization?

Thank you for reporting.

Regarding the first issue, it seems to be a problem with AsNumpy not being injected and later used, and it happens only if you import cppyy but not ROOT. I will have a look at this and also the second issue and report back.

Thanks a lot for the report!

Enric and I took care of the error occurring when RDataFrame is accessed through cppyy.gbl. You can follow the integration in ROOT in this Jira ticket.

Best
Stefan

As @swunsch mentioned, the first issue will be fixed by his PR. In the meantime, if you make sure you import ROOT (not just cppyy) you should not see the issue either.
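A minimal sketch of that workaround, assuming a ROOT installation is on the Python path. The function name load_rdataframe is just illustrative, and returning None when ROOT is missing is my own choice so the snippet also runs on machines without ROOT.

```python
def load_rdataframe():
    """Import ROOT first and return the RDataFrame class from it.

    Returns None if ROOT cannot be imported in this environment.
    """
    try:
        import ROOT  # importing ROOT (not only cppyy) is the suggested workaround
    except ImportError:
        return None
    return ROOT.RDataFrame


RDataFrame = load_rdataframe()
if RDataFrame is None:
    print("ROOT is not available in this environment")
```

The point is only the import order: ROOT is imported before anything is looked up, rather than going through cppyy.gbl alone.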

Regarding the second issue, I was able to reproduce in C++ too with the following code:

auto df = ROOT::RDataFrame(10);

auto df2 = df.Define("_nMu", "1");
auto df3 = df2.Define("nLeptons", "_nMu");

that prints:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Failed to tokenize expression:
_nMu

So indeed as @amadio pointed out it does not seem Python-related. I am going to open a ticket for it.
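For anyone who wants to check from Python whether a given build is affected, the C++ snippet above can be mirrored roughly as follows. This is a sketch under assumptions: the function name underscore_define_works is hypothetical, and a False result does not distinguish the tokenizer error from other failures (for example the template resolution error reported earlier in this thread).

```python
def underscore_define_works():
    """Try the two Define calls from the C++ reproducer.

    Returns True if both calls succeed, False if any exception is raised,
    and None if ROOT cannot be imported.
    """
    try:
        import ROOT
    except ImportError:
        return None
    try:
        df = ROOT.RDataFrame(10)
        # Same column names and expressions as in the C++ reproducer.
        df.Define("_nMu", "1").Define("nLeptons", "_nMu")
    except Exception:
        # C++ exceptions and binding errors both surface as Python exceptions.
        return False
    return True


print(underscore_define_works())
```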

This is the ticket for the second issue:
https://sft.its.cern.ch/jira/browse/ROOT-10305

Thanks a lot for the prompt reactions!

I can confirm that with the current master branch, from

/cvmfs/sft-nightlies.cern.ch/lcg/views/dev3python3/latest/x86_64-centos7-gcc7-opt/setup.sh

(today’s build) all my issues are solved (also the overload resolution) - thanks a lot!