Greetings,
I am trying to convert a pandas DataFrame with some string-type columns to an RDataFrame and save it to a file.
I tried to follow the workaround (calling ROOT.TPython.Class() before Define()) presented in the topic "RDataframe with strings":
import pandas as pd
import ROOT

if __name__ == "__main__":
    # load the pickled pandas file
    pandas_df = pd.read_pickle("pandas_df_example.pckl")
    # format the columns to pass them to ROOT
    pandas_df.columns = ["_".join(map(str, col)) for col in pandas_df.columns]
    pandas_df.columns = pandas_df.columns.astype(str)
    # check column types
    # print(pandas_df.head())
    # print(pandas_df.dtypes)
    # extract the string-type columns to add them later
    string_cols = pandas_df.select_dtypes("string").columns
    numerical_cols_df = pandas_df.drop(string_cols, axis=1)
    # import the numerical columns to RDataFrame
    data_rdf = ROOT.RDF.FromPandas(numerical_cols_df)
    # add back the string-type columns with Define
    ROOT.TPython.Class()
    for str_df_col in string_cols:
        data_rdf = data_rdf.Define(
            f"{str_df_col}",
            f'auto to_eval = "pandas_df[\'{str_df_col}\'][" + std::to_string(rdfentry_) + "]"; return (std::string) TPython::Eval(to_eval.c_str());',
        )
    # save the dataframe to file
    data_rdf.Snapshot("test_tree", "Test_file.root")
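To make the Define() argument easier to inspect, the C++ string built in the loop above can be generated and printed without ROOT. This is only a minimal sketch: `build_define_expr` and the column name `particle_name` are hypothetical names introduced for illustration, and the function does nothing beyond assembling the same string that the script passes to Define().

```python
def build_define_expr(column_name: str) -> str:
    """Return the C++ snippet that the script passes to Define().

    Hypothetical helper for illustration only: it builds a string and
    does not import or call ROOT.
    """
    # On the C++ side this snippet assembles the Python expression
    # "pandas_df['<column>'][<entry number>]" at run time and hands it
    # to TPython::Eval, casting the result to std::string.
    return (
        f'auto to_eval = "pandas_df[\'{column_name}\'][" '
        '+ std::to_string(rdfentry_) + "]"; '
        'return (std::string) TPython::Eval(to_eval.c_str());'
    )


if __name__ == "__main__":
    # "particle_name" is a made-up column name, not one from the attached file
    print(build_define_expr("particle_name"))
```

Printing the expression this way shows exactly what is handed to the interpreter for each string column, which may help separate problems in the generated code from problems in TPython itself.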
For some reason, though, I still get an error similar to the one that the workaround was supposed to address:
input_line_83:74:12: error: expected member name or ';' after declaration specifiers
TPyReturn isascii() {
~~~~~~~~~ ^
/usr/include/ctype.h:225:22: note: expanded from macro 'isascii'
# define isascii(c) __isascii (c)
^
/usr/include/ctype.h:99:26: note: expanded from macro '__isascii'
#define __isascii(c) (((c) & ~0x7f) == 0) /* If C is a 7 bit value. */
^
input_line_83:74:12: error: expected ')'
/usr/include/ctype.h:225:22: note: expanded from macro 'isascii'
# define isascii(c) __isascii (c)
^
/usr/include/ctype.h:99:28: note: expanded from macro '__isascii'
#define __isascii(c) (((c) & ~0x7f) == 0) /* If C is a 7 bit value. */
^
input_line_83:74:12: note: to match this '('
/usr/include/ctype.h:225:22: note: expanded from macro 'isascii'
# define isascii(c) __isascii (c)
^
/usr/include/ctype.h:99:23: note: expanded from macro '__isascii'
#define __isascii(c) (((c) & ~0x7f) == 0) /* If C is a 7 bit value. */
^
input_line_83:74:12: error: expected ')'
TPyReturn isascii() {
^
/usr/include/ctype.h:225:22: note: expanded from macro 'isascii'
# define isascii(c) __isascii (c)
^
/usr/include/ctype.h:99:37: note: expanded from macro '__isascii'
#define __isascii(c) (((c) & ~0x7f) == 0) /* If C is a 7 bit value. */
^
input_line_83:74:12: note: to match this '('
/usr/include/ctype.h:225:22: note: expanded from macro 'isascii'
# define isascii(c) __isascii (c)
^
/usr/include/ctype.h:99:22: note: expanded from macro '__isascii'
#define __isascii(c) (((c) & ~0x7f) == 0) /* If C is a 7 bit value. */
^
RDataFrame::Run: event loop was interrupted
input_line_88:74:12: error: expected member name or ';' after declaration specifiers
TPyReturn isascii() {
~~~~~~~~~ ^
/usr/include/ctype.h:225:22: note: expanded from macro 'isascii'
# define isascii(c) __isascii (c)
^
/usr/include/ctype.h:99:26: note: expanded from macro '__isascii'
#define __isascii(c) (((c) & ~0x7f) == 0) /* If C is a 7 bit value. */
^
input_line_88:74:12: error: expected ')'
/usr/include/ctype.h:225:22: note: expanded from macro 'isascii'
# define isascii(c) __isascii (c)
^
/usr/include/ctype.h:99:28: note: expanded from macro '__isascii'
#define __isascii(c) (((c) & ~0x7f) == 0) /* If C is a 7 bit value. */
^
input_line_88:74:12: note: to match this '('
/usr/include/ctype.h:225:22: note: expanded from macro 'isascii'
# define isascii(c) __isascii (c)
^
/usr/include/ctype.h:99:23: note: expanded from macro '__isascii'
#define __isascii(c) (((c) & ~0x7f) == 0) /* If C is a 7 bit value. */
^
input_line_88:74:12: error: expected ')'
TPyReturn isascii() {
^
/usr/include/ctype.h:225:22: note: expanded from macro 'isascii'
# define isascii(c) __isascii (c)
^
/usr/include/ctype.h:99:37: note: expanded from macro '__isascii'
#define __isascii(c) (((c) & ~0x7f) == 0) /* If C is a 7 bit value. */
^
input_line_88:74:12: note: to match this '('
/usr/include/ctype.h:225:22: note: expanded from macro 'isascii'
# define isascii(c) __isascii (c)
^
/usr/include/ctype.h:99:22: note: expanded from macro '__isascii'
#define __isascii(c) (((c) & ~0x7f) == 0) /* If C is a 7 bit value. */
^
In module 'std' imported from input_line_1:1:
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_vector.h:680:30: error: no member named '_M_start' in 'std::_Vector_base<TPyArg, std::allocator<TPyArg> >::_Vector_impl'
std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
~~~~~~~~~~~~~ ^
input_line_88:6:23: note: in instantiation of member function 'std::vector<TPyArg, std::allocator<TPyArg> >::~vector' requested here
std::vector<TPyArg> v; v.reserve(0);
^
In module 'std' imported from input_line_1:1:
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/vector.tcc:86:57: error: no member named '_M_start' in 'std::_Vector_base<TPyArg, std::allocator<TPyArg> >::_Vector_impl'
_GLIBCXX_MAKE_MOVE_IF_NOEXCEPT_ITERATOR(this->_M_impl._M_start),
~~~~~~~~~~~~~ ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_iterator.h:2470:41: note: expanded from macro '_GLIBCXX_MAKE_MOVE_IF_NOEXCEPT_ITERATOR'
std::__make_move_if_noexcept_iterator(_Iter)
^~~~~
input_line_88:6:28: note: in instantiation of member function 'std::vector<TPyArg, std::allocator<TPyArg> >::reserve' requested here
std::vector<TPyArg> v; v.reserve(0);
^
In module 'std' imported from input_line_1:1:
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/vector.tcc:88:36: error: no member named '_M_start' in 'std::_Vector_base<TPyArg, std::allocator<TPyArg> >::_Vector_impl'
std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
~~~~~~~~~~~~~ ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/vector.tcc:92:32: error: no member named '_M_start' in 'std::_Vector_base<TPyArg, std::allocator<TPyArg> >::_Vector_impl'
_M_deallocate(this->_M_impl._M_start,
~~~~~~~~~~~~~ ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/vector.tcc:95:18: error: no member named '_M_start' in 'std::_Vector_base<TPyArg, std::allocator<TPyArg> >::_Vector_impl'
this->_M_impl._M_start = __tmp;
~~~~~~~~~~~~~ ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/vector.tcc:96:18: error: no member named '_M_finish' in 'std::_Vector_base<TPyArg, std::allocator<TPyArg> >::_Vector_impl'
this->_M_impl._M_finish = __tmp + __old_size;
~~~~~~~~~~~~~ ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/vector.tcc:97:18: error: no member named '_M_end_of_storage' in 'std::_Vector_base<TPyArg, std::allocator<TPyArg> >::_Vector_impl'
this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n;
~~~~~~~~~~~~~ ^
In module 'std' imported from input_line_1:1:
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_vector.h:281:22: error: no member named '_M_impl' in 'std::_Vector_base<TPyArg, std::allocator<TPyArg> >'
{ return this->_M_impl; }
~~~~ ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_vector.h:924:28: note: in instantiation of member function 'std::_Vector_base<TPyArg, std::allocator<TPyArg> >::_M_get_Tp_allocator' requested here
{ return _S_max_size(_M_get_Tp_allocator()); }
^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/vector.tcc:69:23: note: in instantiation of member function 'std::vector<TPyArg, std::allocator<TPyArg> >::max_size' requested here
if (__n > this->max_size())
^
input_line_88:6:28: note: in instantiation of member function 'std::vector<TPyArg, std::allocator<TPyArg> >::reserve' requested here
std::vector<TPyArg> v; v.reserve(0);
^
In module 'std' imported from input_line_1:1:
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_vector.h:999:40: error: no member named '_M_end_of_storage' in 'std::_Vector_base<TPyArg, std::allocator<TPyArg> >::_Vector_impl'
{ return size_type(this->_M_impl._M_end_of_storage
~~~~~~~~~~~~~ ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/vector.tcc:71:17: note: in instantiation of member function 'std::vector<TPyArg, std::allocator<TPyArg> >::capacity' requested here
if (this->capacity() < __n)
^
input_line_88:6:28: note: in instantiation of member function 'std::vector<TPyArg, std::allocator<TPyArg> >::reserve' requested here
std::vector<TPyArg> v; v.reserve(0);
^
In module 'std' imported from input_line_1:1:
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_vector.h:919:40: error: no member named '_M_finish' in 'std::_Vector_base<TPyArg, std::allocator<TPyArg> >::_Vector_impl'
{ return size_type(this->_M_impl._M_finish - this->_M_impl._M_start); }
~~~~~~~~~~~~~ ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/vector.tcc:73:33: note: in instantiation of member function 'std::vector<TPyArg, std::allocator<TPyArg> >::size' requested here
const size_type __old_size = size();
^
input_line_88:6:28: note: in instantiation of member function 'std::vector<TPyArg, std::allocator<TPyArg> >::reserve' requested here
std::vector<TPyArg> v; v.reserve(0);
^
In module 'std' imported from input_line_1:1:
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/vector.tcc:112:20: error: no member named '_M_finish' in 'std::_Vector_base<TPyArg, std::allocator<TPyArg> >::_Vector_impl'
if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
~~~~~~~~~~~~~ ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_vector.h:1204:9: note: in instantiation of function template specialization 'std::vector<TPyArg, std::allocator<TPyArg> >::emplace_back<TPyArg>' requested here
{ emplace_back(std::move(__x)); }
^
input_line_88:11:5: note: in instantiation of member function 'std::vector<TPyArg, std::allocator<TPyArg> >::push_back' requested here
v.push_back(fPyObject);
^
In module 'std' imported from input_line_1:1:
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_vector.h:830:39: error: no member named '_M_finish' in 'std::_Vector_base<TPyArg, std::allocator<TPyArg> >::_Vector_impl'
{ return iterator(this->_M_impl._M_finish); }
~~~~~~~~~~~~~ ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_vector.h:1146:11: note: in instantiation of member function 'std::vector<TPyArg, std::allocator<TPyArg> >::end' requested here
return *(end() - 1);
^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/vector.tcc:123:9: note: in instantiation of member function 'std::vector<TPyArg, std::allocator<TPyArg> >::back' requested here
return back();
^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_vector.h:1204:9: note: in instantiation of function template specialization 'std::vector<TPyArg, std::allocator<TPyArg> >::emplace_back<TPyArg>' requested here
{ emplace_back(std::move(__x)); }
^
input_line_88:11:5: note: in instantiation of member function 'std::vector<TPyArg, std::allocator<TPyArg> >::push_back' requested here
v.push_back(fPyObject);
^
RDataFrame::Run: event loop was interrupted
Traceback (most recent call last):
File "/storage/gpfs_data/neutrino/users/alrugger/Software/DarkNews/DarkNews-generator/examples/root_mr_example.py", line 32, in <module>
data_rdf.Snapshot("test_tree","Test_file.root")
cppyy.gbl.std.logic_error: Template method resolution failed:
ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Snapshot(string_view treename, string_view filename, string_view columnNameRegexp = "", const ROOT::RDF::RSnapshotOptions& options = RSnapshotOptions()) =>
logic_error: basic_string::_M_construct null not valid
ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Snapshot(string_view treename, string_view filename, string_view columnNameRegexp = "", const ROOT::RDF::RSnapshotOptions& options = RSnapshotOptions()) =>
logic_error: basic_string::_M_construct null not valid
Strangely, the same code runs without issues on my local machine (ROOT v6.32.08, Python 3.9.6, macOS 15.1 on an Intel chip, Apple clang 16.0.0 compiler).
What could be the problem with my code or setup?
Any feedback that you may have would be greatly appreciated.
I am attaching the Python script and a shortened pandas file below.
root_mr_example.py (1.1 KB)
pandas_df_example.pckl.zip (162.8 KB)
ROOT Version: v6.32.06
Python version: 3.9.18
Platform: AlmaLinux 9.4
Compiler: linuxx8664gcc