JIT compiler cannot deal with illegal C++ names

In my tree I have a couple of branches whose names are not legal C++ variable names, such as muplus_cpt_0.10 (and I don’t have any influence on this naming scheme).

When using the new TDataFrame in C++, this is no problem, since I can e.g. do:

auto isolation = [](double cpt_0_5, double cpt_0_1) { return cpt_0_5 - cpt_0_1; };
auto selected = df.Define("muplus_isolation", isolation, {"muplus_cpt_0.50", "muplus_cpt_0.10"});

However, when using PyROOT this does not work (there is no way to pass a lambda, or is there?), because I have to use the pure string interface (which JIT-compiles the string for speed), and that fails:

>>> selected = df.Define("muplus_isolation", "muplus_cpt_0.50 - muplus_cpt_0.10")
input_line_81:2:20: error: expected ';' after top level declarator
double muplus_cpt_0.10;
input_line_81:3:8: error: redefinition of 'muplus_cpt_0'
double muplus_cpt_0.50;
input_line_81:2:8: note: previous definition is here
double muplus_cpt_0.10;
input_line_81:3:20: error: expected ';' after top level declarator
double muplus_cpt_0.50;
Traceback (most recent call last):
  File "reduce_df.py", line 64, in <module>
  File "reduce_df.py", line 36, in reduce_df
    .Define("muplus_isolation", "muplus_cpt_0.50 - muplus_cpt_0.10") \
Exception: ROOT::Experimental::TDF::TInterface<ROOT::Detail::TDF::TCustomColumnBase> ROOT::Experimental::TDF::TInterface<ROOT::Detail::TDF::TFilterBase>::Define(experimental::basic_string_view<char,char_traits<char> > name, experimental::basic_string_view<char,char_traits<char> > expression) =>
    Cannot declare these variables  namespace __tdf_7 {
double muplus_cpt_0.10;
double muplus_cpt_0.50;
Interpreter error code is 1. (C++ exception of type runtime_error)

We do not support branch names that are not valid C++ variable names in JITed strings, and it would be hard to do so: given “a.b < 10”, we could not easily decide whether “a.b” is the name of a branch or an access to the member b of branch a. Furthermore, the string must be valid C++ to be parsed, so the variables must have valid names.
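To find out up front which branches would hit this limitation, each name can be checked against the basic C++ identifier grammar. This is a minimal pure-Python sketch: the function names and the `muplus_PT` branch are hypothetical, and C++ keywords are deliberately not checked.

```python
import re

# Basic C++ identifier grammar: a letter or underscore, followed by letters,
# digits or underscores. Keywords such as "int" or "double" also match this
# pattern but are not valid variable names; this sketch does not check them.
_CPP_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_cpp_identifier(name):
    """Return True if `name` matches the basic C++ identifier grammar."""
    return _CPP_IDENTIFIER.fullmatch(name) is not None


def jit_unfriendly(branch_names):
    """Return the branch names that cannot be written verbatim in a JITed string."""
    return [b for b in branch_names if not is_valid_cpp_identifier(b)]


# "muplus_PT" is a hypothetical, already valid branch name.
print(jit_unfriendly(["muplus_cpt_0.10", "muplus_cpt_0.50", "muplus_PT"]))
# ['muplus_cpt_0.10', 'muplus_cpt_0.50']
```

In PyROOT the list of names could be taken from the tree itself, e.g. `[b.GetName() for b in tree.GetListOfBranches()]`.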

If the size of your dataset allows it, you could use TDataFrame from C++ to rename the branches:

df.Define("muplus_cpt_0_10", [](double b) { return b; }, {"muplus_cpt_0.10"}).Snapshot("newtree", "newfile.root");

and then do the interactive analysis in Python. This incurs an extra copy of the branch values, but since your branches are doubles it should not be an issue. Another option is to set an alias for the branch name with TTree::SetAlias (I have not tried this). Let me know if these are viable options. The last resort, of course, is to stick with C++ for analyzing this kind of branch.
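If the branches are renamed this way, every expression written on the Python side has to use the new names. The sketch below consists of hypothetical pure-Python helpers (no ROOT calls). It assumes the convention of replacing each character that is illegal in a C++ identifier with `_`, which must match whatever names were actually chosen when writing the new tree.

```python
import re


def to_cpp_identifier(name):
    """Turn `name` into a valid C++ identifier by replacing bad characters with '_'."""
    ident = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # A C++ identifier may not start with a digit (or be empty).
    if not ident or ident[0].isdigit():
        ident = "_" + ident
    return ident


def build_rename_map(branch_names):
    """Map every branch name that is not a valid identifier to a new one.

    Raises ValueError if a new name would clash with an existing branch
    or with another renamed branch.
    """
    taken = set(branch_names)
    mapping = {}
    for name in branch_names:
        new = to_cpp_identifier(name)
        if new == name:
            continue  # already usable in a JITed string
        if new in taken:
            raise ValueError("renaming %r to %r would clash" % (name, new))
        taken.add(new)
        mapping[name] = new
    return mapping


def rewrite_expression(expr, mapping):
    """Replace whole occurrences of the old branch names in `expr` with new ones."""
    if not mapping:
        return expr
    # Longest names first, so that "a.b.c" wins over its prefix "a.b".
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![A-Za-z0-9_.])("
        + "|".join(re.escape(n) for n in names)
        + r")(?![A-Za-z0-9_])"
    )
    return pattern.sub(lambda m: mapping[m.group(1)], expr)


mapping = build_rename_map(["muplus_cpt_0.10", "muplus_cpt_0.50"])
print(mapping)
# {'muplus_cpt_0.10': 'muplus_cpt_0_10', 'muplus_cpt_0.50': 'muplus_cpt_0_50'}
print(rewrite_expression("muplus_cpt_0.50 - muplus_cpt_0.10", mapping))
# muplus_cpt_0_50 - muplus_cpt_0_10
```

The rewritten string could then be passed to Define on the renamed tree from PyROOT. A plain `str.replace` would be simpler, but it would also rewrite a name appearing inside a longer, unmapped token; the lookarounds in the pattern restrict matches to whole names.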

In the future we might consider adding an Alias transformation to deal with these cases, but I am afraid it is not planned for the near future.
