RDataFrame catching undeclared identifier error

My dataset contains certain events with missing variables (columns in RDataFrame).

Using PyROOT, I want to create a Filter such that, for events missing this variable, I can catch the error and create the missing column with a user-defined value.

When the Filter encounters an event with the missing column, Python raises the exception:

TypeError: can not resolve method template call for 'Filter'

which by itself is not very helpful.

However, I see that RDF already prints an error message:

input_line_43:1:46: error: use of undeclared identifier 'GenModel_YMass_125'

which identifies the missing column name.

My question is: how can I retrieve the name of the column that RDF already identifies, instead of writing my own function with pattern matching etc.?

At the moment I use this Filter overload: https://root.cern/doc/master/classROOT_1_1RDF_1_1RInterface.html#af415d0a369aaa449492563f47a13fd37
with a simple C++ expression of the form

GenModel_YMass_125==1

but our use case also includes more complicated expressions.

To be specific, I would like PyROOT to get the error message (catch the exception), store the undeclared identifier 'GenModel_YMass_125' in a variable (say missing_column) and call Define(missing_column, user_specified_value).
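For illustration, the extraction step could look like the minimal C++ sketch below (the same regular expression works with Python's re module). It assumes the diagnostic line has already been captured as a string; capturing cling's output is not shown and is the hard part. The helper name undeclared_identifier is hypothetical, and the approach breaks as soon as cling's wording changes.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: pull the identifier out of a cling diagnostic such as
//   input_line_43:1:46: error: use of undeclared identifier 'GenModel_YMass_125'
// Returns std::nullopt if the line does not contain that message.
// Fragile by design: it depends on the exact wording of cling's diagnostic.
std::optional<std::string> undeclared_identifier(const std::string &diagnostic) {
  static const std::regex pattern(R"(use of undeclared identifier '([A-Za-z_]\w*)')");
  std::smatch match;
  if (std::regex_search(diagnostic, match, pattern))
    return match[1].str(); // the captured identifier, copied out of the match
  return std::nullopt;
}
```

The returned name would then be the first argument of Define, with the user-specified value as the expression string.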

Note that the HasColumn check is cumbersome, since I would have to extract the column name from the filter expression first. This is what I am trying to avoid.
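To make concrete what "extracting the column name from the filter expression" involves, here is a rough C++ sketch, assuming the available column names are already known (for example from GetColumnNames). It is a regex heuristic, not a C++ parser: unknown_identifiers, its small keyword list, and its skip rules are hypothetical and will misfire on more complex expressions.

```cpp
#include <regex>
#include <set>
#include <string>

// Hypothetical helper: identifiers used in `expr` that are not known columns.
// Skips a few keywords, names after '.', "->" or "::" (members, scoped names),
// and names followed by "::" or "(" (namespaces, function calls).
std::set<std::string> unknown_identifiers(const std::string &expr,
                                          const std::set<std::string> &columns) {
  // \b stops matches inside numeric literals such as 1e5 or 20u
  static const std::regex ident(R"(\b[A-Za-z_]\w*)");
  static const std::set<std::string> ignored = {"true", "false", "and", "or", "not", "nullptr"};
  auto ends_with = [](const std::string &s, const std::string &suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
  };
  auto starts_with = [](const std::string &s, const std::string &prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
  };

  std::set<std::string> result;
  for (auto it = std::sregex_iterator(expr.begin(), expr.end(), ident);
       it != std::sregex_iterator(); ++it) {
    const std::string name = it->str();
    const auto begin = static_cast<std::size_t>(it->position());

    // Text before the token, with trailing blanks removed.
    std::string before = expr.substr(0, begin);
    before.erase(before.find_last_not_of(" \t") + 1);
    // Text after the token, with leading blanks removed.
    std::string after = expr.substr(begin + name.size());
    after.erase(0, after.find_first_not_of(" \t"));

    const bool member_or_scoped = ends_with(before, ".") || ends_with(before, "->") ||
                                  ends_with(before, "::");
    const bool qualifier_or_call = starts_with(after, "::") || starts_with(after, "(");
    if (member_or_scoped || qualifier_or_call || ignored.count(name) || columns.count(name))
      continue;
    result.insert(name);
  }
  return result;
}
```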



ROOT Version: 6.20/04
Platform: lxplus7
Compiler: Not Provided
Python Version: 2.7.5


Hi @devdatta,
the part about PyROOT hiding the exception message is a known issue with PyROOT, https://sft.its.cern.ch/jira/browse/ROOT-8439. Feel free to ping us there. But I must say that, in general, relying on parsing exception messages for control flow is not very robust :sweat_smile:

Another problem is that error: use of undeclared identifier 'GenModel_YMass_125' is not really part of the C++ exception message: it’s an error printed by cling (ROOT’s C++ just-in-time compiler) when it tries to compile the expression on the fly. This happens because RDF did not recognize GenModel_YMass_125 as a column, so the generated C++ code is broken. In other words, RDataFrame cannot tell you that some part of the C++ expression you passed is not a valid C++ identifier; one needs a C++ compiler for that. However, parsing compiler errors is even more fragile than parsing exception messages…

Maybe setting a custom error handler could help intercept these kinds of cling errors. I can’t think of anything else, sorry :sweat:

Cheers,
Enrico

Hi @eguiraud, thanks for the reply. I was just trying to be economical and recycle some existing code if possible. In our code we are using Clang to parse C++ functions. I guess we could do the same with these expressions and evaluate them before filtering? We’re not trying to parse error messages but to check the expressions for validity before an exception is thrown.

Let’s say you have the string "GenModel_YMass_125 == 1" in hand: if you just pass it to gInterpreter->ProcessLine or similar, it will complain that GenModel_YMass_125 is not a valid identifier, because cling/gInterpreter does not know about a variable with that name.

In principle (and also in practice, it shouldn’t be that complicated), you can declare variables for all the dataset columns in a namespace and then try to evaluate those expressions in that same namespace. I haven’t tried the code, but it should give you an idea:

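// needs TInterpreter.h (for gInterpreter) and ROOT/RDataFrame.hxx;
// df is the RDataFrame, expression is the filter string (a std::string)
std::string all_variables = "namespace ttree_vars {\n";
for (const auto &c : df.GetColumnNames()) {
  if (c.find('.') != std::string::npos)
    continue; // column name contains a dot, can't declare a variable for it -- notable limitation
  // one dummy variable per column, with the same C++ type as the column
  all_variables += df.GetColumnType(c) + ' ' + c + ";\n";
}
all_variables += "}";
gInterpreter->Declare(all_variables.c_str());

// now try whether an expression compiles, evaluating it with the ttree_vars variables in scope
const std::string code = "[] { using namespace ttree_vars; return " + expression + "; }()";
TInterpreter::EErrorCode err = TInterpreter::kNoError;
gInterpreter->Calc(code.c_str(), &err);
if (err != TInterpreter::kNoError) {
  // the expression does not compile, e.g. it uses a column the dataset does not have
}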
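The string-building part of this approach does not need ROOT, so it can be factored out and checked on its own. A minimal sketch, assuming the (name, type) pairs come from GetColumnNames/GetColumnType; make_column_namespace and wrap_expression are hypothetical helpers. Wrapping the expression in an immediately invoked lambda with a local using-directive is one way to bring the ttree_vars variables into scope without a global using namespace; this is an assumption, not tested with cling.

```cpp
#include <string>
#include <utility>
#include <vector>

// (name, C++ type) pairs, as df.GetColumnNames() / df.GetColumnType() would provide.
using Columns = std::vector<std::pair<std::string, std::string>>;

// Hypothetical helper: builds the code string to pass to gInterpreter->Declare.
std::string make_column_namespace(const Columns &columns) {
  std::string code = "namespace ttree_vars {\n";
  for (const auto &[name, type] : columns) {
    if (name.find('.') != std::string::npos)
      continue; // a dotted name is not a valid C++ variable name
    code += type + ' ' + name + ";\n";
  }
  code += "}";
  return code;
}

// Hypothetical helper: wraps a filter expression so that it is evaluated
// with the ttree_vars variables visible, without a global using-directive.
std::string wrap_expression(const std::string &expression) {
  return "[] { using namespace ttree_vars; return " + expression + "; }()";
}
```

With made-up columns {"pt", "float"}, {"Jet.pt", "float"} and {"nJets", "int"}, the declared namespace contains only pt and nJets.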

Cheers,
Enrico
