@etejedor There is a bug in ROOT’s “csv” parsing. It improperly parses “DOS” encoded files. You can see it with the attached file, in the very last column name. It should be “Embarked” but ROOT improperly uses the final “carriage return” character as the last character of this name (this character does not appear in “Unix” encoded files, of course).
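Until this is fixed, one option is to convert the file to Unix line endings before handing it to ROOT. Here is a minimal sketch in plain Python (standard library only, not ROOT's API). The helper name `normalize_line_endings` and the tiny sample data are made up for illustration. It also shows how a parser that splits only on the line feed ends up with the carriage return glued to the last column name:

```python
def normalize_line_endings(data: bytes) -> bytes:
    """Convert DOS (CRLF) line endings to Unix (LF)."""
    return data.replace(b"\r\n", b"\n")


# Hypothetical sample mimicking a DOS-encoded CSV file.
dos_csv = b"PassengerId,Age,Embarked\r\n1,22,S\r\n"

# Splitting only on "\n" leaves the carriage return on the last column name.
naive_header = dos_csv.split(b"\n")[0].split(b",")

# After normalization the last column name is clean.
fixed_header = normalize_line_endings(dos_csv).split(b"\n")[0].split(b",")
```

For a real file, you would read it in binary mode, pass the bytes through the helper, and write the result to a new file that ROOT then reads.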
@marty1885 The problem is that the “Age” and “Cabin” columns are sometimes empty, and ROOT cannot handle empty fields. When creating this file, you would need to make sure that every entry gets some “default” value.
This is a known missing feature; the relevant discussion and a link to the Jira ticket are here.
Unfortunately, at the moment we do not have free hands to implement support for missing values (I myself will not be at CERN until March), but PRs are of course welcome, as are comments on the Jira ticket explaining why the feature should move up the to-do list.
The simplest workaround is to substitute missing values with some telltale value.
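As a rough sketch of that workaround, the file can be pre-processed with Python's standard `csv` module before ROOT reads it. Everything here is an assumption for illustration: the function name `fill_missing`, the per-column defaults (`-1` for “Age”, `NA` for “Cabin”) and the sample rows. Pick telltale values that cannot occur in your real data and that match each column's type.

```python
import csv
import io


def fill_missing(src, dst, defaults, fallback="NA"):
    """Copy CSV rows from src to dst, replacing empty fields with a telltale value.

    defaults maps a column name to its telltale value; columns not listed
    there use fallback.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    writer.writerow(header)
    for row in reader:
        writer.writerow([
            field if field.strip() else defaults.get(name, fallback)
            for name, field in zip(header, row)
        ])


# Hypothetical sample with missing "Age" and "Cabin" entries.
raw = "Name,Age,Cabin,Embarked\nAlice,29,,S\nBob,,C85,C\n"
out = io.StringIO()
fill_missing(io.StringIO(raw), out, defaults={"Age": "-1", "Cabin": "NA"})
cleaned = out.getvalue()
```

With real files, `src` and `dst` would be file objects opened with `newline=""`, as the `csv` module documentation recommends. The analysis then has to filter out the telltale values explicitly, for example by skipping entries with a negative age.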
I’d ask @etejedor, the author of the RCSVDataSource, to point to the parts of the code that need upgrading to support missing values. There is also the question of what behavior we want RDF to have when there is a missing value in a CSV file.