How to stop RDataFrame from changing column type from std::vector<double> to ROOT::RVec?

Dear ROOT experts,

I have the following problem and can’t find a quick solution for it.
I generate a ROOT file containing simulation data in a tree with many branches. All branches are of the type vector. Only one has type vector.

In a next step I use .Redefine and .Snapshot to manipulate the values of three of the columns, which works perfectly fine, and they are still of the same type after "snapshotting" them:

auto n = rdf.Redefine("hit_x", shifting_x, {"hit_x", "hit_y"})
            .Redefine("hit_y", shifting_y, {"hit_x", "hit_y"})
            .Redefine("hit_z", shifting_z, {"hit_z"})
            .Snapshot("tree", "…/out/file_modified.root");

After that I have a new file which contains all the branches from the old tree, but only the three modified branches are still of the correct type. All the other columns, which are also of the type vector, are changed to RVec. Normally having RVec would not be a problem, but the fitting script I am using can only use std::vector types. Is there an easy way to stop this conversion from happening?

Otherwise I would go back to manually copying all 30 branches into the new file.

Cheers,
David

ROOT Version: 6.28/04


Welcome to the ROOT forum!

I guess @vpadulan or @mczurylo can help you.

Dear @fritz_physi

Thanks for reaching out to the forum!

Indeed, this is a consequence of the fact that RDataFrame reads collections as the special type ROOT::RVec, as per the docs. For a JIT-ted call to Snapshot (i.e. the easier-to-type Snapshot("mytree","myfile")), the type names of all the columns have to be inferred, and by default RDataFrame infers a column of type std::vector as RVec. Currently, the only way to work around this is to fully specify which column types and names you want to save, e.g.

Snapshot<std::vector<float>, std::vector<float>>("t","f.root",{"a","b"})
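For many columns, one way to avoid typing this by hand might be to generate the line of code. Below is a minimal sketch in plain standard C++ (no ROOT needed to compile it). The helpers ToStdVector and MakeSnapshotCall are hypothetical names made up for this illustration, and the sketch assumes the inferred type names look like "ROOT::VecOps::RVec<float>". In a ROOT session the name/type pairs could presumably be collected with df.GetColumnNames() and df.GetColumnType(name).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: rewrite an outermost RVec type name as std::vector.
// Only the leading prefix is replaced; nested RVecs are left untouched.
std::string ToStdVector(const std::string &type)
{
    const std::string prefixes[] = {"ROOT::VecOps::RVec<", "ROOT::RVec<", "RVec<"};
    for (const auto &p : prefixes) {
        if (type.compare(0, p.size(), p) == 0)
            return "std::vector<" + type.substr(p.size());
    }
    return type; // not an RVec: keep the type name unchanged
}

// Hypothetical helper: build the text of an explicitly typed Snapshot call
// from (column name, column type name) pairs.
std::string MakeSnapshotCall(const std::string &tree, const std::string &file,
                             const std::vector<std::pair<std::string, std::string>> &columns)
{
    assert(!columns.empty()); // an explicit Snapshot needs at least one column
    std::string types, names;
    for (std::size_t i = 0; i < columns.size(); ++i) {
        if (i > 0) {
            types += ", ";
            names += ", ";
        }
        types += ToStdVector(columns[i].second);
        names += "\"" + columns[i].first + "\"";
    }
    return "df.Snapshot<" + types + ">(\"" + tree + "\", \"" + file + "\", {" + names + "});";
}
```

For example, MakeSnapshotCall("t", "f.root", {{"a", "ROOT::VecOps::RVec<float>"}, {"b", "ROOT::VecOps::RVec<float>"}}) returns the text df.Snapshot<std::vector<float>, std::vector<float>>("t", "f.root", {"a", "b"}); which can then be pasted into the analysis macro.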

I acknowledge that for 30 column names this is too cumbersome to spell out manually. Also, this request seems reasonable, as it only preserves the TTree types already present. I think we can change this behaviour, maybe with an option you can specify to Snapshot as

RSnapshotOptions opts;
opts.vector2rvec = false;
auto colnames = df.GetColumnNames();
df.Snapshot("t", "f.root", colnames, opts); // This would not convert std::vector<int> to RVec<int>

How does it sound? Would you be available for testing such an option?

Cheers,
Vincenzo

Dear @vpadulan,

Thank you very much for your quick response. At the moment I manually copy all the branches, which is, as you said, tedious, but works of course. I am happy to test such an option with my code if it is implemented. Let me know and I can try it out :)

Cheers,
David