RDataFrame Snapshot converting all std::vector to RVec

Dear ROOT experts,

Ever since updating ROOT to the newest v6.26, when I try to create a Snapshot of a processed RDataFrame, all columns are converted from std::vector<float/double/int/char> to RVec<float/double/int/char>. I found in another topic that this is the expected behavior in the new ROOT versions.

While this is no doubt good when working with only RDataFrame, it breaks compatibility with other software that expects std::vectors in the TTree. In our case, running the HistFitter statistical framework (https://github.com/histfitter/histfitter) with the validated ROOT version v6.22 results in a ton of “Error in TExMap::Add: key 1 is not unique” error messages and incorrect output.

Is there a way to prevent the vector → RVec conversion when storing the Snapshot? I tried manually redefining the columns back to std::vector with

    auto df_upd = df.Redefine("colName", [](const RVecI &v) -> std::vector<Int_t> { 
        return std::vector<Int_t>(v.data(), v.data() + v.size());
    }, {"colName"});

but it doesn’t work; the stored branches still have the incorrect type.

Cheers,

Jean Yves

ROOT Version: 6.26
Platform: lxplus CentOS 7

Hi @Jean_Beaucamp ,

sorry for the trouble! Although in v6.26 RVecs are written as RVecs, they are written in such a way (using collection proxies) that you can actually read them back as std::vectors – but given your report I guess that mechanism is broken if you write them with v6.26 and read them back with v6.22, which is indeed not a configuration we test.

One way to force Snapshot to write std::vectors in this case is to manually specify types and names for all branches that are written out (which I realize is clunky; it’s a workaround, not a fix).
So, for example, Snapshot<double, double, std::vector<float>>("t", "f.root", {"x", "y", "vec"});.

However, your workaround should also work! The following code will save column x as a vector<int> with the Redefine and as an RVecI without:

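    #include <ROOT/RDataFrame.hxx>
    #include <ROOT/RVec.hxx>
    #include <vector>

    int main() {
      ROOT::RDataFrame(10)
          // Start from a column of type RVec<int>.
          .Define("x",
                  [] {
                    return ROOT::RVecI{1, 2, 3};
                  })
          // Replace it with a std::vector<int> copy of the same values.
          .Redefine("x",
                    [](const ROOT::RVecI &v) {
                      return std::vector<int>(v.begin(), v.end());
                    },
                    {"x"})
          // Write all columns to tree "t" in f.root.
          .Snapshot("t", "f.root");
    }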

Is this the case for you too? What am I doing differently?

Cheers,
Enrico
