RDataFrame lazy operations and object lifetime

karuboniru · November 27, 2023, 2:38pm

For example, with code like this:

#include <ROOT/RDF/RInterface.hxx>
#include <ROOT/RDataFrame.hxx>
#include <TRandom.h>
#include <tuple> // std::make_tuple

auto book_snapshot_action(ROOT::RDF::RNode df) {
  auto df1 = df.Define("y", []() { return gRandom->Gaus(); }, {});
  ROOT::RDF::RSnapshotOptions opts;
  opts.fLazy = true;
  auto snapshot = df1.Snapshot("treeout", "test.root", {"x", "y"}, opts);
  auto hist2d =
      df1.Histo2D({"hist2d", "hist2d", 100, -5, 5, 100, -5, 5}, "x", "y");
  return std::make_tuple(hist2d, snapshot);
}

int main() {
  ROOT::RDataFrame df(10000);
  auto defined = df.Define("x", []() { return gRandom->Gaus(); }, {});
  auto &&[hist2d, snapshot] = book_snapshot_action(defined);
  hist2d->SaveAs("hist.root");
  return 0;
}

I am booking 2 actions in book_snapshot_action and triggering them from main by dereferencing hist2d.

Those actions are booked on the dataframe object df1, whose lifetime should be limited to the scope of book_snapshot_action. By the time the lazy operations are triggered, df1 should already have been destroyed. Or are the actions booked in the root node of the whole dataframe graph, which survives until the operations run?

And is it safe to do something like this:

#include <ROOT/RDF/RInterface.hxx>
#include <ROOT/RDataFrame.hxx>
#include <TRandom.h>
#include <tuple> // std::make_tuple

auto book_snapshot_action() {
  auto df = ROOT::RDataFrame{10000}
                .Define("x", []() { return gRandom->Gaus(); }, {})
                .Define("y", []() { return gRandom->Gaus(); }, {});
  ROOT::RDF::RSnapshotOptions opts;
  opts.fLazy = true;
  auto snapshot = df.Snapshot("treeout", "test.root", {"x", "y"}, opts);
  auto hist2d =
      df.Histo2D({"hist2d", "hist2d", 100, -5, 5, 100, -5, 5}, "x", "y");
  return std::make_tuple(hist2d, snapshot);
}

int main() {
  auto &&[hist2d, snapshot] = book_snapshot_action();
  hist2d->SaveAs("hist.root");
  return 0;
}

i.e. destroy all the RDataFrame objects I am holding before actually triggering the actions?

Both snippets seem to work for me, but I am worried that there may be UB behind them.

bellenot · November 28, 2023, 8:19am

Maybe @eguiraud or @vpadulan can give their thoughts on this.

vpadulan · December 12, 2023, 9:38am

Dear @karuboniru ,

Thank you for reaching out on the forum! And please accept my apologies for the very late reply. The behaviour you are seeing is correct and expected. All nodes of the computation graph created by RDataFrame have shared ownership of their parent nodes. So it is safe to define functions like you are doing in your example snippets.
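To illustrate what shared ownership of parent nodes means for lifetimes, here is a minimal sketch using std::shared_ptr. It is a toy model only, not RDataFrame's actual implementation: ToyNode, ToyResult, chain_to_root and book_action are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy model, NOT ROOT code: every type and function here is hypothetical.
struct ToyNode {
  std::string name;
  std::shared_ptr<ToyNode> parent; // shared ownership of the upstream node
};

// Stand-in for a booked result: it owns the node it was booked on.
struct ToyResult {
  std::shared_ptr<ToyNode> node;
};

// Walk from the booked node up to the root and collect the node names.
std::vector<std::string> chain_to_root(const ToyResult &r) {
  std::vector<std::string> names;
  for (auto n = r.node; n; n = n->parent)
    names.push_back(n->name);
  return names;
}

// Mirrors the second snippet: every local handle goes out of scope on return.
// The optional observer lets the caller check whether the root is still alive.
ToyResult book_action(std::weak_ptr<ToyNode> *root_observer = nullptr) {
  auto root = std::make_shared<ToyNode>(ToyNode{"root", nullptr});
  auto x = std::make_shared<ToyNode>(ToyNode{"define_x", root});
  auto y = std::make_shared<ToyNode>(ToyNode{"define_y", x});
  if (root_observer)
    *root_observer = root;
  return ToyResult{y};
}
```

In this sketch, the ToyResult returned by book_action keeps define_y alive, define_y keeps define_x alive, and define_x keeps root alive, even though the local variables root, x and y no longer exist. The whole chain is released only once the last ToyResult referring to it is destroyed.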

Cheers,
Vincenzo