Different behaviour of RDF::RResultPtr::GetValue in Python and C++

In Python

RDF = ROOT.ROOT.RDataFrame 
d = RDF("tvec", "hvector.root")  
d.Histo1D("x").GetValue()

This returns None. It works if I do h = d.Histo1D("x"); h.GetValue()

In C++

ROOT::RDataFrame df("tvec", "hvector.root")
df.Histo1D("x").GetValue()

This returns (const TH1D &) @0x557fc25b7890


ROOT Version: 6.18/00
Platform: Not Provided
Compiler: Not Provided


Hi wiso,
Python is protecting you from a use-after-delete (a dangling reference), and the C++ snippet is (subtly) wrong.

GetValue returns a reference to the TH1D pointed to by the RResultPtr, and that reference becomes dangling (refers to an invalid memory location) when the RResultPtr goes out of scope (similar semantics to shared_ptr and shared_ptr::get()). In your C++ snippet the RResultPtr is a temporary, so it is destroyed as soon as the statement ends.
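To make the lifetime issue concrete without needing ROOT, here is a minimal sketch that uses std::shared_ptr as a stand-in for RResultPtr. Hist, BookHisto and the two check functions are hypothetical names invented for this illustration, not ROOT API; the sketch only assumes that RResultPtr behaves like a shared_ptr, as described above.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for TH1D, so the sketch compiles without ROOT.
struct Hist {
    int entries = 0;
};

// Hypothetical stand-in for df.Histo1D("x"): returns an owning handle by value.
std::shared_ptr<Hist> BookHisto() {
    auto h = std::make_shared<Hist>();
    h->entries = 42;
    return h;
}

// Mirrors `df.Histo1D("x").GetValue()`: the handle is a temporary.
// Returns true if the histogram is gone once that statement has ended.
bool TemporaryHandleExpires() {
    std::weak_ptr<Hist> observer;  // observes the histogram without owning it
    observer = BookHisto();        // the temporary handle dies at the end of this statement
    return observer.expired();
}

// Mirrors `h = d.Histo1D("x"); h.GetValue()`: the handle has a name.
// Returns true if the histogram is still alive and readable.
bool NamedHandleKeepsHistogramAlive() {
    auto handle = BookHisto();
    std::weak_ptr<Hist> observer = handle;
    const Hist &ref = *handle;     // valid for as long as `handle` is in scope
    assert(ref.entries == 42);
    return !observer.expired();
}
```

Both functions return true: in the first case any reference taken from the temporary would already be dangling, in the second the named handle keeps the object alive.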

Depending on your use case, you can do one of three things: always go through the RResultPtr, passing res_ptr.GetValue() to functions that require a TH1D&; keep the RResultPtr alive for as long as the TH1D reference is in use; or call res_ptr->Clone() instead of res_ptr.GetValue() to get a standalone copy of the histogram whose lifetime is not tied to the RResultPtr.
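The three options can be sketched with std::shared_ptr standing in for RResultPtr. Everything named here (Hist, BookHisto, CountEntries, HistView and the helper functions) is hypothetical and only illustrates the pattern; in particular, the plain copy in the third option is an assumption standing in for Clone().

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for TH1D, so the sketch compiles without ROOT.
struct Hist {
    int entries = 0;
};

// Hypothetical stand-in for df.Histo1D("x").
std::shared_ptr<Hist> BookHisto() {
    auto h = std::make_shared<Hist>();
    h->entries = 42;
    return h;
}

// A function that requires a reference, like one taking a TH1D&.
int CountEntries(const Hist &h) { return h.entries; }

// Option 1: hold the handle and dereference it only at the call site.
int UseHandleDirectly() {
    auto handle = BookHisto();
    return CountEntries(*handle);
}

// Option 2: store the handle next to whatever uses the reference,
// so the reference can never outlive the owner.
struct HistView {
    std::shared_ptr<Hist> handle;
    const Hist &Get() const { return *handle; }
};

int UseKeptAliveView() {
    HistView view{BookHisto()};
    return CountEntries(view.Get());
}

// Option 3: make an independent copy (the role Clone() plays above).
std::unique_ptr<Hist> StandaloneCopy() {
    auto handle = BookHisto();
    return std::make_unique<Hist>(*handle);  // survives after `handle` is destroyed
}
```

Which option fits depends on who needs the histogram and for how long: the first two keep a single histogram, the third pays for a copy in exchange for an independent lifetime.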

In general, to avoid this kind of issue, you can think of RResultPtr<T> as a std::shared_ptr<T>.

Cheers,
Enrico

Are you saying that RResultPtr has ownership of the TH1D? Why not pass it to the dataframe?

RResultPtr and the dataframe share ownership of the TH1D during the event loop, and the dataframe releases that ownership at the end of the event loop. That’s an implementation detail though: from the user’s perspective, the only way to access the result TH1D is through the RResultPtr returned by Histo1D (which should be treated as a shared_ptr<TH1D> and can be passed around and copied just like a shared_ptr).

After the RResultPtr<TH1D> goes out of scope, the user has no way to retrieve the histogram anymore, and pointers or references to that histogram obtained from the RResultPtr are invalidated, just as pointers and references obtained from shared_ptr::get and shared_ptr::operator* are invalidated when all shared_ptrs go out of scope.

If the question is why the lifetime of the TH1D is tied to the RResultPtr rather than to the dataframe: to avoid action at a distance between two entities (dataframe and histogram) that have no reason to be related once the event loop is over. A dataframe can be reused to produce multiple histograms, potentially with multiple event loops, and the user is responsible for managing the lifetime of the histograms (through the RResultPtrs, which have the “familiar” C++ behavior of shared_ptrs).
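As a rough illustration of that ownership model, here is a toy sketch in plain C++. ToyFrame, Hist and the helper functions are hypothetical names made up for this example; this is not how RDataFrame is actually implemented, only a model of "shared during the event loop, released afterwards".

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for TH1D.
struct Hist {
    int entries = 0;
};

// Toy model of a dataframe: it co-owns booked results until the loop has run.
class ToyFrame {
    std::vector<std::shared_ptr<Hist>> booked_;

public:
    // Book a result: the frame and the caller now share ownership.
    std::shared_ptr<Hist> Book() {
        auto h = std::make_shared<Hist>();
        booked_.push_back(h);
        return h;
    }

    // Fill every booked result, then drop the frame's share of ownership.
    void RunEventLoop(int nEvents) {
        for (auto &h : booked_) h->entries += nEvents;
        booked_.clear();
    }
};

// Number of owners while the result is booked but not yet filled.
long OwnersBeforeLoop() {
    ToyFrame frame;
    auto h = frame.Book();
    return h.use_count();
}

// Number of owners once the event loop is over.
long OwnersAfterLoop() {
    ToyFrame frame;
    auto h = frame.Book();
    frame.RunEventLoop(10);
    return h.use_count();
}

// The result stays usable after the frame itself has been destroyed.
int EntriesAfterFrameIsGone() {
    std::shared_ptr<Hist> h;
    {
        ToyFrame frame;
        h = frame.Book();
        frame.RunEventLoop(10);
    }
    return h->entries;
}
```

In this model the handle is the only owner after the loop, so the frame can be reused or destroyed without affecting results that were already produced.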
