Direct way to watch the progress of an RDataFrame Snapshot

Hi, I am using RDataFrame to create a new tree (not a histogram) from one or more existing trees.
I define new branches with something like df.Define("new_values", calc, {"old_values"}) and then call df.Snapshot("new_tree", "new.root"), where df is a ROOT::RDataFrame.

While this runs, I would like to monitor the progress. Following the tutorials and examples, I found a way that relies on a histogram, like this:

auto h = df.Histo1D("new_values");
h.OnPartialResult(100, [&](TH1D &h_) { std::cout << h_.GetEntries() << std::endl; });
*h; // the event loop runs here, in my understanding

df.Snapshot("new_tree", "new.root");

This approach works, but it is a bit of a hack. I would guess that the overhead is negligible.
Still, h is unrelated to the actual processing: when I process a different tree, I have to change "new_values" to a branch that exists there. Moreover, in my understanding the event loop is executed in *h, not in Snapshot, so I am not sure what happens if I remove *h.

Is there a better, more direct way?

Hi @mzks,

you can call OnPartialResult on (almost) any RDF result, so a cheaper way is to just do a Count rather than a Histo1D (although the cost of filling an extra histogram should indeed be small w.r.t. the Snapshot).

Your code above runs two event loops, the first one on *h (because Histo1D is a lazy action and you access its result), the second on Snapshot, which is an instant action (unless you pass an option parameter to make it lazy) and therefore runs the event loop right where it’s called.

I think you want something like this:

df.Count().OnPartialResult(/*every */100/* events*/,
                           [](auto c) { std::cout << c << '\n'; });
df.Snapshot("new_tree", "new.root");

Cheers,
Enrico

P.S.

we are adding a progress bar feature to RDF itself soon (it will be available in the next ROOT release).
