Direct way to watch progress of Snapshot of RDataFrame

mzks · August 10, 2022, 1:23pm

Hi, I am using RDataFrame to create a new tree (not a histogram) from one or more existing trees.
I define new branches with something like df.Define("new_values", calc, {"old_values"}) and then call df.Snapshot("new_tree", "new.root") (df is a ROOT::RDataFrame).

While this runs, I would like to watch the progress. From the tutorials and examples, I found a way that relies on a histogram, like this:

auto h = df.Histo1D("new_values");
h.OnPartialResult(100, [&](TH1D &h_) { std::cout << h_.GetEntries() << std::endl; });
*h; // runs the event loop here, in my understanding

df.Snapshot("new_tree", "new.root");

This works, but it is a bit of a workaround, although I assume the overhead is negligible.
The histogram h has nothing to do with the actual processing. When I process another tree, I will need to change "new_values" to a branch that exists in that tree. Moreover, in my understanding, the event loop is executed at *h, not in Snapshot.

Is there a better, more direct way?

eguiraud · August 11, 2022, 5:58pm

Hi @mzks ,

you can call OnPartialResult on (almost) any RDF result, so a cheaper way is to just do a Count rather than a Histo1D (although the cost of filling an extra histogram should indeed be small w.r.t. the Snapshot).

Your code above runs two event loops, the first one on *h (because Histo1D is a lazy action and you access its result), the second on Snapshot, which is an instant action (unless you pass an option parameter to make it lazy) and therefore runs the event loop right where it’s called.

I think you want something like this:

df.Count().OnPartialResult(/*every */100/* events*/,
                           [](auto c) { std::cout << c << '\n'; });
df.Snapshot("new_tree", "new.root");

Cheers,
Enrico

eguiraud · August 12, 2022, 9:11am

P.S.

we are adding a progress bar feature to RDF itself soon (it will be available in the next ROOT release).
