Hi, I am using RDataFrame to create a new tree (not a histogram) from one or more existing trees.
I define new branches with something like df.Define("new_values", calc, {"old_values"}) and then call df.Snapshot("new_tree", "new.root") (df is a ROOT::RDataFrame).
While this runs, I would like to watch the progress. In the tutorials and examples I found a way to do this by attaching a callback to a histogram, like this:
auto h = df.Histo1D("new_values");
h.OnPartialResult(100, [&](TH1D &h_) { std::cout << h_.GetEntries() << std::endl; });
*h; // runs the event loop here, in my understanding
df.Snapshot("new_tree", "new.root");
This approach works, but it feels like a workaround. I assume the overhead of the extra histogram is negligible.
However, h has nothing to do with the actual processing. When I process another tree, I will need to replace "new_values" with a branch that exists in that tree. If I remove *h, ... Moreover, in my understanding the event loop is executed at *h, not at Snapshot.
You can call OnPartialResult on (almost) any RDF result, so a cheaper way is to book a Count rather than a Histo1D (although the cost of filling an extra histogram should indeed be small w.r.t. the Snapshot).
Your code above runs two event loops, the first one on *h (because Histo1D is a lazy action and you access its result), the second on Snapshot, which is an instant action (unless you pass an option parameter to make it lazy) and therefore runs the event loop right where it’s called.
I think you want something like this:
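auto c = df.Count(); // hold the result in a variable so the booked action outlives this statement
c.OnPartialResult(/*every */100/* events*/,
                  [](auto n) { std::cout << n << '\n'; });
df.Snapshot("new_tree", "new.root"); // instant action: the single event loop runs here and also fills c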
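
To make the "two event loops versus one" point concrete, here is a small toy model in plain C++. It is only a sketch of the lazy/instant distinction and is not how ROOT implements it: ToyLoop, BookCount, GetCount and the two helper functions are hypothetical names invented for this illustration, and the toy Snapshot writes nothing.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Toy model only: NOT ROOT's implementation. All names are hypothetical.
struct ToyLoop {
    std::uint64_t nEntries;        // entries in the toy "dataset"
    int loopsRun = 0;              // how many event loops were executed
    std::uint64_t count = 0;       // result of the booked count
    bool countBooked = false;      // has a lazy count been booked?
    bool countReady = false;       // has the count already been computed?
    std::uint64_t every = 0;       // callback period in entries
    std::function<void(std::uint64_t)> callback;

    explicit ToyLoop(std::uint64_t n) : nEntries(n) {}

    // Lazy action: only books the work, no loop is run here.
    void BookCount(std::uint64_t everyN, std::function<void(std::uint64_t)> cb) {
        countBooked = true;
        every = everyN;
        callback = std::move(cb);
    }

    // One pass over all entries; fills every booked result that is not ready yet.
    void RunLoop() {
        ++loopsRun;
        const bool fill = countBooked && !countReady;
        for (std::uint64_t i = 0; i < nEntries; ++i) {
            if (!fill) continue;
            ++count;
            if (callback && every != 0 && count % every == 0) callback(count);
        }
        if (fill) countReady = true;
    }

    // Accessing a lazy result triggers a loop if the result is not ready (like *h).
    std::uint64_t GetCount() {
        assert(countBooked);
        if (!countReady) RunLoop();
        return count;
    }

    // Instant action: runs a loop right where it is called (like Snapshot).
    void Snapshot() { RunLoop(); }
};

// Pattern from the question: access the lazy result, then snapshot.
int LoopsWhenResultAccessedFirst() {
    ToyLoop df(1000);
    df.BookCount(100, [](std::uint64_t) {});
    df.GetCount();  // first loop
    df.Snapshot();  // second loop
    return df.loopsRun;
}

// Pattern from the answer: book the count, let the snapshot run the only loop.
int LoopsWhenOnlySnapshotRuns(std::vector<std::uint64_t> &seen) {
    ToyLoop df(1000);
    df.BookCount(100, [&seen](std::uint64_t c) { seen.push_back(c); });
    df.Snapshot();  // single loop, the callback fires during it
    return df.loopsRun;
}
```

In this toy, the first helper reports two loops and the second reports one, with the progress callback still firing every 100 entries during that single pass. That matches the behaviour described above for Count plus Snapshot, under the assumption that the booked result is still alive when the instant action runs.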