Hi,
As per the docs, Foreach is an “instant action”: it triggers the event loop on the spot and does not return anything (i.e. Foreach returns void, and you can’t call Snapshot on void).
Calling Foreach and Snapshot as two separate statements instead is more explicit about the fact that two event loops are being executed, since by default both Foreach and Snapshot are instant actions.
To run everything in one event loop, you can make Snapshot lazy:
ROOT::RDF::RSnapshotOptions opts;
opts.fLazy = true;
auto snapshot = d.Snapshot("new_data", "new_file.root", {"var1"}, opts); // lazy: event loop not run here; the result is kept in a variable so the booked action stays alive
d.Foreach(lambda_function, {"var1", "var2", ...}); // event loop always run on a Foreach
RDataFrame actions return (smart pointers to) results, and the event loop is triggered upon first access to any of the results. Foreach does not produce any result, so there is no natural trigger for the event loop other than the call to the method itself. We could have had Foreach return a dummy result object that contains nothing, but we felt it would have been more awkward than the actual solution. Arguable decision, admittedly, but since it’s easy to make things work as you want (just make everything else lazy and call Foreach last) I don’t think it’s a big issue.
but this will fail because Foreach() does not have access to the previous transformation. If, instead, Snapshot() is called after Foreach(), the former will be the one without access to “var”.
Is it possible to chain some “lazy” transformations, or is there a better approach?