Problem with lazy Snapshots in RDataFrame

Dear all,
I think I have a problem with lazy Snapshots in RDataFrame.
I’m trying to split my TChain containing all events into 3 TTrees using RDataFrame and filters. I book 3 Snapshots and trigger the event loop with Count().GetValue(). But I get 3 warnings:
Warning in <Snapshot>: A lazy Snapshot action was booked but never triggered.
and no files are created. If I set opts.fLazy = false, all 3 files are created and everything seems all right, but then each Snapshot triggers its own event loop, which is not good for large data volumes.

The printed result of d.Count().GetValue() is correct, and RDataFrame successfully opens my files with TTrees (I checked this by printing the first column name).
What is wrong with my code?
Here is the part of my code where I do this. I also attached a simple reproducer of what I’m doing: reproducer.C

Thank you for your help!

    // lambdas for the filters (sf_filter, er_filter, alpha_filter) are created here
    ROOT::RDataFrame d("data", dataframename);
    ROOT::RDF::RSnapshotOptions opts;
    opts.fLazy = true; // book the Snapshots now, run them in a later event loop
    std::string filename = "SF.root";
    std::string treename = "SF";
    // note: the value returned by Snapshot is not stored
    d.Filter(sf_filter, {"energy_front_sum", "energy_back_sum", "energy_side_sum", "energy_front_side_sum", "camera1"})
     .Snapshot(treename, filename, d.GetColumnNames(), opts);
    filename = "ER.root";
    treename = "ER";
    d.Filter(er_filter, {"energy_front_sum", "energy_back_sum", "energy_side_sum", "energy_front_side_sum", "camera1"})
     .Snapshot(treename, filename, d.GetColumnNames(), opts);
    filename = "alpha.root";
    treename = "alpha";
    d.Filter(alpha_filter, {"energy_front_sum", "energy_back_sum", "energy_side_sum", "energy_front_side_sum", "camera1"})
     .Snapshot(treename, filename, d.GetColumnNames(), opts);
    // expected to trigger a single event loop that also writes the three files
    std::cout << d.Count().GetValue() << std::endl;

ROOT Version: 6.24/02
Platform: Linux Mint
Compiler: gcc


Hi @dima5135,
ah, laziness and lifetimes :confused: I agree this is confusing. You need to keep the result pointer around, or RDF will forget the Snapshot you booked:

    // keep the RResultPtr returned by Snapshot alive until the event loop has run
    auto tmp1 = d.Filter(sf_filter, {"energy_front_sum", "energy_back_sum", "energy_side_sum", "energy_front_side_sum", "camera1"})
                 .Snapshot(treename, filename, d.GetColumnNames(), opts);

Does that help?
We might want to change this behavior. It’s a performance optimization for other actions: if all result pointers go out of scope, the user has no way to access the results, so we can simply not produce them. For Snapshot, though, it makes less sense.
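To make the lifetime rule above concrete, here is a toy model in plain C++. It is NOT how RDataFrame is implemented: `ToyResult`, `ToyLoopManager` and `RunToyExample` are hypothetical names that only mimic the observable behavior described in the reply, namely that a booked action whose last result handle has been dropped is skipped when the event loop runs. The "loop manager" holds only `std::weak_ptr`s, so it never keeps a result alive on its own.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy model (hypothetical, not RDataFrame's real classes) of why a lazy
// action whose result pointer is discarded never runs.
struct ToyResult {
   bool produced = false;
};

class ToyLoopManager {
public:
   // Book an action and hand back the only owning handle to its result.
   std::shared_ptr<ToyResult> Book()
   {
      auto result = std::make_shared<ToyResult>();
      fBooked.push_back(result); // weak_ptr: does not extend the result's lifetime
      return result;
   }

   // Run the "event loop": produce only results that someone can still access.
   // Returns how many booked actions actually ran.
   int Run()
   {
      int nRun = 0;
      for (auto &weak : fBooked) {
         if (auto result = weak.lock()) {
            result->produced = true;
            ++nRun;
         }
      }
      fBooked.clear();
      return nRun;
   }

private:
   std::vector<std::weak_ptr<ToyResult>> fBooked;
};

// Mirrors the thread: the first handle is discarded (like the question's Snapshots),
// the other two are kept (like `auto tmp1 = ...Snapshot(...)`).
int RunToyExample()
{
   ToyLoopManager loop;
   loop.Book();              // handle dropped immediately: this action will be skipped
   auto kept1 = loop.Book(); // handle kept: this action runs
   auto kept2 = loop.Book(); // handle kept: this action runs
   return loop.Run();        // 2
}
```

The design trade-off is the one mentioned in the reply: skipping unreachable results saves work for actions like histograms or counts, but a Snapshot has a side effect (the output file) that matters even if nobody holds its result pointer.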


Yes, that helped! Thank you!
I think it would be enough to mention this in the documentation of lazy Snapshots :slight_smile:
Thanks again for the help!

