RDataFrame: Snapshot does not work properly after Define and Range call

Dear experts,

I am currently having problems saving an RDataFrame after calling Range() on it. Most variables are empty (at least everything that is a std::vector).

What I do is

// create dataframe
ROOT::RDataFrame df(tree_name,"some_rootfile.root");
// define variables needed for cuts and/or in the output
df=df.Define("leptmp", "return Construct<ROOT::Math::PtEtaPhiEVector>(lep_pt/1000., lep_eta, lep_phi, lep_E/1000.);"); // new RDataFrame to save the variable
df=df.Define("lep_theta", "return ROOT::VecOps::Map(leptmp,[](ROOT::Math::PtEtaPhiEVector x){return x.Theta();})");
// Filter events
df = df.Filter("All(lep_pt > 27000. && lep_z0*sin(lep_theta) < 0.5)");
// get a second dataframe with just 50000 entries
auto df_example = df.Range(0,50000.);
// save both dataframes
df_example.Snapshot(tree_name, "some_example_rootfile.root");
df.Snapshot(tree_name, "some_output_rootfile.root");

For the dataframe with all entries everything works fine, all variables are stored in the output file. For the example file, the variables are there but all std::vector ones are empty. What am I doing wrong or what am I missing here?

As this is rather old code, I am using ROOT version 6.26/04.

Thanks in advance!

Hi,

Thanks for the interesting post.
This is not expected. Could you try with a recent 6.30 release? If that also fails, could you share the input file with us so that we can reproduce the problem?

Cheers,
D

Hi Danilo,

I have now tried it with ROOT version 6.30/02, but it still does not work. I have sent you a direct message, since I was not able to upload the input file here (it is probably too large). Here is the link to my cernbox: CERNBox

Cheers

Dear @eneb,

I have tried your example, but I rewrote it slightly so that it can be compiled and run standalone, which makes it easy to reproduce:

#include <ROOT/RDataFrame.hxx>
#include <ROOT/RVec.hxx>
#include <Math/Vector4D.h>
#include <TInterpreter.h>

int main(){

    gInterpreter->GenerateDictionary("ROOT::RVec<ROOT::Math::PtEtaPhiEVector>", "Math/Vector4D.h;ROOT/RVec.hxx");

    // create dataframe
    ROOT::RDataFrame df("mini","mc_410000.ttbar_lep.root");
    // define variables needed for cuts and/or in the output
    auto df1 = df.Define("leptmp", "ROOT::VecOps::Construct<ROOT::Math::PtEtaPhiEVector>(0.001*lep_pt, lep_eta, lep_phi, 0.001*lep_E);"); // new RDataFrame to save the variable
    auto df2 = df1.Define("lep_theta", "return ROOT::VecOps::Map(leptmp,[](ROOT::Math::PtEtaPhiEVector x){return x.Theta();})");
    auto df_norange = df2.Filter("All(lep_pt > 27000. && lep_z0*sin(lep_theta) < 0.5)");

    auto df_range = df_norange.Range(0,50.);
    // save both dataframes
    auto snapshot_df_ranges = df_range.Snapshot("outputTree", "output_ranges.root");
    auto snapshot_df_no_ranges = df_norange.Snapshot("outputTree", "output_no_ranges.root");

    return 0;
}

The problem has nothing to do with the call to Range(). The main issue in your example is the missing dictionary. With my debug build of ROOT, I see the following error:

Error in <TTree::Branch>: The class requested (ROOT::VecOps::RVec<ROOT::Math::LorentzVector<ROOT::Math::PtEtaPhiE4D<double> > >) for the branch "leptmp" is an instance of an stl collection and does not have a compiled CollectionProxy. Please generate the dictionary for this collection (ROOT::VecOps::RVec<ROOT::Math::LorentzVector<ROOT::Math::PtEtaPhiE4D<double> > >) to avoid to write corrupted data.

Hence I added this line, which is already included in the code above:

gInterpreter->GenerateDictionary("ROOT::RVec<ROOT::Math::PtEtaPhiEVector>", "Math/Vector4D.h;ROOT/RVec.hxx");

You can also follow a similar example here: Not being able to write RVec<PtEtaPhiMVector> object into root file with RDF snapshot
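
Since Range() came up: as far as I understand the RDataFrame documentation, a Range placed after a Filter counts only the entries that pass the filter, so Range(0, 50) keeps the first 50 passing entries rather than the first 50 entries of the input tree. The plain C++ sketch below is a toy model of that behaviour and not ROOT code; the function name filterThenRange and the sample values are made up for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Toy model (NOT ROOT code; filterThenRange is a hypothetical helper):
// returns the indices of the input entries kept by a filter followed by
// a range [begin, end) that counts only entries passing the filter.
std::vector<std::size_t> filterThenRange(const std::vector<double> &values,
                                         const std::function<bool(double)> &passes,
                                         std::size_t begin, std::size_t end)
{
    std::vector<std::size_t> kept;
    std::size_t nPassed = 0; // entries that have passed the filter so far
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (!passes(values[i]))
            continue; // rejected entries do not advance the range counter
        if (nPassed >= end)
            break; // range exhausted: stop looping early
        if (nPassed >= begin)
            kept.push_back(i);
        ++nPassed;
    }
    return kept;
}
```

For example, with the values {10000, 30000, 5000, 40000, 50000, 28000}, the cut x > 27000 and the range [0, 2), this keeps input entries 1 and 3.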

Cheers,
Marta

Hello Marta,

Thanks for the help! I had already found the thread you linked and tried to apply the solutions from it to my case, but that did not work.

Unfortunately, your solution, including the line

gInterpreter->GenerateDictionary("ROOT::RVec<ROOT::Math::PtEtaPhiEVector>", "Math/Vector4D.h;ROOT/RVec.hxx");

did not work either. I still have empty variables in the output where I use Range(). (In the other output file everything is still complete, as before.)

Does it matter where the files generated by GenerateDictionary are stored? I run my script from outside the directory where the executable is located, so the files generated by GenerateDictionary end up in the directory from which I run the script.

Is there a way for me to run a debug build of ROOT to check whether it prints error messages that I otherwise do not see?

I tested your solution with both ROOT 6.26/04 and ROOT 6.30/02.

Thanks a lot!