Snapshot of RDataFrame creates two identical trees with different cycle numbers


ROOT Version: 6.20
Platform: mac
Compiler: default

Here is my code.

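   #include <ROOT/RDataFrame.hxx>

   // Read from the original tree (cycle 52 of BaselineTree).
   ROOT::RDataFrame d("BaselineTree;52", "mc16a_diboson.root");
   // Add a column holding mll squared; assumes "mll" is stored as a float.
   auto square = [](float x) { return x * x; };
   auto df_2 = d.Define("mll_square_df", square, {"mll"});
   // Write only the two columns of interest to a new file.
   df_2.Snapshot("newtree_dframe", "new_branch.root", {"mll", "mll_square_df"});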

The file “new_branch.root” contains both “newtree_dframe;1” and “newtree_dframe;2”, which seems to take up twice the space!

How did you assert this?

I also wrote the same tree using a plain for loop. The file created with RDataFrame is twice the size of the one created with the for loop.

Hi @kai_zheng,
and welcome to the ROOT forum!
As you probably know, different cycle numbers indicate different versions of the TTree metadata: ROOT might write the metadata multiple times over the course of a large TTree write, e.g. as a measure to prevent full data loss in case of an application crash. The highest cycle is the only one that “counts”; lower cycles can be ignored, and indeed getting “newtree_dframe” from the file should automatically retrieve “newtree_dframe;2”. In other words: multiple cycle numbers are expected when writing large TTrees and should be harmless. The data itself is written only once.
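
The “highest cycle wins” rule can be illustrated with a small stand-alone sketch. This is a toy model in plain C++, not ROOT code: the helper names parseNameCycle and latestCycles are hypothetical, and the inputs are simply “name;cycle” labels like the ones quoted in this thread.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Split a "name;cycle" label into its name and cycle number.
// A label without ";cycle" is given cycle 0 (an assumption of this sketch).
std::pair<std::string, int> parseNameCycle(const std::string& label) {
    assert(!label.empty() && "label must not be empty");
    const std::string::size_type pos = label.rfind(';');
    if (pos == std::string::npos) {
        return std::make_pair(label, 0);
    }
    return std::make_pair(label.substr(0, pos), std::stoi(label.substr(pos + 1)));
}

// For each object name, keep only the highest cycle seen.
std::map<std::string, int> latestCycles(const std::vector<std::string>& labels) {
    std::map<std::string, int> latest;
    for (const std::string& label : labels) {
        const std::pair<std::string, int> parsed = parseNameCycle(label);
        std::map<std::string, int>::iterator it = latest.find(parsed.first);
        if (it == latest.end() || parsed.second > it->second) {
            latest[parsed.first] = parsed.second;
        }
    }
    return latest;
}
```

Calling latestCycles on the two labels “newtree_dframe;1” and “newtree_dframe;2” leaves a single entry for “newtree_dframe” with cycle 2, which mirrors what asking the file for “newtree_dframe” without a cycle is described as doing above.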

So we need to diagnose what’s going on in your case. What is the output of rootls -t <file.root> for the three files (the one produced with a for loop, the one produced with Snapshot, and the input file mc16a_diboson.root)? What is the compression ratio of the output files? You can check it e.g. in the first few lines printed by root -l -b -q -e 'TFile("<file.root>").Get("<tree>")->Print()'. Do the output trees produced with the two different methods contain the same number of entries and the same branches?

If these checks do not clarify what’s going on, we might need a self-contained reproducer that we can debug on our side.
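
Once the numbers are in hand, the comparison itself is simple. Here is a minimal sketch in plain C++, assuming the values are copied by hand from the Print() output of each file. The TreeSummary struct and both helpers are hypothetical, not ROOT API, and the compression factor is taken here as uncompressed bytes divided by bytes on disk.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hand-copied summary of one tree (hypothetical helper type).
struct TreeSummary {
    long long entries;               // number of entries in the tree
    long long totalBytes;            // uncompressed size in bytes
    long long fileBytes;             // compressed size on disk in bytes
    std::set<std::string> branches;  // branch names
};

// Uncompressed size divided by on-disk size.
double compressionFactor(const TreeSummary& t) {
    assert(t.fileBytes > 0);
    return static_cast<double>(t.totalBytes) / static_cast<double>(t.fileBytes);
}

// True if both trees have the same entry count and the same branch names.
bool sameContent(const TreeSummary& a, const TreeSummary& b) {
    return a.entries == b.entries && a.branches == b.branches;
}
```

If sameContent is false, the two files are not storing the same thing and a size difference is not surprising. If it is true but the compression factors differ, that would point towards compression settings or branch types rather than duplicated data.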

Cheers,
Enrico

Thanks so much for your reply. I think I understand what is going on. It seems I had saved some additional information. The two methods should produce files of the same size.
