Is there a way to save a dataframe to an already existing ROOT file?

Hi,

I’m trying to store filtered data in an already existing ROOT file. I am using:

    c.Snapshot("tree", "rootFile");	

But what it actually does is overwrite “rootFile” with a new file that only contains “tree”, because they have the same name. What I want instead is to save a new leaf in the “tree” of the already existing “rootFile”. Would that be possible?

Thanks!

Hi @Karl007 ,
yes, you can pass options to Snapshot, and one of them, fMode, sets the mode in which the output file is opened. It defaults to "RECREATE", which is why your file was overwritten.

ROOT::RDF::RSnapshotOptions opts;
opts.fMode = "update";
c.Snapshot("t", "f.root", "", opts);

The third argument, "", is the regular expression that selects which columns to Snapshot; an empty string means “all of them”.

Cheers,
Enrico

Hi @eguiraud
I tried it with the names of the tree and ROOT file I want to update, but it complains that the tree name already exists. I already know that; my intention is to update that tree with a new leaf.

I also tried a different tree name (to save it as a new tree), but I get a segmentation violation.

We must be doing something different. This works for me:

#include <ROOT/RDataFrame.hxx>

int main() {
   auto df = ROOT::RDataFrame(10).Define("x", [] { return 42; });

   // produce initial file
   df.Snapshot<int>("t1", "f.root", {"x"});

   // write another tree to that file
   ROOT::RDF::RSnapshotOptions opts1;
   opts1.fMode = "update";
   df.Snapshot<int>("t2", "f.root", {"x"}, opts1);

   // overwrite the t1
   ROOT::RDF::RSnapshotOptions opts2;
   opts2.fMode = "update";
   opts2.fOverwriteIfExists = true;
   df.Range(5).Snapshot<int>("t1", "f.root", {"x"}, opts2);

   return 0;
}

At the end of the program f.root contains t1 with 5 entries and t2 with 10. What am I doing differently?

Cheers,
Enrico


The tree had already been saved; that is why I got the segmentation violation.

Thanks Enrico!

Hi @eguiraud

Sorry, I have another question: is it possible to save it inside the same tree? Or maybe copy the produced leaf into a specific tree?

No, that’s not possible with RDataFrame, but you can save the new leaf/branch in a separate TTree and then read the second tree together with the first as a single, wider tree using TTree::AddFriend; see the ROOT::RDataFrame class reference.

I am using:

	TFile *f = TFile::Open("rootfile");
	TTree *t = f.Get<TTree>("tree");
	TTree *ft = f.Get<TTree>("friendtree"); //friend tree
	 
	t.AddFriend(&ft, "myFriend");
	
	RDataFrame d(t);
	auto c = d.Define("h1", "friendtreeleaf[treeleaf == 3]");
	auto h = c.Histo1D("h1");
	h->DrawCopy();

But it cannot recognize AddFriend().

EDIT: tried this

	ROOT::RDataFrame t("tree", "rootfile");
	ROOT::RDataFrame ft("friendtree", "rootfile");
	t.AddFriend(&ft, "myFriend");
	
	RDataFrame d(t);
	auto c = d.Define("h1", "friendtreeleaf[treeleaf == 3]");
	auto h = c.Histo1D("h1");
	h->DrawCopy();

It still doesn't seem to recognize AddFriend(). Could it be my ROOT version?

Hi,
C++ shenanigans, I guess. The compilation errors should point to what the problem is.
In the first version:

t.AddFriend(&ft, "myFriend");

should be

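t->AddFriend(ft, "myFriend");

because ft is already a pointer and AddFriend takes a pointer. For the same reason, f.Get<TTree>(...) should be f->Get<TTree>(...) and t.AddFriend should be t->AddFriend, since f and t are pointers too. Finally, RDataFrame d(t) should be RDataFrame d(*t), because RDataFrame takes a reference to the tree and t is a pointer.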

Cheers,
Enrico