There is currently no way to overwrite existing output using Snapshot.
Writing to a tree name that already exists in UPDATE mode leaves multiple cycles in the file:
In [10]: f.hi.ls()
TDirectoryFile* hi hi
KEY: TTree there;2 there
KEY: TTree there;1 there
I have made a pull request (PR4965) adding this feature. Reproducer follows.
In [1]: import ROOT
In [2]: df = ROOT.RDataFrame(10).Define('e', 'rdfentry_')
In [3]: df.Snapshot('hi/there', 'test.root')
Out[3]: <ROOT.ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > object at 0x7fc3a7011730>
In [4]: df2 = ROOT.RDataFrame(10).Define('e', 'rdfentry_ * 10')
In [5]: snapopts = ROOT.RDF.RSnapshotOptions()
In [6]: snapopts.fMode = 'UPDATE'
In [7]: df2.Snapshot('hi/there', 'test.root', '', snapopts)
Error in <TFile::mkdir>: An object with name hi exists already
Out[7]: <ROOT.ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > object at 0x7fc3a3df8150>
In [8]: f = ROOT.TFile.Open('test.root')
In [9]: f.ls()
TFile** test.root
TFile* test.root
KEY: TDirectoryFile hi;1 hi
In [10]: f.hi.ls()
TDirectoryFile* hi hi
KEY: TTree there;2 there
KEY: TTree there;1 there
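The symptom above can also be checked from a script rather than by reading the `ls()` output. The following is a minimal sketch, not part of the pull request: the helper names `cycles_by_name`, `duplicated_names` and `read_keys` are hypothetical, and `read_keys` assumes a working PyROOT installation (it imports ROOT lazily so that the other two helpers run without it).

```python
from collections import defaultdict


def cycles_by_name(keys):
    """Group (name, cycle) pairs by name, with cycles sorted ascending."""
    grouped = defaultdict(list)
    for name, cycle in keys:
        grouped[name].append(cycle)
    return {name: sorted(cycles) for name, cycles in grouped.items()}


def duplicated_names(keys):
    """Return the sorted names that are stored with more than one cycle."""
    return sorted(n for n, c in cycles_by_name(keys).items() if len(c) > 1)


def read_keys(path, dirname):
    """Read (name, cycle) pairs from one directory of a ROOT file.

    Hypothetical helper; ROOT is imported here, not at module level,
    so the pure-Python helpers above stay usable without PyROOT.
    """
    import ROOT
    f = ROOT.TFile.Open(path)
    directory = f.GetDirectory(dirname)
    keys = [(k.GetName(), k.GetCycle()) for k in directory.GetListOfKeys()]
    f.Close()
    return keys


# Keys as listed by f.hi.ls() in the session above.
observed = [("there", 2), ("there", 1)]
print(cycles_by_name(observed))
print(duplicated_names(observed))
```

With PyROOT available, `duplicated_names(read_keys('test.root', 'hi'))` would be expected to report `there` for the file produced by the reproducer.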
After the fix, with snapopts.fOverwrite = True:
In [1]: import ROOT
In [2]: df = ROOT.RDataFrame(10).Define('e', 'rdfentry_')
In [3]: df.Snapshot('hi/there', 'test.root')
Out[3]: <ROOT.ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > object at 0x7fa041f1d1b0>
In [5]: df2 = ROOT.RDataFrame(10).Define('e', 'rdfentry_ * 10')
In [7]: snapopts = ROOT.RDF.RSnapshotOptions()
In [8]: snapopts.fMode = 'UPDATE'
In [9]: snapopts.fOverwrite = True
In [10]: df2.Snapshot('hi/there', 'test.root', '', snapopts)
Error in <TFile::mkdir>: An object with name hi exists already
Out[10]: <ROOT.ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > object at 0x7fa04102be70>
In [11]: f = ROOT.TFile.Open('test.root')
In [12]: f.ls()
TFile** test.root
TFile* test.root
KEY: TDirectoryFile hi;1 hi
In [13]: f.hi.ls()
TDirectoryFile* hi hi
KEY: TTree there;1 there
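The difference between the two listings can be illustrated with a toy model of key cycles. This is plain Python, not ROOT code, and `ToyDirectory` is a hypothetical name. It assumes that overwriting means dropping the existing cycles of a name before writing. That matches the `there;1` listing above, but it is not a statement about how the pull request implements the option.

```python
class ToyDirectory:
    """Toy stand-in for a directory of keyed objects; not ROOT code."""

    def __init__(self):
        # name -> list of cycle numbers, oldest first
        self.keys = {}

    def write(self, name, overwrite=False):
        """Store a new cycle for `name`; optionally drop the old ones first."""
        cycles = self.keys.setdefault(name, [])
        if overwrite:
            cycles.clear()
        next_cycle = cycles[-1] + 1 if cycles else 1
        cycles.append(next_cycle)
        return next_cycle

    def ls(self):
        """List keys as 'name;cycle', newest cycle first."""
        return [f"{name};{c}"
                for name, cycles in self.keys.items()
                for c in reversed(cycles)]


# Without overwrite: the second write adds a second cycle.
appended = ToyDirectory()
appended.write("there")
appended.write("there")
print(appended.ls())

# With overwrite: only the newly written cycle remains.
replaced = ToyDirectory()
replaced.write("there")
replaced.write("there", overwrite=True)
print(replaced.ls())
```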
ROOT Version: master
Platform: macOS
Compiler: Not Provided