If I try to create a Snapshot with ImplicitMT enabled using a tree name that includes a directory, I get the error
Error in <ROOT::Experimental::TBufferMergerFile::mkdir>: An object with name testdir exists already
multiple times, which implies that it attempts to create the directory multiple times.
I’m not sure why this happens, as I thought each thread called mkdir on its own file (see here), but I went ahead and created a pull request that solves the problem by allowing mkdir to return an existing directory instead of raising an error.
Details below.
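As a rough illustration of the behaviour change described above, here is a toy model in plain Python. It is not the ROOT code and does not use the ROOT API: ToyDirectory, mkdir_strict, mkdir_return_existing and count_errors are hypothetical names made up for this sketch, and the toy raises a Python exception where ROOT prints an error message.

```python
class ToyDirectory:
    """Hypothetical stand-in for a directory in a file; not a ROOT class."""

    def __init__(self, name):
        self.name = name
        self.subdirs = {}

    def mkdir_strict(self, name):
        # Models the behaviour before the fix: a second request for the
        # same name is treated as an error.
        if name in self.subdirs:
            raise ValueError(f"An object with name {name} exists already")
        self.subdirs[name] = ToyDirectory(name)
        return self.subdirs[name]

    def mkdir_return_existing(self, name):
        # Models the behaviour proposed in the fix: if the directory is
        # already there, hand it back instead of complaining.
        if name not in self.subdirs:
            self.subdirs[name] = ToyDirectory(name)
        return self.subdirs[name]


def count_errors(mkdir_name, n_calls=4):
    """Call the chosen mkdir variant n_calls times on one shared toy file."""
    top = ToyDirectory("test.root")
    errors = 0
    for _ in range(n_calls):
        try:
            getattr(top, mkdir_name)("testdir")
        except ValueError:
            errors += 1
    return errors


if __name__ == "__main__":
    print("strict:", count_errors("mkdir_strict"))
    print("return existing:", count_errors("mkdir_return_existing"))
```

In this toy, repeated requests for the same directory only fail with the strict variant. The return-existing variant gives every caller the same directory object, which is the semantics the pull request aims for.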
ROOT Version: master
Platform: macOS
Compiler: Not Provided
Before my fix:
I can make a Snapshot just fine without multithreading:
In [1]: import ROOT
In [2]: df = ROOT.RDataFrame(10000).Define('e', 'rdfentry_')
In [3]: df.Snapshot('testdir/testtree', 'test.root')
Out[3]: <ROOT.ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > object at 0x7ffac5fcda00>
or if I call EnableImplicitMT only after constructing the RDataFrame:
In [1]: import ROOT
In [2]: df = ROOT.RDataFrame(10000).Define('e', 'rdfentry_')
In [3]: ROOT.ROOT.EnableImplicitMT()
In [4]: df.Snapshot('testdir/testtree', 'test.root')
Out[4]: <ROOT.ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > object at 0x7f99317b2cf0>
But if I enable multithreading before constructing the RDataFrame, I get errors:
In [1]: import ROOT
In [2]: ROOT.ROOT.EnableImplicitMT()
In [3]: df = ROOT.RDataFrame(10000).Define('e', 'rdfentry_')
In [4]: df.Snapshot('testdir/testtree', 'test.root')
Error in <ROOT::Experimental::TBufferMergerFile::mkdir>: An object with name testdir exists already
Error in <ROOT::Experimental::TBufferMergerFile::mkdir>: An object with name testdir exists already
Error in <ROOT::Experimental::TBufferMergerFile::mkdir>: An object with name testdir exists already
Error in <ROOT::Experimental::TBufferMergerFile::mkdir>: An object with name testdir exists already
Out[4]: <ROOT.ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > object at 0x7fdfee302690>
After my fix:
It still works fine without multithreading:
In [1]: import ROOT
In [2]: df = ROOT.RDataFrame(10000).Define('e', 'rdfentry_')
In [3]: df.Snapshot('testdir/testtree', 'test.root')
Out[3]: <ROOT.ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > object at 0x7fcdc5e8b380>
or if I call EnableImplicitMT only after constructing the RDataFrame:
In [1]: import ROOT
In [2]: df = ROOT.RDataFrame(10000).Define('e', 'rdfentry_')
In [3]: ROOT.ROOT.EnableImplicitMT()
In [4]: df.Snapshot('testdir/testtree', 'test.root')
Out[4]: <ROOT.ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > object at 0x7fe6ba492640>
and it no longer raises errors if I have multithreading enabled:
In [1]: import ROOT
In [2]: ROOT.ROOT.EnableImplicitMT()
In [3]: df = ROOT.RDataFrame(10000).Define('e', 'rdfentry_')
In [4]: df.Snapshot('testdir/testtree', 'test.root')
Out[4]: <ROOT.ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > object at 0x7ffe6a3e7f50>