Error writing trees with Snapshot and ImplicitMT

If I try to create a Snapshot with ImplicitMT enabled using a tree name that includes a directory, I get the error

Error in <ROOT::Experimental::TBufferMergerFile::mkdir>: An object with name testdir exists already

multiple times, implying it attempts to create the directory multiple times.

I’m not sure why this happens, since I thought each thread called mkdir on its own file (see here). Still, I opened a pull request that solves the problem by letting mkdir return the existing directory instead of raising an error.
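To make the proposed change concrete, here is a minimal Python model of the two mkdir behaviors. It is a hypothetical sketch, not ROOT's code: the `Directory` class and the `return_existing` keyword are illustrative names, and the real change lives in ROOT's C++ mkdir implementation. The only assumption, suggested by the repeated error messages, is that mkdir is reached more than once for the same directory in the same file.

```python
# Hypothetical model of the mkdir behavior discussed in this thread.
# NOT ROOT's implementation: `Directory` and `return_existing` are illustrative names.
import sys


class Directory:
    """A toy directory that can hold named subdirectories."""

    def __init__(self, name):
        self.name = name
        self.subdirs = {}

    def mkdir(self, name, return_existing=False):
        """Create subdirectory `name` and return it.

        return_existing=False (old behavior): if `name` already exists,
        print an error like the ones in the thread and return None.
        return_existing=True (proposed behavior): return the existing directory.
        """
        if name in self.subdirs:
            if return_existing:
                return self.subdirs[name]
            print(f"Error in <Directory::mkdir>: An object with name {name} "
                  "exists already", file=sys.stderr)
            return None
        sub = Directory(name)
        self.subdirs[name] = sub
        return sub


if __name__ == "__main__":
    # Old behavior: the first call succeeds, the next four each print an error.
    old = Directory("test.root")
    for _ in range(5):
        old.mkdir("testdir")

    # Proposed behavior: every call hands back the same directory object.
    new = Directory("test.root")
    dirs = [new.mkdir("testdir", return_existing=True) for _ in range(5)]
    print(all(d is dirs[0] for d in dirs))  # True
```

Returning the existing directory makes repeated calls idempotent, so code paths that cannot easily tell whether the directory was already created (for example, code run once per thread or per task) do not need to check first.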

Details below.


ROOT Version: master
Platform: macOS
Compiler: Not Provided


Before my fix:


I can make a Snapshot just fine without multithreading:

In [1]: import ROOT

In [2]: df = ROOT.RDataFrame(10000).Define('e', 'rdfentry_')

In [3]: df.Snapshot('testdir/testtree', 'test.root')
Out[3]: <ROOT.ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > object at 0x7ffac5fcda00>

or, enabling ImplicitMT only after the RDataFrame has been created, which still runs single-threaded:

In [1]: import ROOT

In [2]: df = ROOT.RDataFrame(10000).Define('e', 'rdfentry_')

In [3]: ROOT.ROOT.EnableImplicitMT()

In [4]: df.Snapshot('testdir/testtree', 'test.root')
Out[4]: <ROOT.ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > object at 0x7f99317b2cf0>

But if I enable multithreading before creating the RDataFrame, I get errors:

In [1]: import ROOT

In [2]: ROOT.ROOT.EnableImplicitMT()

In [3]: df = ROOT.RDataFrame(10000).Define('e', 'rdfentry_')

In [4]: df.Snapshot('testdir/testtree', 'test.root')
Error in <ROOT::Experimental::TBufferMergerFile::mkdir>: An object with name testdir exists already
Error in <ROOT::Experimental::TBufferMergerFile::mkdir>: An object with name testdir exists already
Error in <ROOT::Experimental::TBufferMergerFile::mkdir>: An object with name testdir exists already
Error in <ROOT::Experimental::TBufferMergerFile::mkdir>: An object with name testdir exists already
Out[4]: <ROOT.ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > object at 0x7fdfee302690>

After my fix:


It still works fine without multithreading:

In [1]: import ROOT

In [2]: df = ROOT.RDataFrame(10000).Define('e', 'rdfentry_')

In [3]: df.Snapshot('testdir/testtree', 'test.root')
Out[3]: <ROOT.ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > object at 0x7fcdc5e8b380>

or, enabling ImplicitMT only after the RDataFrame has been created, which still runs single-threaded:

In [1]: import ROOT

In [2]: df = ROOT.RDataFrame(10000).Define('e', 'rdfentry_')

In [3]: ROOT.ROOT.EnableImplicitMT()

In [4]: df.Snapshot('testdir/testtree', 'test.root')
Out[4]: <ROOT.ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > object at 0x7fe6ba492640>

and no longer raises errors when multithreading is enabled before creating the RDataFrame:

In [1]: import ROOT

In [2]: ROOT.ROOT.EnableImplicitMT()

In [3]: df = ROOT.RDataFrame(10000).Define('e', 'rdfentry_')

In [4]: df.Snapshot('testdir/testtree', 'test.root')
Out[4]: <ROOT.ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > object at 0x7ffe6a3e7f50>

Thanks to @pcanal the PR was merged and this is fixed.

Indeed, thanks much to the ROOT team for your prompt response.
