ImplicitMT changes Snapshot behavior for TTree with 0 entries

If I take a snapshot of an RDataFrame with 0 entries while ImplicitMT is enabled, the output file is empty: it contains no TTree at all. If ImplicitMT is not enabled, however, I end up with an output file that stores an empty TTree. Reproducers follow.


Enabling ImplicitMT results in an empty file:

import ROOT as r
r.ROOT.EnableImplicitMT()  # turn on multithreading
rdf = r.RDataFrame(10).Define('e', 'rdfentry_').Filter('e > 20')  # create 0 entry data frame
s = rdf.Snapshot('test', 'test.root')
f = r.TFile.Open('test.root')
f.ls()
# TFile**         test.root       test.root
#  TFile*         test.root       test.root

Disabling ImplicitMT results in an empty tree:

r.ROOT.DisableImplicitMT()  # turn off multithreading
rdf = r.RDataFrame(10).Define('e', 'rdfentry_').Filter('e > 20')  # create 0 entry data frame
s = rdf.Snapshot('test', 'test.root')
f = r.TFile.Open('test.root')
f.ls()
# TFile**         test.root
#  TFile*         test.root
#   OBJ: TTree    test    test : 0 at: 0x5565946dda70
#   KEY: TTree    test;1  test
t = f.Get('test')
t.Print()
# ******************************************************************************
# *Tree    :test      : test                                                   *
# *Entries :        0 : Total =             317 bytes  File  Size =        179 *
# *        :          : Tree compression factor =   1.00                       *
# ******************************************************************************

If the RDataFrame is not empty, there are other storage differences. Specifically, the file size, total size, and number of baskets differ:

r.ROOT.EnableImplicitMT()
rdf = r.RDataFrame(10).Define('e', 'rdfentry_')
s = rdf.Snapshot('test', 'test.root')
f = r.TFile.Open('test.root')
t = f.Get('test')
t.Print()
# ******************************************************************************
# *Tree    :test      : test                                                   *
# *Entries :       10 : Total =            1594 bytes  File  Size =       1153 *
# *        :          : Tree compression factor =   1.00                       *
# ******************************************************************************
# *Br    0 :e         : e/l                                                    *
# *Entries :       10 : Total  Size=       1257 bytes  File Size  =        760 *
# *Baskets :       10 : Basket Size=      32000 bytes  Compression=   1.00     *
# *............................................................................*

r.ROOT.DisableImplicitMT()  # same snapshot with multithreading turned off
rdf = r.RDataFrame(10).Define('e', 'rdfentry_')
s = rdf.Snapshot('test', 'test.root')
f = r.TFile.Open('test.root')
t = f.Get('test')
t.Print()
# ******************************************************************************
# *Tree    :test      : test                                                   *
# *Entries :       10 : Total =             962 bytes  File  Size =        457 *
# *        :          : Tree compression factor =   1.35                       *
# ******************************************************************************
# *Br    0 :e         : e/l                                                    *
# *Entries :       10 : Total  Size=        625 bytes  File Size  =        110 *
# *Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.35     *
# *............................................................................*

ROOT Version: 6.22/00
Platform: linux
Compiler: conda-forge


Hi @mwilkins,
indeed, I think that has always been the case. I believe the behavior with implicit MT disabled (which results in an empty TTree) is preferable, but it is not straightforward to mimic in the multi-thread case, both because of how parallel TFile writing works internally and because multi-thread TFile writing is asynchronous. If you think it is an important improvement to have, could you please open a Jira ticket with your report?

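As for the storage differences with a non-empty RDataFrame (file size, total size, number of baskets): that is to be expected, as the way the file is produced is (very) different.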

Cheers,
Enrico

It is not a feature that is important to my current use case, but inconsistent results with and without ImplicitMT enabled can lead to confusion. I’ve created a low-priority ticket. Thanks.
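
For anyone who runs into this before it is fixed, here is a minimal sketch of a reader-side guard that treats "tree missing from the file" the same as "tree present with 0 entries". The helper names tree_entries and snapshot_entries are hypothetical, not part of ROOT. The sketch assumes that TFile.Get returns a null object that evaluates to False in PyROOT when the key does not exist, so please check that against your ROOT version.

```python
def tree_entries(root_file, tree_name):
    """Return the number of entries in tree_name, or 0 if the tree is absent.

    root_file is anything with a Get(name) method, e.g. an open ROOT.TFile.
    Assumption: a missing key yields a falsy object (null pointer or None).
    """
    tree = root_file.Get(tree_name)
    if not tree:
        # With ImplicitMT enabled, a 0-entry Snapshot may write no tree at all.
        return 0
    return int(tree.GetEntries())


def snapshot_entries(path, tree_name="test"):
    """Open a snapshot file and count entries, tolerating a missing tree."""
    import ROOT  # imported lazily so tree_entries can be used without ROOT

    root_file = ROOT.TFile.Open(path)
    if not root_file:
        raise OSError("could not open " + path)
    try:
        return tree_entries(root_file, tree_name)
    finally:
        root_file.Close()
```

With this, snapshot_entries('test.root') should give 0 for both reproducers above, whether or not ImplicitMT was enabled when the file was written.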
