Missing TStreamerInfo for branches of RVec

Hello!

When implicit multi-threading is enabled (ROOT::EnableImplicitMT), ROOT files written with RDataFrame lack TStreamerInfo for RVec branches. This becomes a problem when reading those branches with uproot in Python. In my experience, reading them with ROOT itself works fine.

Single-threaded

In single-threaded mode everything works for me. The ROOT file contains a proper TStreamerInfo entry for each RVec branch, such as vector<float,ROOT::Detail::VecOps::RAdoptAllocator<float> >.

import ROOT
df = ROOT.RDataFrame(100)
element_type_list = [
    "bool",
    "char",
    "unsigned char",
    "short",
    "unsigned short",
    "int",
    "unsigned int",
    "long",
    "unsigned long",
    "long long",
    "unsigned long long",
    "float",
    "double",
]
for element_type in element_type_list:
    name = "rvec_" + element_type.replace(" ", "_")
    expr = f"RVec<{element_type}>(5, 0)"
    df = df.Define(name, expr)
df.Snapshot("test", "/tmp/test_single.root")

root_file = ROOT.TFile("/tmp/test_single.root")
for each in root_file.GetStreamerInfoList():
    print(each.GetName())

Output:

TNamed
TObject
TList
TSeqCollection
TCollection
vector<bool>
vector<char,ROOT::Detail::VecOps::RAdoptAllocator<char> >
vector<unsigned char,ROOT::Detail::VecOps::RAdoptAllocator<unsigned char> >
vector<short,ROOT::Detail::VecOps::RAdoptAllocator<short> >
vector<unsigned short,ROOT::Detail::VecOps::RAdoptAllocator<unsigned short> >
vector<int,ROOT::Detail::VecOps::RAdoptAllocator<int> >
vector<unsigned int,ROOT::Detail::VecOps::RAdoptAllocator<unsigned int> >
vector<long,ROOT::Detail::VecOps::RAdoptAllocator<long> >
vector<unsigned long,ROOT::Detail::VecOps::RAdoptAllocator<unsigned long> >
vector<Long64_t,ROOT::Detail::VecOps::RAdoptAllocator<Long64_t> >
vector<ULong64_t,ROOT::Detail::VecOps::RAdoptAllocator<ULong64_t> >
vector<float,ROOT::Detail::VecOps::RAdoptAllocator<float> >
vector<double,ROOT::Detail::VecOps::RAdoptAllocator<double> >
TTree
TAttLine
TAttFill
TAttMarker
ROOT::TIOFeatures
TBranchElement
TBranch
TLeafElement
TLeaf
TString
TBranchRef
TRefTable
TObjArray
listOfRules

Multi-threaded

When ROOT::EnableImplicitMT is called with any number of threads, the TStreamerInfo entries for the RVec branches are missing.

import ROOT
ROOT.EnableImplicitMT(1)  # same result with 2, 3, or 0
df = ROOT.RDataFrame(100)
element_type_list = [
    "bool",
    "char",
    "unsigned char",
    "short",
    "unsigned short",
    "int",
    "unsigned int",
    "long",
    "unsigned long",
    "long long",
    "unsigned long long",
    "float",
    "double",
]
for element_type in element_type_list:
    name = "rvec_" + element_type.replace(" ", "_")
    expr = f"RVec<{element_type}>(5, 0)"
    df = df.Define(name, expr)
df.Snapshot("test", "/tmp/test_multi.root")

root_file = ROOT.TFile("/tmp/test_multi.root")
for each in root_file.GetStreamerInfoList():
    print(each.GetName())

Output:

TNamed
TObject
TList
TSeqCollection
TCollection
TTree
TAttLine
TAttFill
TAttMarker
ROOT::TIOFeatures
TBranchElement
TBranch
TLeafElement
TLeaf
TString
TBranchRef
TRefTable
TObjArray
TArrayD
TArray
TArrayI
listOfRules
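
To make the difference between the two listings explicit, here is a minimal sketch of a check for the missing entries. It is pure Python and does not need ROOT. The helper names (expected_streamer_name, missing_rvec_streamers) are made up for illustration, and the expected names are simply copied from the pattern in the single-threaded listing above (ROOT v6.22.02), so other ROOT versions may spell them differently.

```python
# Element types used in the reproducer above.
ELEMENT_TYPES = [
    "bool", "char", "unsigned char", "short", "unsigned short",
    "int", "unsigned int", "long", "unsigned long",
    "long long", "unsigned long long", "float", "double",
]

# Spellings that appear in the single-threaded listing for these types
# (assumption: taken from the v6.22.02 output shown in this post).
ROOT_SPELLING = {"long long": "Long64_t", "unsigned long long": "ULong64_t"}


def expected_streamer_name(element_type):
    """Name seen in the single-threaded listing for RVec<element_type>."""
    if element_type == "bool":
        # The listing shows vector<bool> without the RAdoptAllocator argument.
        return "vector<bool>"
    t = ROOT_SPELLING.get(element_type, element_type)
    return f"vector<{t},ROOT::Detail::VecOps::RAdoptAllocator<{t}> >"


def missing_rvec_streamers(streamer_names):
    """Return the expected RVec streamer names absent from streamer_names."""
    present = set(streamer_names)
    expected = [expected_streamer_name(t) for t in ELEMENT_TYPES]
    return [name for name in expected if name not in present]
```

With an open file, the names can be passed in as [each.GetName() for each in root_file.GetStreamerInfoList()]. For the single-threaded listing above the result is an empty list; for the multi-threaded listing all 13 vector<...> names come back as missing.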

Cheers,
Seungjin


ROOT Version: v6.22.02
Platform: CentOS Linux release 7.8.2003 (Core)
Compiler: gcc (GCC) 4.8.5
Python: Python 3.6.8


Hello @seungjin.yang,
thank you for the thorough report including a reproducer.
This looks like a bug in TBufferMerger, which RDataFrame uses internally for multi-threaded file writes. It is now reported as https://github.com/root-project/root/issues/6611

Best regards,
Enrico
