Error with RDataFrame for TChain loaded with TTrees with different names

Hello,

I’m trying to use RDataFrame on a TChain that holds TTrees with different names in different files, but when I call ROOT.ROOT.EnableImplicitMT(), it seems to read only the first tree of the TChain. Here is an example that reproduces the issue:

import ROOT
ROOT.gROOT.SetBatch()
ROOT.ROOT.EnableImplicitMT()

df1 = ROOT.ROOT.RDataFrame(10).Define("x", "5")
df1.Snapshot("tree1", "f1.root")

df2 = ROOT.ROOT.RDataFrame(5).Define("x", "5")
df2.Snapshot("tree2", "f2.root")

tc1 = ROOT.TChain()
tc1.Add("f1.root/tree1")
tc1.Add("f2.root/tree2")

df_tc1 = ROOT.ROOT.RDataFrame(tc1)

count1 = df_tc1.Count()
print("count1 = %d" % count1.GetValue())

tc2 = ROOT.TChain()
tc2.Add("f2.root/tree2")
tc2.Add("f1.root/tree1")

df_tc2 = ROOT.ROOT.RDataFrame(tc2)

count2 = df_tc2.Count()
print("count2 = %d" % count2.GetValue())

This results in the following output:

Error in <TTreeProcessorMT::Process>: An error occurred while getting tree tree1 from file f2.root: skipping this file.
count1 = 10
Error in <TTreeProcessorMT::Process>: An error occurred while getting tree tree2 from file f1.root: skipping this file.
count2 = 5

If I remove ROOT.ROOT.EnableImplicitMT(), the output is as expected:

count1 = 15
count2 = 15

Kind regards,
Fikri


ROOT Version: 6.20.04 (from CVMFS on lxplus7)


There also seems to be a problem with loading different TTrees that are stored in the same file: RDataFrame appears to read the first TTree in the TChain repeatedly, and this time there is no error message. Here is an example to reproduce it:

import ROOT
import os
ROOT.gROOT.SetBatch()
ROOT.ROOT.EnableImplicitMT()

df1 = ROOT.ROOT.RDataFrame(10).Define("x", "5")
df1.Snapshot("tree1", "f1.root")

df2 = ROOT.ROOT.RDataFrame(5).Define("x", "5")
df2.Snapshot("tree2", "f2.root")

df3 = ROOT.ROOT.RDataFrame(2).Define("x", "5")
df3.Snapshot("tree3", "f3.root")

os.system("hadd -f f.root f1.root f2.root f3.root")

tc1 = ROOT.TChain()
tc1.Add("f.root/tree1")
tc1.Add("f.root/tree2")
tc1.Add("f.root/tree3")

df_tc1 = ROOT.ROOT.RDataFrame(tc1)

count1 = df_tc1.Count()
print("count1 = %d" % count1.GetValue())

tc2 = ROOT.TChain()
tc2.Add("f.root/tree2")
tc2.Add("f.root/tree1")
tc2.Add("f.root/tree3")

df_tc2 = ROOT.ROOT.RDataFrame(tc2)

count2 = df_tc2.Count()
print("count2 = %d" % count2.GetValue())

The output is the following:

hadd Target file: f.root
hadd compression setting for all output: 1
hadd Source file 1: f1.root
hadd Source file 2: f2.root
hadd Source file 3: f3.root
hadd Target path: f.root:/
count1 = 30
count2 = 15

Again, if ROOT.ROOT.EnableImplicitMT() is removed, the output is sensible:

hadd Target file: f.root
hadd compression setting for all output: 1
hadd Source file 1: f1.root
hadd Source file 2: f2.root
hadd Source file 3: f3.root
hadd Target path: f.root:/
count1 = 17
count2 = 17

Kind regards,
Fikri

Hi Fikri,
thank you for the report and for the self-contained reproducers! At first glance, this looks like a bug that was recently fixed in master (the fix will be part of the upcoming ROOT v6.22).

Could you please check both of your reproducers on a ROOT nightly build (e.g. on lxplus: source /cvmfs/sft.cern.ch/lcg/views/dev3/latest/x86_64-centos7-gcc8-dbg/setup.sh)?
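
If you are stuck on an affected release in the meantime, one possible workaround is to avoid mixing tree names inside a single TChain: build one chain per tree name and combine the results yourself. The following is only a sketch and has not been tested against 6.20/04. The helpers group_by_tree_name and count_entries are hypothetical names, not part of ROOT, and the parsing assumes every entry looks like "file.root/treename".

```python
from collections import OrderedDict


def group_by_tree_name(entries):
    """Group "file.root/treename" strings by tree name, keeping first-seen order."""
    groups = OrderedDict()
    for entry in entries:
        # Assumption: the file name ends in ".root" and the tree name follows it.
        filename, sep, treename = entry.rpartition(".root/")
        if not sep or not treename:
            raise ValueError("expected 'file.root/treename', got %r" % entry)
        groups.setdefault(treename, []).append(filename + ".root")
    return groups


def count_entries(entries):
    """Sum Count() over one RDataFrame per tree name (hypothetical helper)."""
    import ROOT  # imported lazily so the grouping logic can be used without ROOT

    total = 0
    for treename, filenames in group_by_tree_name(entries).items():
        # Each chain only ever sees a single tree name.
        chain = ROOT.TChain(treename)
        for filename in filenames:
            chain.Add(filename)
        total += ROOT.ROOT.RDataFrame(chain).Count().GetValue()
    return total
```

With the first reproducer this would be called as count_entries(["f1.root/tree1", "f2.root/tree2"]). It only works for results that can be combined after the fact, such as counts or sums, so it is not a general replacement for a single RDataFrame over the whole chain.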

Cheers,
Enrico

Hi @eguiraud

> Could you please check both of your reproducers on a ROOT nightly build (e.g. on lxplus: source /cvmfs/sft.cern.ch/lcg/views/dev3/latest/x86_64-centos7-gcc8-dbg/setup.sh)?

I can confirm that both reproducers now work as expected with the nightly build. Thanks!

Kind regards,
Fikri

Hi,
just so you know, I backported the fix to v6.20, so it will also be present in the next 6.20 patch release (v6.20/06).
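
If a script has to run on several ROOT installations, it could check the version before enabling implicit multi-threading. This is a minimal sketch that assumes the fix ships in v6.20/06 as planned. has_chain_fix and enable_imt_if_safe are hypothetical helper names, while gROOT.GetVersionInt() is the existing ROOT call that returns the version as an integer.

```python
def has_chain_fix(version_int):
    """Return True if a ROOT version code is 6.20/06 or newer."""
    # ROOT encodes version a.bb/cc as the integer a*10000 + bb*100 + cc,
    # so 6.20/06 is 62006 and 6.22/00 is 62200.
    return version_int >= 62006


def enable_imt_if_safe():
    """Enable implicit MT only on versions assumed to contain the fix."""
    import ROOT  # imported lazily so has_chain_fix can be used without ROOT

    if has_chain_fix(ROOT.gROOT.GetVersionInt()):
        ROOT.ROOT.EnableImplicitMT()
        return True
    return False
```

On older versions the script then simply runs single-threaded, which gave the correct counts in the reproducers above.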

Cheers,
Enrico
