Dear ROOT experts,
I’m using RDataFrame with implicit multithreading in my analysis to compute TMVA scores with ROOT.computeModel, but I found that RDataFrame writes the events to the output file in a different order than in the input.
I have two categories of files: the original data files and the BDT scoring output files. My final goal is to connect them with AddFriend. Unfortunately, I discovered the following:
- AddFriend is not possible for files processed with RDF because of the error below. This affects both the multithreaded and the single-thread modes.
Error in <AddFriend>: Tree 'microtree' has the kEntriesReshuffled bit set,
and cannot be used as friend nor can be added as a friend unless the main
tree has a TTreeIndex on the friend tree 'microtree'. You can also unset the bit manually
if you know what you are doing.
- Unsetting the kEntriesReshuffled bit on the RDF output tree doesn’t fix the problem. The events in the output file have already been reordered by RDF, and I can see that the order of events in the BDT file doesn’t match the order of events in the primary files.
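To make the mismatch concrete, here is a minimal sketch in plain Python. It makes no ROOT calls, and the lists and the `score` values are made up for illustration. It assumes that every entry carries a unique `eventNumber`. Pairing entries by position goes wrong as soon as one side is reordered, whereas a lookup keyed on `eventNumber` still pairs them correctly.

```python
# Illustration only: plain Python stand-ins for the two trees, not ROOT API.
# "primary" keeps the original order; "scored" holds the same events in a
# different order, similar to what is seen in the RDF output file.
primary = [{"eventNumber": n} for n in range(6)]
scored = [{"eventNumber": n, "score": 0.1 * n} for n in (3, 4, 5, 0, 1, 2)]

# Positional pairing: entry i of one list is matched with entry i of the other.
positional_mismatches = sum(
    1 for p, s in zip(primary, scored) if p["eventNumber"] != s["eventNumber"]
)

# Keyed pairing: build a lookup on eventNumber, then match by key.
by_event = {s["eventNumber"]: s for s in scored}
keyed_pairs = [(p, by_event[p["eventNumber"]]) for p in primary]
keyed_mismatches = sum(
    1 for p, s in keyed_pairs if p["eventNumber"] != s["eventNumber"]
)

print("positional mismatches:", positional_mismatches)
print("keyed mismatches:", keyed_mismatches)
```

Judging from the error message, a TTreeIndex on `eventNumber` would play the role of `by_event` here; that mapping is an assumption of this sketch, not something it verifies.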
Could you please let me know how this problem can be solved?
Should I use TTreeIndex? If so, could you please point me to an example with RDF?
Below is a simple script that reproduces the problem using RDF.
Best regards, Grigorii.
ROOT Version: 6.22/03
Platform: macOS
Compiler: clang
import ROOT
import array

def ftree(a, b):
    # Write a file with a tree whose eventNumber branch runs over [a, b)
    filename = 'tree_with_event_range' + '_%s_%s.root' % (a, b)
    f = ROOT.TFile(filename, "recreate")
    tree = ROOT.TTree("tree", "test")
    eventNumber = array.array('i', [0])
    tree.Branch("eventNumber", eventNumber, "eventNumber/I")
    for i in range(a, b):
        eventNumber[0] = i
        tree.Fill()
    tree.Write()
    f.Close()
    return filename

# Three input files with consecutive event ranges
name1000 = ftree(0, 1000)
name1000_1100 = ftree(1000, 1100)
name1100_2100 = ftree(1100, 2100)

treeChain = ROOT.TChain('tree')
treeChain.Add(name1000)
treeChain.Add(name1000_1100)
treeChain.Add(name1100_2100)

# Snapshot the chain into a single file with 3 threads
ROOT.ROOT.EnableImplicitMT(3)
rdtest = ROOT.RDataFrame(treeChain)
rdtest = rdtest.Snapshot('tree', 'rdfile.root', 'eventNumber')
ROOT.ROOT.DisableImplicitMT()

# Read the output back and check that the events are still in the input order
treeRDF = ROOT.TChain('tree')
treeRDF.Add('rdfile.root')
expEventNumber = 0
for event in treeRDF:
    if expEventNumber != event.eventNumber:
        print('Error: Expected event number is %s, but we get %s' % (expEventNumber, event.eventNumber))
    expEventNumber = expEventNumber + 1
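
The loop above only reports the positions where the order differs. A small helper can additionally tell whether the output holds the same events in a different order, or whether events are missing or duplicated. The name `compare_event_order` is hypothetical, and the helper is plain Python rather than part of ROOT. This is a sketch that assumes the event numbers have already been read into Python lists.

```python
from collections import Counter

def compare_event_order(expected, observed):
    """Compare two sequences of event numbers.

    Returns (same_events, n_out_of_place): whether both sequences hold the
    same events with the same multiplicities, and how many positions differ.
    """
    same_events = Counter(expected) == Counter(observed)
    n_out_of_place = sum(1 for e, o in zip(expected, observed) if e != o)
    # Entries beyond the shorter sequence also count as out of place.
    n_out_of_place += abs(len(expected) - len(observed))
    return same_events, n_out_of_place

# Toy input: the same six events, written as two swapped blocks.
expected = list(range(6))
observed = [3, 4, 5, 0, 1, 2]
same_events, n_out_of_place = compare_event_order(expected, observed)
print(same_events, n_out_of_place)
```

With the reproducer, `observed` could be filled with `[e.eventNumber for e in treeRDF]` and `expected` with `list(range(2100))`. A result of `True` together with a non-zero count would mean that events are only reordered, not lost.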