Creating friend trees from a tree shuffled in multithreaded RDataFrame

Dear ROOT experts,

I want to do the following:

  1. Select events and compute new variables in a tree with RDataFrame using multithreading. The resulting files are to be used in TMVA for the classifier training.
  2. Use single-threaded RDataFrame to compute the classifier response and save it in another tree, which would later be used as a friend tree.

However, I’m experiencing issues even with a simple mock-up of this approach.
If I create the trees like this

import ROOT

def create_file(file_name, tree_name):
    df = ROOT.ROOT.RDataFrame(10).Define("x", "gRandom->Rndm()")
    df.Snapshot(tree_name, file_name, "")

def change_tree_add_branch(file_name, tree_name, branch_name, friend_file_name):
    df = ROOT.ROOT.RDataFrame(tree_name, file_name)
    df = df.Define(branch_name, "x")

    branch_vector = ROOT.vector('string')()
    branch_vector.push_back(branch_name)
    friend_tree_name = tree_name + '_friend'
    df.Snapshot(friend_tree_name, friend_file_name, branch_vector)

def main():
    file_name = "f_mt.root"
    friend_file_name = 'f_mt_friend.root'
    tree_name = "t1"
    ROOT.ROOT.EnableImplicitMT(10)
    create_file(file_name, tree_name)
    ROOT.ROOT.DisableImplicitMT()
    change_tree_add_branch(file_name, tree_name, "y", friend_file_name)

if __name__ == "__main__":
    main()

and then later try to read them

import ROOT

main_file_path = "f_mt.root"
friend_file_path = "f_mt_friend.root"
tree_name = "t1"
friend_tree_name = "t1_friend"

main_file = ROOT.TFile(main_file_path)
main_tree = main_file.Get(tree_name)
main_tree.AddFriend(friend_tree_name, friend_file_path)

for event in main_tree:
    print event.x, event.y

I get the following error

Error in <AddFriend>: Tree 't1' has the kEntriesReshuffled bit set, and cannot be used as friend nor can be added as a friend unless the main tree has a TTreeIndex on the friend tree 't1_friend'. You can also unset the bit manually if you know what you are doing.
0.484973614337
Traceback (most recent call last):
  File "print_friend.py", line 24, in <module>
    print event.x, event.y
AttributeError: 'TTree' object has no attribute 'y'

I’ve checked and the main tree t1 has the kEntriesReshuffled bit set to True and the friend tree t1_friend has it set to False.
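For reference, the check can be done along these lines. This is only a minimal sketch: the helper name `reshuffled_flags` is illustrative, and the `bit` argument exists solely so the function can be exercised without ROOT; by default it uses `ROOT.TTree.kEntriesReshuffled`.

```python
def reshuffled_flags(trees, bit=None):
    """Return {tree name: True/False} for the kEntriesReshuffled status bit.

    `trees` is any iterable of TTree-like objects that provide GetName()
    and TestBit(). `bit` defaults to ROOT.TTree.kEntriesReshuffled.
    """
    if bit is None:
        import ROOT  # imported lazily; only needed to look up the enum value
        bit = ROOT.TTree.kEntriesReshuffled
    return {t.GetName(): bool(t.TestBit(bit)) for t in trees}
```

Called on the two trees from the mock-up, this gives True for t1 and False for t1_friend, as described above.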

I’m confused as to why it is impossible to have those two trees as friends. I thought this bit would only prevent adding t1 as a friend to another tree, not using t1 as the main tree.

Is this the intended behavior? If yes, what are the ways around it other than manually unsetting the kEntriesReshuffled bit on the main trees?

Best regards,
Aleksandr


ROOT Version: 6.26/06
Platform: Ubuntu 20.04
Compiler: Precompiled


Hi @apetukho ,

Yes, that’s the intended behavior: we also want to raise a flag if the reshuffled tree is used as the main tree.

The message should really say “cannot be used as friend nor friends can be added to it”. I have made a note to improve it.

Currently the only way to say “I know what I’m doing, don’t worry” is to manually unset the bit. I think you can also unset the bit and overwrite the tree in the file so the bit is unset once and for all.
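A minimal sketch of both options in PyROOT, assuming the trees from the mock-up above. The helper names `clear_reshuffled_bit` and `persist_cleared_bit` are hypothetical, not part of ROOT; only `TestBit`, `ResetBit`, `TTree.kEntriesReshuffled`, `TFile.Open`, `TTree.Write` and `TObject.kOverwrite` are ROOT calls. Whether rewriting the tree header is the best way to persist the change is an assumption based on the suggestion above, so try it on a copy of the file first.

```python
def clear_reshuffled_bit(tree, bit=None):
    """Unset kEntriesReshuffled on `tree`; return True if it had been set.

    Only do this if you know the friend tree has the same entry order
    as the main tree (as is the case for t1 and t1_friend here).
    """
    if bit is None:
        import ROOT  # imported lazily; only needed to look up the enum value
        bit = ROOT.TTree.kEntriesReshuffled
    was_set = bool(tree.TestBit(bit))
    if was_set:
        tree.ResetBit(bit)
    return was_set


def persist_cleared_bit(file_name, tree_name):
    """Unset the bit and overwrite the tree header in the file (sketch)."""
    import ROOT
    f = ROOT.TFile.Open(file_name, "UPDATE")
    tree = f.Get(tree_name)
    clear_reshuffled_bit(tree)
    # Replace the existing key instead of adding a new cycle.
    tree.Write("", ROOT.TObject.kOverwrite)
    f.Close()
```

In the reading script, calling `clear_reshuffled_bit(main_tree)` right before `main_tree.AddFriend(friend_tree_name, friend_file_path)` should let the friend be attached. Calling `persist_cleared_bit("f_mt.root", "t1")` once after the multithreaded Snapshot would make that step unnecessary for later readers.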

With RNTuple (TTree 2.0), we plan to have better provenance metadata, so that we would be able to automatically see that t1 is reshuffled w.r.t. the original tree but t1_friend is “in the same order” as t1.

We could also think of more ergonomic ways to let expert users unset the bit.

Cheers,
Enrico

CC: @vpadulan @mczurylo @pcanal @jblomer @Axel

P.S.
I opened a PR to make the error message clearer.
