How to fill a list into a branch with Python

dm-leo · July 14, 2023, 5:30am

Dear all:
I am using PyROOT to generate a tree, because my data are processed in Python. I want to fill a float number and a list into the tree as different branches (each entry contains one list and one number). It's easy to fill the number, but I don't know how to fill the list. Here is my code for filling the list. Could someone give me some suggestions?

        Ori = []#a list with lots of sublist, for example, [[1.2,2.3],[3.4,4,5],[6,7.8,8,9,10]]. 
        ORI = std.vector('vector<float>')()#According to some examples, list need to be converted into vector. 
        for echo in range(len(Ori)):
                sublist = std.vector('vector<float>')(len(Ori[echo]))
                for i in range(len(Ori[echo])):
                        sublist.push_back(Ori[echo][i])

                ORI.push_back(sublist)
                sublist.clear()
        br2 = tree.Branch('ORI',ORI)
        for alice in range(len(Ori)):
                sublist = std.vector('double')(len(Ori[alice]))
                for i, value in enumerate(Ori[alice]):
                        sublist[i] = value
                ORI[0]=sublist

                tree.Fill()

ROOT Version: 6.26/06
Platform: macOS
Compiler: Apple clang version 14.0.0 (clang-1400.0.29.202)

vpadulan · July 14, 2023, 10:27am

Dear @dm-leo ,

Here is an example of how to do that. The idea is that you start by establishing the TTree dataset schema, i.e. with Branch using an std::vector for the data type. Then at each event you need to replace the contents of the vector with the contents of your array at that specific event:

import ROOT

myl = [
    [1,2,3],
    [4],
    [5,6,7,8],
    [],
    [9,10]
]

with ROOT.TFile.Open("file_vec.root", "recreate") as f:
    tree = ROOT.TTree("events", "events")

    # Establish the dataset schema
    # Create an std::vector where we will insert the values of the array
    # at each event
    vec = ROOT.std.vector[float]()
    tree.Branch("vec", vec)

    for l in myl:
        # Clear the contents of the vector
        vec.clear()
        # Replace the contents in the vector with the contents
        # from the current array
        vec.reserve(len(l))
        for v in l:
            vec.push_back(v)

        tree.Fill()
    f.WriteObject(tree, "events")

# Visualize the contents with RDataFrame
df = ROOT.RDataFrame("events","file_vec.root")
print(df.Describe())
df.Display().Print()

dm-leo · July 17, 2023, 2:09am

Hi, Padulano
Thanks a lot for your example, it works now. But I am still not clear about filling the tree with Python. Why must an array be used in Python, when a plain float variable is enough in C++? And since the elements of an array can be lists, why is it necessary to convert them into a vector?

Kind regards
dm

vpadulan · July 17, 2023, 7:47am

Dear @dm-leo,

And elements of an array could be lists, why it’s necessary to convert them into vector?

Yes indeed, there is no need to convert the elements of the myl list into anything. You just need to insert the values of the array at the current event into the vector so that the branch will be filled. Any input iterable will do. Sorry for the confusion, I modified my snippet above.

Why the array must be used in python but only float variable is enough in C++?

I am not sure I understand this question, could you maybe make an example?

Cheers,
Vincenzo

dm-leo · July 18, 2023, 3:08am

Hello Vincenzo,
An example like root:pyroot_ttree [CMS Wiki Pages]. In C++, a plain double or float can be used, so why must an array be used in Python?

vpadulan · July 18, 2023, 7:32am

Dear @dm-leo ,

Thanks for the example, I understand what you mean now.

In Python you need the extra level of indirection given by the standard array library or numpy.array, even for a branch of simple types like integers and floats, because you need the memory address of the current value in order to properly connect it to the tree branch (i.e. that's what you do via Branch("name", address) or via SetBranchAddress). Imagine this Python pseudo-code, roughly equivalent to using a plain float in C++:

import ROOT

N = 10  # placeholder number of entries, only for illustration

# Pseudo-code: this shows the problem, it is not a working pattern.
tree = ROOT.TTree("tree_name", "tree title")
# What's happening here? You can already see that
# this is not like declaring a C++ `float`, you need to
# give this a value just for the variable to exist.
px = 0.
# Here we would like to pass the address of the current px variable
tree.Branch("px", px, "px/F")

for _ in range(N):
    # This is a completely different object in Python!
    # So it means it will have a different address in memory
    px = 42.0
    # What will this do? The previous address is not pointing
    # to the current value in the loop
    tree.Fill()

Instead, by using something like array, you can create the array at a certain memory location, and then modify its contents in-place, thus keeping the same variable address for the TTree to properly connect it to the branch. I hope the explanation is clear.
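The difference between rebinding a float and modifying an array in place can be checked without ROOT at all. The following is a minimal sketch using only the standard array module; the names px_buf, old_px and the two boolean flags are made up for illustration. It shows that assigning a new float gives a different object, while writing into a one-element array leaves the buffer address unchanged.

```python
from array import array

# A plain Python float: assigning a new value rebinds the name to a
# different object, so anything that remembered the old object
# never sees the new value.
px = 0.0
old_px = px
px = 42.0
rebinding_made_new_object = px is not old_px

# A one-element array: the value is overwritten inside the same buffer,
# so the memory address of the data stays the same.
px_buf = array("f", [0.0])
address_before, _ = px_buf.buffer_info()  # (address, number of items)
px_buf[0] = 42.0
address_after, _ = px_buf.buffer_info()
buffer_address_unchanged = address_before == address_after

print(rebinding_made_new_object)
print(buffer_address_unchanged)
```

In a PyROOT script, a buffer like px_buf is what would be passed to tree.Branch("px", px_buf, "px/F"), with px_buf[0] updated before each tree.Fill(), as in the CMS wiki example mentioned above.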

Cheers,
Vincenzo

dm-leo · July 18, 2023, 7:49am

Hello Vincenzo,
Thanks for your reply. It is clear to me now! Thanks again.

Kind regards
dm
