PyROOT: Converting branch of a vector to a vector of branches

ROOT Version: 6.24/02
Platform: Linux

Dear experts,
I have a tree that contains, among others, two branches: one is a vector<string> and the other is a vector<float>. Both vectors have the same size and their entries are paired (i.e. the string at index 0 belongs to the float at index 0 of the other vector).
I'm trying to create a new tree that is a clone of the old one without the two vector branches (I managed to do this successfully), but with a new float branch for each string in vector1, named after that string and holding the corresponding value from vector2. The problem is, first, that the vectors are very large (up to 700 entries) and their size is not fixed (e.g. 700 entries in one use case, 500 or even just 20 in another). Second, the trees themselves have many entries (more than 300 million). As a consequence, I can't create a new variable by hand for each vector element but need to use some sort of list/array/vector/dict/loop to store the variables. I already tried several approaches, but none of them fills the tree with the right values:

  • Approach 1:
        tmp_tree.SetBranchStatus("weightSystematic*",0)
        #clone tree
        tree =tmp_tree.CloneTree(0)
        #reset branch status to have access
        tmp_tree.SetBranchStatus("*",1)

        tmp_tree.GetEntry(0)
        varvalue=[]
        for ivar in xrange(0,len(tmp_tree.weightSystematicName)):
            var=str(tmp_tree.weightSystematicName[ivar])
            varvalue.append(array('f',[0]))
            tree.Branch(var,varvalue[ivar],var+"/F")
        i=0
        for i in xrange(1,tmp_tree.GetEntries()):
            for ivar in xrange(0,len(tmp_tree.weightSystematicName)):
                varvalue[ivar] = array('f',[tmp_tree.weightSystematicValue[ivar]])
            tree.Fill()
            tmp_tree.GetEntry(i)

This gives me the correct branches, but the values are kind of arbitrary: either it's some value that could correspond to another branch, or it's 0, or it's extremely large (values that aren't present in the tree, which I assume come from some floating-point effect). I also tried replacing the list with a dictionary, with the same result.

  • Approach 2:
        tmp_tree.SetBranchStatus("weightSystematic*",0)
        #clone tree
        tree =tmp_tree.CloneTree(0)
        #reset branch status to have access
        tmp_tree.SetBranchStatus("*",1)

        tmp_tree.GetEntry(0)
        for ivar in xrange(0,len(tmp_tree.weightSystematicName)):
            var=str(tmp_tree.weightSystematicName[ivar])
            value = array('f',[tmp_tree.weightSystematicValue[ivar]])
            newbranch=tree.Branch(var,value,var+"/F")
            for i in xrange(tmp_tree.GetEntries()):
                if ivar==0: tree.Fill()
                tmp_tree.GetEntry(i)
                newbranch.Fill()

Only the last value is filled properly; the others are just very large values.

  • Approach 3:
      for i in xrange(tmp_tree.GetEntries()):
            tmp_tree.GetEntry(i)
            tree.Fill()
            for ivar in xrange(0,len(tmp_tree.weightSystematicName)):
                var=str(tmp_tree.weightSystematicName[ivar])
                value = array('f',[tmp_tree.weightSystematicValue[ivar]])
                if i==0:
                    newbranch=tree.Branch(var,value,var+"/F")
                else:
                    newbranch=tree.GetBranch(var)
                newbranch.Fill()

This is very similar to approach 2, just with the order changed, and the result is also the same.

I really hope you can provide some guidance on how to solve this. I looked through the forum but haven't found a similar problem. If this is only possible in C++, I can also convert my code from PyROOT to "standard" ROOT, but I would prefer a Python solution.

Thank you very much in advance,
Kira

Hello,

Just to make sure I understand correctly: in the output tree, you would like to create one new branch per string in the vector<string> branch of the input tree? So if the input tree has two rows (entries), each with a vector of 100 strings, you will create 200 new branches in the output tree?

Yes and no. Yes, I want a branch for each string in the vector, but the vector contains the same strings in every entry of the tree. In your example, that means I would like to end up with 100 new branches, each containing 2 entries.

Ok, then what I would do is:

  1. Call GetEntry(0) on the input tree and read the string vector branch; that gives you all the names you need.
  2. Create as many array.arrays as branches you need to create. In your examples, the array.arrays are garbage collected after the iteration where they are created, but they need to survive and stay constant for the whole filling process (they provide the address where Fill will fill).
  3. Make as many Branch calls as array.arrays you created, each call with a different array.array.
  4. Do a loop over the entries of the input tree. For each entry, get the second vector branch and update all array.arrays, each with one value of the second vector, then call Fill on the output tree.
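The four steps above can be sketched as one helper. This is a minimal sketch, not tested against a real ROOT installation here: it assumes the input tree has the vector<string> branch `weightSystematicName` and the vector<float> branch `weightSystematicValue` from the question, with the same names in the same order in every entry, and that the output tree was made with `CloneTree(0)` after disabling those two branches. The function name `add_systematic_branches` is hypothetical. The helper only uses `GetEntry`, `Branch`, `Fill` and iteration, so it works on PyROOT `TTree` objects.

```python
from array import array


def add_systematic_branches(in_tree, out_tree):
    """Hypothetical helper: add one float branch per systematic name.

    Assumes in_tree has the branches weightSystematicName (vector<string>)
    and weightSystematicValue (vector<float>), with identical names in every
    entry, and that out_tree is in_tree.CloneTree(0) without those branches.
    """
    # Step 1: read the first entry to learn the branch names.
    in_tree.GetEntry(0)
    names = [str(n) for n in in_tree.weightSystematicName]

    # Steps 2 and 3: one persistent 1-element float buffer per branch.
    # The list keeps every buffer alive, so the addresses given to Branch
    # stay valid for the whole filling loop.
    buffers = []
    for name in names:
        buf = array('f', [0.0])
        out_tree.Branch(name, buf, name + "/F")
        buffers.append(buf)

    # Step 4: loop over the input entries, overwrite the buffers IN PLACE,
    # then fill the output tree once per entry.
    for entry in in_tree:
        values = entry.weightSystematicValue
        if len(values) != len(buffers):
            raise ValueError("vector size changed between entries")
        for i, buf in enumerate(buffers):
            buf[0] = values[i]
        out_tree.Fill()
    return names
```

Usage in the question's setting would be `add_systematic_branches(tmp_tree, tree)` followed by `tree.Write()`. The number of buffers and branches comes from the first entry, so no hand-written variables are needed for vectors of 20, 500 or 700 elements.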

Thank you for your reply. I have a few clarification questions, because that’s what I actually tried in approach 1 (more or less).

  1. This part worked fine for me.
  2. Now this is the tricky part. Do you mean I need to create each of them with a fixed name, like a1=array.array('f',[0]), a2=array.array('f',[0]), …, a670=array.array('f',[0])? I don't know the size of my vector in advance (or rather, it varies from tree to tree). I tried pushing the arrays to a list or dictionary, but it appears the addresses get broken then.
  3. Same as for 2: do I really need to call tree.Branch(string1,a1,string1+'/F') individually for all strings/arrays? That removes all the flexibility of the code with respect to differently sized vectors.
  4. Using a list of arrays, points 1-3 worked perfectly fine; it's only the filling in the loop over the tree entries that didn't make any sense.

Hi,

If I understand correctly, once you get the names from the string vector branch you know how many branches you will create, right?

Once you know how many branches you will create, you know how many arrays you need (you can have a loop where you create them and put them in a list) and you also know how many times you have to invoke Branch with the corresponding arrays. It’s dynamic in the sense that you get the info from the tree you are processing, then just loop as many times as you need for that tree.
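To illustrate that keeping the arrays in a container does not by itself break their addresses, here is a small check that runs without ROOT. It is a sketch that uses `array.buffer_info()` to read each buffer's memory address (the address a branch would store); the names are copied from the printout in this thread purely as examples.

```python
from array import array

# Example names taken from the printout in this thread; any list works.
names = ["FAKEBKG_STAT_VAR0__1down", "FAKEBKG_STAT_VAR0__1up", "FAKEBKG_SYST_VAR8__1up"]

# One 1-element float buffer per name, created with a comprehension: the
# count comes from the data, not from hand-written variables a1, a2, ...
buffers = [array('f', [0.0]) for _ in names]
addresses_before = [buf.buffer_info()[0] for buf in buffers]

# Growing the list (which forces the list to reallocate its own storage)
# does not move the arrays it holds: each array keeps its own buffer.
buffers.extend(array('f', [0.0]) for _ in range(1000))
addresses_after = [buf.buffer_info()[0] for buf in buffers[:len(names)]]

# With PyROOT, each pair would then be passed to Branch, e.g.
#   for name, buf in zip(names, buffers):
#       tree.Branch(name, buf, name + "/F")
```

The addresses before and after are identical, so a list (or a dict keyed by name) is a safe place for the buffers; the list only stores references to the array objects.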

Hi,

Yes, that's exactly what I tried, and it didn't work. What I have now is:

tmp_tree.GetEntry(0)
varvalue=[]
for ivar in xrange(0,len(tmp_tree.weightSystematicName)):
    var=str(tmp_tree.weightSystematicName[ivar])
    varvalue.append(array('f',[0]))
    tree.Branch(var,varvalue[ivar],var+"/F")
for entry in tmp_tree:
    for ivar in xrange(0,len(tmp_tree.weightSystematicName)):
        varvalue[ivar] = array('f',[entry.weightSystematicValue[ivar]])
    tree.Fill()
    print entry.weightSystematicName[0],entry.weightSystematicValue[0],varvalue[0], tree.FAKEBKG_STAT_VAR0__1down
    print entry.weightSystematicName[1],entry.weightSystematicValue[0],varvalue[0], tree.FAKEBKG_STAT_VAR0__1up
    print entry.weightSystematicName[2],entry.weightSystematicValue[0],varvalue[0], tree.FAKEBKG_STAT_VAR100__1down
    print entry.weightSystematicName[3],entry.weightSystematicValue[0],varvalue[0], tree.FAKEBKG_STAT_VAR100__1up
    print entry.weightSystematicName[-4],entry.weightSystematicValue[-4],varvalue[-4], tree.FAKEBKG_SYST_VAR7__1down
    print entry.weightSystematicName[-3],entry.weightSystematicValue[-3],varvalue[-3], tree.FAKEBKG_SYST_VAR7__1up
    print entry.weightSystematicName[-2],entry.weightSystematicValue[-2],varvalue[-2], tree.FAKEBKG_SYST_VAR8__1down
    print entry.weightSystematicName[-1],entry.weightSystematicValue[-1],varvalue[-1], tree.FAKEBKG_SYST_VAR8__1up

where the print out reads:

FAKEBKG_STAT_VAR0__1down 0.72755664587 array('f', [0.7275566458702087]) 0.722459495068
FAKEBKG_STAT_VAR0__1up 0.72755664587 array('f', [0.7275566458702087]) 0.722459495068
FAKEBKG_STAT_VAR100__1down 0.72755664587 array('f', [0.7275566458702087]) 0.722459495068
FAKEBKG_STAT_VAR100__1up 0.72755664587 array('f', [0.7275566458702087]) 0.722459495068
FAKEBKG_SYST_VAR7__1down 0.696251571178 array('f', [0.6962515711784363]) 0.722459495068
FAKEBKG_SYST_VAR7__1up 0.748667359352 array('f', [0.7486673593521118]) 0.722459495068
FAKEBKG_SYST_VAR8__1down 0.444918990135 array('f', [0.44491899013519287]) -4.13170767074e-36
FAKEBKG_SYST_VAR8__1up 1.0 array('f', [1.0]) 9.80908925027e-45

You can see that apparently all values in the tree are the ones from the first array, except for the last elements in the list, where they become basically 0 for no reason.
EDIT: The most important point is that the first two values are identical, which is not always the case for the last two values, so something must go wrong when filling the tree. (This is just an example for one event, but it's the same for other events.) Also, the order of the strings in the vector is always the same, so there is no mix-up between different variables.

I see, this line is not right:

        varvalue[ivar] = array('f',[entry.weightSystematicValue[ivar]])

Notice that you are rebinding varvalue[ivar] to a new array, which garbage collects the old one, and this happens for every array you previously associated with a branch. The branches still point to the memory of those old arrays. What you need to do instead is change the array you already have, i.e. varvalue[ivar][0] = entry.weightSystematicValue[ivar].
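The difference between rebinding and writing in place can be checked without ROOT. This minimal sketch uses `array.buffer_info()` to compare the address a branch would have stored with the buffer that actually receives the new value; `registered` stands in for what `tree.Branch` remembers.

```python
from array import array

# Buffers as they would be handed to tree.Branch(name, buf, name + "/F").
varvalue = [array('f', [0.0]) for _ in range(2)]
# A branch remembers each buffer's memory address, not the Python name.
registered = [buf.buffer_info()[0] for buf in varvalue]

# Buggy update (rebinding): the list slot now holds a brand-new array.
# 'old' is kept alive here only so it can be inspected; in the script from
# the thread it would be garbage collected instead.
old = varvalue[0]
varvalue[0] = array('f', [0.5])
rebound_keeps_address = varvalue[0].buffer_info()[0] == registered[0]
value_at_registered_buffer = old[0]  # still 0.0: the 0.5 never reached it

# Correct update (in place): the registered buffer itself receives the value.
varvalue[1][0] = 0.25
inplace_keeps_address = varvalue[1].buffer_info()[0] == registered[1]
```

Only the in-place assignment writes into the buffer the branch reads from. When the old array is freed instead of kept alive, `Fill` reads whatever ends up in that memory, which is consistent with the near-zero and very large values seen in the printout.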

Thank you so much. That fixed the issue.
