Reading values from Trees

Akira · July 6, 2005, 11:33am

Hi, I am using ROOT 4.03/02 with PyROOT.

I get my data in the form of Trees and Leaves and I am making histograms out of these. The easiest thing is to use Draw("leaf>>histo") or Project("histo","leaf"), but this turned out to be not very stable through PyROOT: I get histograms filled on the Python command line but not when I write scripts. Also, I sometimes need to manipulate these values, and for this I used something like

f=TFile("file_name")
tree=f.Get("tree_name")
tree.Draw("leaf_name","","goff")
arr=tree.GetV1()

But if I get another leaf in the same way, then “arr” will be overwritten with the new one too. So I always have to do

arrCopy=[]
for i in range(tree.GetSelectedRows()):
  arrCopy.append(arr[i])

This works but is slow, and every time I try to retrieve something out of a Tree I need to make a copy, which is not efficient.

Could you tell me if there is a better way of doing this?

pcanal · July 6, 2005, 1:48pm

It strongly depends on what you are trying to do. For example, if your goal is to compare two values from within the same event(s), you should consider using MakeClass/MakeSelector/MakeProxy. On the other hand, if your goal is to sort all the values of the same data member, then you have to first copy the values.

Cheers,
Philippe.

wlav · July 6, 2005, 6:38pm

Akira,

Could you please send me an example with which I can reproduce the “not very stable” part? If it’s a problem caused by PyROOT, it needs to be addressed and fixed.

Now that sounds scary. The only difference between CLI and scripts is the exception handler in the former, which allows for shell escapes etc. as a convenience (doing so in a script would make no sense, rather the python os and commands modules should be used directly). Other than that, I’m unaware of any differences and certainly, there shouldn’t be any: if there are, consider it a bug. Could you please provide me with an example with which I can reproduce the problem?

[quote]arrCopy=[]
for i in range(tree.GetSelectedRows()):
  arrCopy.append(arr[i])[/quote]

Is it really necessary to copy the values (which will indeed be slow), rather than the references?

Well, what are you trying to do? Even in the old version of 4.03, most variables in a tree can be accessed directly by using the leaf name, and otherwise you can walk the tree by getting the branches and leaves explicitly (see also cern.ch/wlav/pyroot/tpytree.html for more details). Could you elaborate on your example?

Thanks,
Wim

Akira · July 8, 2005, 6:46pm

Hi Philippe, hi Wim, thanks a lot for your replies.
Wim, I know you are working in ATLAS so I will explain my situation more specifically.

I failed to reproduce the “not stable” bit. I found it a few months ago, did not investigate it further, and just decided to copy the contents of the trees. However, I still see some inconsistencies. I attach the file containing my trees. The following seems to work.

from ROOT import *

f=TFile("MtBjj.test.ntuple.root")
tree=f.Get("ElectronNtuple")
histo=TH2F("histo","", 50,0,50000,50,0,2 )
histo2=TH2F("histo2","", 50,0,50000, 50,0,2) 
histo3=TH1F("histo3","", 50,0,50000)
histo4=TH1F("histo4","", 50,0,50000)
histo5=TH1F("histo5","", 5,0,5)
histo6=TH1F("histo6","", 5,0,5)
tree.Draw("(P/E):E>>histo")
tree.Project("histo2", "(P/E):E")
tree.Draw("E>>histo3")
tree.Project("histo4", "E")
tree.Draw("no>>histo5")
tree.Project("histo6", "no")

canv=TCanvas("canv", "", 400,450)
canv.Divide(2,3)
canv.cd(1)
histo.Draw("")
canv.cd(2)
histo2.Draw("")
canv.cd(3)
histo3.Draw("")
canv.cd(4)
histo4.Draw("")
canv.cd(5)
histo5.Draw("")
canv.cd(6)
histo6.Draw("")

canv.SaveAs("test.eps")

Although, this one doesn’t:

from ROOT import *
f=TFile("MtBjj.test.ntuple.root")
tree=f.Get("ElectronNtuple")
histo=TH1F("histo", "", 0,0,5)
tree.Project("histo","no")

canv=TCanvas("canv","", 200,150)
canv.cd(1)
histo.Draw("")
canv.SaveAs("test2.eps")

So why is this? It seems that even in the second one, I have some entries in the histogram but nothing is showing up.

I tried your link too, but it does not seem to work on what I have. It is hard for me to see the correspondence between this and my file. Mine is from Athena; I use the ntuple service to make it, but I am not sure it is in the right kind of format. Your link seems to explain how to get trees out of a directory, while I want to get hold of the values in the Leaves.

Of course, I would love to do all these things without copying, which is dreadful. But as I described above, with the method I know, I only get a “buffer”, which is not good enough since its contents will be replaced as soon as I retrieve another leaf.

I have to admit, though, that I am not very sure I am structuring my analysis right. What I decided to do is to get some variables out in the form of an ntuple and manipulate them afterwards. This involves things like accepting the reconstructed top only if there were exactly two light jets with energy above 40 GeV, two b-jets with likelihood above 0.6, missing Et above 20 GeV, and so on. Plus I want to be able to change these cuts as I wish without running another 10-hour job.
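
One way to keep such cuts adjustable without rerunning the long job is to hold them in a small parameter object and apply them to per-event values read back from the ntuple. The sketch below is pure Python and only illustrates the idea: it assumes each event has already been reduced to a dictionary, the field names (`light_jet_E`, `bjet_lhood`, `met`) and the `Cuts` class are hypothetical, and energies are taken to be in GeV.

```python
from dataclasses import dataclass


@dataclass
class Cuts:
    # Hypothetical cut values mirroring the selection described above.
    min_jet_E: float = 40.0    # GeV
    min_b_lhood: float = 0.6
    min_met: float = 20.0      # GeV
    n_light: int = 2
    n_b: int = 2


def passes(event, cuts):
    """Return True if the event satisfies every cut."""
    hard_jets = [e for e in event["light_jet_E"] if e > cuts.min_jet_E]
    good_b = [l for l in event["bjet_lhood"] if l > cuts.min_b_lhood]
    return (len(hard_jets) == cuts.n_light
            and len(good_b) == cuts.n_b
            and event["met"] > cuts.min_met)


# Two made-up events: the second has only one light jet above threshold.
events = [
    {"light_jet_E": [55.0, 47.0, 12.0], "bjet_lhood": [0.9, 0.7], "met": 35.0},
    {"light_jet_E": [55.0, 30.0], "bjet_lhood": [0.9, 0.7], "met": 35.0},
]
selected = [ev for ev in events if passes(ev, Cuts())]
tighter = [ev for ev in events if passes(ev, Cuts(min_met=40.0))]
```

Changing a cut then means constructing a different `Cuts` object and looping over the ntuple again, which is much cheaper than regenerating it.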

All the functionality provided by the Draw and Project methods in Trees is attractive, although I couldn’t quite figure out how I can implement everything I want to do with them.

The methods structured around Trees and Leaves seem very hard for me to understand. I was expecting something like

aTree.GetLeaf("aLeaf").GetValue(1)

But nothing seems to work that way and I am not too sure how these things are designed to be used and I am confused.

Any suggestion is much appreciated. Thank you for your help.
MtBjj.test.ntuple.root (106 KB)

pcanal · July 8, 2005, 6:51pm

[quote]But nothing seems to work that way and I am not too sure how these things are designed to be used and I am confused. [/quote]I am just curious, did you get a chance to read the User’s Guide chapter on TTree? The main concept should be explained there.

Philippe.

wlav · July 8, 2005, 9:44pm

Akira,

until recently if you did ‘import ROOT’ in a standard Atlas setup, you’d get the SealROOT python ROOT module, not the PyROOT one. Which Atlas release are you working with? (The Rome releases 10.0.x are recent enough, the simulation production release 9.4.0 is not.)

As for your scripts, your codes do two totally different things, and I think they do the right thing. E.g. in your second example, you have:

histo=TH1F("histo", "", 0,0,5)

i.e. you create a histogram with zero bins. That is, all your entries over/under flow: you have entries but nothing visible. Give it a couple of bins, e.g.:

histo=TH1F("histo", "", 5,0,5)

and the result looks just fine to me?
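
To see why a zero-bin histogram can hold entries yet show nothing, here is a simplified pure-Python model of fixed-width binning with underflow and overflow slots. It is only a sketch under that assumption, not ROOT's actual code, and the `fill` helper is hypothetical.

```python
def fill(values, nbins, xlow, xup):
    """Simplified fixed-width binning.

    counts[0] is underflow, counts[nbins + 1] is overflow,
    counts[1..nbins] are the visible bins.
    """
    counts = [0] * (nbins + 2)
    for x in values:
        if x < xlow:
            idx = 0
        elif x >= xup:
            idx = nbins + 1
        else:
            idx = 1 + int(nbins * (x - xlow) / (xup - xlow))
        counts[idx] += 1
    return counts


data = [0, 1, 1, 2, 4]
zero_bins = fill(data, 0, 0, 5)   # every entry ends up in the overflow slot
five_bins = fill(data, 5, 0, 5)
visible_zero = zero_bins[1:-1]    # nothing left to draw
visible_five = five_bins[1:-1]
```

With `nbins=0` the only slots are underflow and overflow, so all five entries are counted but none falls into a drawable bin.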

This I don’t understand. The buffer gets filled with the new set of values when you get a new event (it’s basically holding on to the pointer that ROOT uses to hold the array that it represents). Getting a new leaf doesn’t affect it at all. Or shouldn’t anyway. If you can produce a script that does that, though, it’s a bug.

Also, this:

aTree.GetLeaf("aLeaf").GetValue(1)

works, even when accessed as “aTree.aLeaf[1]” (which is more pythonic). I did that e.g. for your TWeight and E arrays in ElectronNtuple. It won’t work with SealROOT though, i.e. if you run an older Atlas release with a wrong setup, because several parts of the TTree lib were not bound.

Finally, as for structuring your analysis, can I suggest something like:

[code]from ROOT import *
f=TFile("MtBjj.test.ntuple.root")
tree=f.Get("ElectronNtuple")

nEvent = 0
while ( tree.GetEvent( nEvent ) ):
  # access energy
  for i in range(0,int(tree.no)):
      print tree.E[i]
  print

  # etc ...

  nEvent += 1[/code]
where more experienced ROOT users can correct me on the best way of looping over a TTree.
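
The same loop can be wrapped in a small generator so that the entry counter cannot be forgotten. This is a sketch only: `iter_entries` is a hypothetical helper, it relies on `GetEvent` returning a positive byte count while entries exist and 0 once the entry number is out of range, and `FakeTree` is a stand-in so the example runs without ROOT.

```python
def iter_entries(tree):
    """Yield entry numbers until GetEvent reports that nothing was read."""
    n = 0
    while tree.GetEvent(n) > 0:
        yield n
        n += 1


class FakeTree:
    """Hypothetical stand-in for a tree with a counter `no` and an array `E`."""

    def __init__(self, events):
        self._events = events
        self.no = 0
        self.E = []

    def GetEvent(self, n):
        if n < 0 or n >= len(self._events):
            return 0
        self.E = self._events[n]
        self.no = len(self.E)
        return 8 * self.no + 4   # pretend byte count, always positive


tree = FakeTree([[41000.0, 12000.0], [], [30500.0]])
energies = []
for entry in iter_entries(tree):
    for i in range(int(tree.no)):
        energies.append((entry, tree.E[i]))
```

An event with no electrons still counts as read, so the loop continues past it and simply contributes no energies.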

HTH,
Wim

Akira · July 11, 2005, 1:04pm

Hi, Thank you very much for your reply.

Yes, the user guide has been more or less the only source of the information for me…

I am sorry for the confusion about my histogram not booked properly. So now I cannot reproduce the “not stable” bit, I will report if I see it somewhere again. I have been using the Athena version 10.01 and this shouldn’t be the problem here.

Wim, thank you for your tips. Much appreciated. I have never used the method of getting variables by getting certain events. I have been doing the following:

tree.Draw(leafN, "", "goff") 
temp=tree.GetV1()
index=self.deepCopy(temp, tree.GetSelectedRows())

where the deepCopy does

def deepCopy(self, toCopy, numEl):
  output=[]
  for i in range(numEl):
    output.append(toCopy[i])
  return tuple(output)

I make tuples because I don’t want to modify the values and I hoped immutable object may be more efficiently read, though when writing to it I observed that

output=()
output+=aValue

was much slower.

The reason why I have to do this “deepCopy” is that if I did

tree.Draw(leafName1, "", "goff") 
a=tree.GetV1()
tree.Draw(leafName2, "", "goff") 
b=tree.GetV1()

then a is overwritten by b, and a and b point to the same thing, as in

>>> a
<read-write buffer ptr 0xb0f05008, size 2147483640 at 0xb757f600>
>>> b
<read-write buffer ptr 0xb0f05008, size 2147483640 at 0xb3acfc20>

I wish this was not the case but I just had to work around this by copying.
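
The aliasing can be reproduced without ROOT. In the sketch below, `FakeDraw` is a hypothetical stand-in that mimics only the behaviour described above: every `Draw` call refills one internal buffer and `GetV1` always hands back that same buffer.

```python
from array import array


class FakeDraw:
    """Hypothetical stand-in for the Draw/GetV1 pair: one reused buffer."""

    def __init__(self, columns):
        self._columns = columns
        self._buf = array("d", [0.0] * 8)
        self._rows = 0

    def Draw(self, name):
        values = self._columns[name]
        self._rows = len(values)
        for i, v in enumerate(values):
            self._buf[i] = v

    def GetV1(self):
        return self._buf   # the same object on every call

    def GetSelectedRows(self):
        return self._rows


t = FakeDraw({"E": [10.0, 20.0], "Phi": [0.5, -0.5]})
t.Draw("E")
a_view = t.GetV1()
a_copy = list(a_view[:t.GetSelectedRows()])   # snapshot before the next Draw
t.Draw("Phi")
b_view = t.GetV1()
```

After the second `Draw`, `a_view` and `b_view` are the same object and show the Phi values, while `a_copy` still holds the energies. Whether a real PyROOT buffer supports slicing in this way is an assumption; the index loop over `GetSelectedRows()` from the post is the conservative form.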

The method

>>> tree.GetEvent(0)
>>> tree.GetLeaf("Phi").GetValue(0)

works now; I was missing the GetEvent method. However, in this particular event (from the file attached to the last message, in Electron Phi) there is only one electron and still I get

>>> tree.GetLeaf("Phi").GetValue(0)
2.6257577628212521
>>> tree.GetLeaf("Phi").GetValue(1)
-2.8944267021212595
>>> tree.GetLeaf("Phi").GetValue(2)
2.7714172040086877
>>> tree.GetLeaf("Phi").GetValue(3)
-0.72657626867294312

and from 4, the values are 0.0. The first value is the same as what I get through the other method though.

These things are there but are not in the user’s guide. Is there any other documentation for trees?

wlav · July 11, 2005, 3:02pm

Akira,

[quote]I observed that

output=()
output+=aValue

was much slower.[/quote]
Certainly. You said so yourself: tuples are immutable. Hence, every time you change it (e.g. by adding a value), a new tuple (and hence a full copy) must be created.
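
A small pure-Python comparison of the two ways of building the tuple, as a sketch (the function names are hypothetical). Note that the right-hand side of `+=` has to be a tuple itself, e.g. `(aValue,)`.

```python
def copy_repeated_concat(buf, n):
    """Grow a tuple element by element: every pass builds a new tuple."""
    out = ()
    for i in range(n):
        out += (buf[i],)
    return out


def copy_once(buf, n):
    """Collect the values first and create the tuple a single time."""
    return tuple(buf[i] for i in range(n))


values = [0.5 * i for i in range(1000)]
slow = copy_repeated_concat(values, len(values))
fast = copy_once(values, len(values))
```

Both give the same result; the second avoids creating a fresh, longer tuple on every iteration.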

As for the leaf access, if you call “TTree::GetLeaf()”, you get a C++ array as a “double*”. As in C++, the size of that array is unknown and you can index in there as far as you like (which you do with each call of TLeaf::GetValue(), e.g. GetValue(0) is really leaf[0], GetValue(1) is leaf[1], etc.).

Now, since you own the memory, that’s all fine as far as the program is concerned, but the value that you get back may not mean anything unless a value was put at the address. If not, you just read whatever was there, which may look like a proper value because it may be a remnant from a read from another event. Therefore, you should use your own electron counter to figure out how far into the array you should access. In this case, the counter is tree.no (which you can see by looking at the output of tree.Print()).
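
The effect of stale values, and the role of the counter, can be mimicked in plain Python. `ReusedBuffer` below is a hypothetical model of a fixed-capacity array that is refilled for each event; the numbers are made up for illustration.

```python
class ReusedBuffer:
    """Hypothetical fixed-capacity array refilled for each event."""

    def __init__(self, capacity):
        self.data = [0.0] * capacity
        self.no = 0

    def load(self, values):
        # Only the first len(values) slots are overwritten.
        for i, v in enumerate(values):
            self.data[i] = v
        self.no = len(values)


buf = ReusedBuffer(8)
buf.load([2.63, -2.89, 2.77, -0.73])   # an earlier event with four electrons
buf.load([1.10])                       # current event: one electron

unguarded = [buf.data[i] for i in range(4)]       # includes leftovers
guarded = [buf.data[i] for i in range(buf.no)]    # only what was filled
```

Reading past the counter returns remnants of the earlier event, which look like plausible values; limiting the loop to `buf.no` returns only the current event's entry.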

However, if you’d done as I recommended, i.e. the python way, you’d have gotten this:

[code]>>> f = TFile("MtBjj.test.ntuple.root")
>>> tree=f.Get("ElectronNtuple")
>>> tree.GetEvent(0)
>>> tree.Phi[0]
2.6257577628212521
>>> tree.Phi[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IndexError: buffer index out of range
[/code]
A fact that you can use to loop over your phi’s, like e.g. so:

[code]>>> for phi in tree.Phi:
...     print phi
...
2.62575776282
[/code]

It looks prettier, it is safer, and it is faster, too.

Cheers,
Wim

Akira · July 11, 2005, 3:33pm

Hi, thanks again for your reply.

The problem with an approach such as tree.Phi[0] is that I cannot write a generalized function that obtains variables, e.g.

def fun(self, treeName, leafName):
  tree=self.f.Get(treeName)
  print tree.leafName[0] #or whatever

will obviously not work. Is there a workaround to this?

Cheers
Akira

wlav · July 11, 2005, 3:55pm

Akira,

hey, this is python! Of course you can do what you want:

def fun(self, treeName, leafName):
  tree=self.f.Get(treeName)
  print getattr(tree,leafName)[0] #or whatever

HTH,
Wim
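
The `getattr` lookup extends naturally to several leaves at once. The sketch below uses a `SimpleNamespace` as a hypothetical stand-in for a tree positioned on one event; `first_value` and `first_values` are illustrative names, not part of PyROOT.

```python
from types import SimpleNamespace


def first_value(tree, leaf_name):
    """Look the leaf up by its string name, then index it."""
    return getattr(tree, leaf_name)[0]


def first_values(tree, leaf_names):
    """Collect the first element of each named leaf into a dict."""
    return {name: first_value(tree, name) for name in leaf_names}


# Hypothetical stand-in for a tree after GetEvent(0).
fake_tree = SimpleNamespace(Phi=[2.6257577628212521], E=[41000.0])
result = first_values(fake_tree, ["Phi", "E"])
```

A misspelled leaf name raises `AttributeError`, which is usually preferable to silently reading the wrong variable.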

pcanal · July 11, 2005, 5:39pm

Hi,

For this level of flexibility you may want to use TTreeFormula

tree = self.f.Get(treeName)
form = TTreeFormula("form", leafName, tree)
tree.GetEntry(0)
print form.EvalInstance(0)

Cheers,
Philippe.

Akira · July 12, 2005, 4:08pm

Hi,
Thanks. Great help.
What does TTreeFormula do? Is this documented somewhere? It seems to me that TTree has lots of nice features that are not documented in the user guide…

pcanal · July 12, 2005, 6:29pm

TTreeFormula is used by TTree::Draw to access the data.
Given a formula and a tree, it will give you access (as a double value) to the result of the formula (which can use the tree’s branch names).
Unfortunately, the only documentation is that provided in the class description and in the usages (TSelectorDraw and TTreePlayer::Scan).

Cheers,
Philippe