Need an Idea on How to Load Data from Several Files

Hello there.

The title is probably not very suggestive, or even misleading, since I am not only interested in importing a Tree from multiple ROOT files. I also want to multiply the data in the tree and then sum up all the data.

Ok, let’s go through it step by step. I have, say, 5 ROOT files (rootfile1.root, rootfile2.root, etc.).
In every one of those files I have a tree with the same name, say T2. In every T2 in every file I have a variable, say S2. For every file I also know a correction factor, which has not been applied in those files yet (for whatever reason, not important here). So I also have 5 correction factors, say A1, A2, …, A5, one for each ROOT file and always applied to S2.

Since I am a ROOT and PyROOT beginner, I am not sure how to load and draw this data. What I finally want is a TH1 of S2 summed up from all 5 ROOT files, with the correct factor multiplied in for each file. So I first want to open the first ROOT file, read S2, multiply it by A1 and store it in an empty histogram, say h1. Then I want to move on to the second ROOT file, read S2, multiply it by A2, and add it to h1. And the same for files 3, 4 and 5.

The easiest way to read a tree from several files is with TChain, isn’t it? But when I do T2.Add(…) I already add the data without incorporating the factors A1, …, A5. And I have the same problem with TTree() and TFile(), don’t I? So my question is: how can I do it? Is there another way?

Probably the question is rather stupid. But I am a beginner trying to find my way through ROOT and PyROOT with numerous tutorials and questioning colleagues, who btw also don’t know how to do it.

Thank you for your help.
Best wishes,
heico

Since your correction factor is file-based, a TChain is probably not the easiest thing to use. You could do something like:

import ROOT

# define a list of correction factors: A = [A1, A2, A3, ...]
# define a list of filenames somehow: filenames = ["foo1.root", "foo2.root", ...]
files = [ROOT.TFile(fname) for fname in filenames]
trees = [f.Get("T2") for f in files]
h = ROOT.TH1D("h","h",10,0,10) # Make a plain histogram with appropriate parameters
for j,t in enumerate(trees):
    for i in range(t.GetEntries()):
        t.GetEntry(i)
        h.Fill( t.S2 * A[j] ) # scale S2 by the correction factor of file j
h.Draw()

Using a TChain and TChain::Draw commands is probably still possible, but this more pythonic way is easier to understand. It’s also probably slower, but there are tricks for speeding it up, like turning off all the branches other than S2, using PyPyROOT, etc.
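
For comparison, the Draw-based route could look roughly like the sketch below. It assumes that the branch is a plain scalar called S2 and that a histogram named "h" already exists in the current ROOT directory; the helper names draw_expression and fill_with_draw are made up for illustration. It relies on TTree::Draw adding to an existing histogram when the target is written as ">>+h", and on the "goff" option to suppress graphics.

```python
def draw_expression(branch, factor, hist_name):
    """Build a TTree::Draw expression that scales `branch` and appends to `hist_name`."""
    # Parentheses keep negative factors and array expressions unambiguous;
    # repr() keeps the full precision of the float in the formula string.
    return "({})*({!r})>>+{}".format(branch, float(factor), hist_name)


def fill_with_draw(trees, factors, branch="S2", hist_name="h"):
    """Call Draw once per tree, each time with that file's own factor."""
    for tree, factor in zip(trees, factors):
        # ">>+" adds to the existing histogram instead of replacing it.
        tree.Draw(draw_expression(branch, factor, hist_name), "", "goff")
```

With real trees this would be called as fill_with_draw(trees, A) after h has been created. For an array branch, passing something like branch="SignalS2[0]" should select the first element.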

Jean-François

Hello.

Thank you Jean-François for your reply.
How can I modify the code if S2 is stored in an array?
Like S2[i] and I wanna use the first element, i=0, for example.
Do I have to read the branch by setting address, etc.?

And how fast is it compared to a simple Draw() command which I can use when I do not need to incorporate those A[i] factors. Is it much slower?

Thank you for your help!
I really appreciate it!

heico

Hi there.

I have now modified the code the way I thought it could work, but it doesn’t.

import ROOT
from library import list_files as DATA

filelist = [DATA.Filepath + elm for elm in DATA.Filelist]
files = [ROOT.TFile(elm) for elm in filelist]
trees = [elm.Get("T2") for elm in files]

c1 = ROOT.TCanvas("c1", "C1")
h1 = ROOT.TH1F("h1", "H1", 200, 0, 1000)

s2 = ROOT.std.vector(float)()

for j,t in enumerate(trees):
    t.Branch("SignalS2", "vector<float>", s2)
    for i in xrange(t.GetEntries()):
        t.GetEntry(i)
        h1.Fill(t.s2[0] * DATA.Factor[j]) # ***

h1.Draw()
c1.Print("test.png")

If I write the line (***) as given in the code, I get an error message saying that T2 does not contain a branch “s2”. If I replace t.s2[0] in the line (***) by t.SignalS2[0] and also call my vector SignalS2 instead of s2 (i.e. I give it the same name as in the tree), then I get the error message “index is out of range”. That means I cannot even fill h1 with the unchanged s2 alone. What am I doing wrong here?

Thank you for your reply.
heico

If you have a TTree with an array branch, you should be able to access it using the [i]-notation like you are trying. Here’s a manual reference: wlav.web.cern.ch/wlav/pyroot/tpytree.html

What you describe seems to deviate from the expected behavior, so my expertise has run out. Perhaps one of the real experts can help. I would recommend double-checking that your TFiles were opened properly and that your TTrees were obtained properly from those files. When you call mytree.GetEntry(j), the return value is the number of bytes read from the TTree. You could check that this number is positive, as a check for errors in the tree.
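
Those checks could be scripted along these lines. This is only a sketch and not a confirmed fix: the function names open_tree and check_entry and the open_file parameter are invented here, and only standard calls (TFile.Open, IsZombie, Get, GetEntry) are assumed. TTree::GetEntry returns the number of bytes read, 0 if the entry does not exist, and -1 on an I/O error.

```python
def open_tree(filename, tree_name="T2", open_file=None):
    """Open `filename` and return (file, tree), raising if either is unusable."""
    if open_file is None:
        import ROOT  # imported lazily so the helper can be tried without ROOT
        open_file = ROOT.TFile.Open
    f = open_file(filename)
    # TFile.Open gives a null pointer on failure; a zombie file is unusable too.
    if not f or f.IsZombie():
        raise IOError("could not open " + filename)
    tree = f.Get(tree_name)
    # Get returns a null pointer if the file holds no object with that name.
    if not tree:
        raise KeyError("no object named %r in %s" % (tree_name, filename))
    # Return the file as well, so it stays alive while the tree is in use.
    return f, tree


def check_entry(tree, i):
    """Read entry i and raise unless GetEntry reports a positive byte count."""
    nbytes = tree.GetEntry(i)
    if nbytes <= 0:
        raise IOError("GetEntry(%d) returned %d" % (i, nbytes))
    return nbytes
```

Calling open_tree on each filename and check_entry inside the event loop would at least separate problems with the files from problems with the branch access.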

Good luck!
Jean-François

Hi Jean-François.

I checked the return value of GetEntry(i) and it is always positive.
However, I am not sure if I should be happy or desperate about it, since I don’t know how to move on now… :confused:

Regards,
heico

Ok got it. Reading through several posts here on ROOT Talk I have found a way to do it.
I marked the lines (#++#++) that I added in my code.
It’s working perfectly and the spectrum looks fine too!
Ah… today is a good day! :slight_smile:

import ROOT
from library import list_files as DATA

filelist = [DATA.Filepath + elm for elm in DATA.Filelist]
files = [ROOT.TFile(elm) for elm in filelist]
trees = [elm.Get("T2") for elm in files]

c1 = ROOT.TCanvas("c1", "C1")
h1 = ROOT.TH1F("h1", "H1", 200, 0, 1000)

s2 = ROOT.std.vector(float)()

for j,t in enumerate(trees):
    for i in xrange(t.GetEntries()):
        t.GetEntry(i)
        s2 = getattr(t, "SignalS2") #++#++
        if len(s2)>0: #++#++
            h1.Fill(s2[0] * DATA.Factor[j])

h1.Draw()
c1.Print("test.png")

getattr(t, "SignalS2") should be completely equivalent to t.SignalS2, so I’m guessing the problem was that your branch is actually called “SignalS2”, not “S2” or “s2”.
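
As a small illustration, the working loop could be wrapped as below. This is a sketch under assumptions: the function name fill_first_element is made up, the branch is taken to be vector-like and named "SignalS2", and the SetBranchStatus calls are the "turn off the other branches" trick mentioned earlier in the thread. It keeps the length check from the working code. Whether empty vectors in some entries were the actual cause of the "index out of range" error is only a guess, since nothing in the thread confirms it.

```python
def fill_first_element(hist, trees, factors, branch="SignalS2"):
    """Fill hist with factor * branch[0] for every entry that has an element."""
    n_filled = 0
    for tree, factor in zip(trees, factors):
        # Read only the branch we need; everything else is switched off.
        tree.SetBranchStatus("*", 0)
        tree.SetBranchStatus(branch, 1)
        for i in range(tree.GetEntries()):
            tree.GetEntry(i)
            values = getattr(tree, branch)  # same as tree.SignalS2 for that name
            if len(values) > 0:  # skip entries whose vector is empty
                hist.Fill(values[0] * factor)
                n_filled += 1
    return n_filled
```

With the names from the script above, the call would be fill_first_element(h1, trees, DATA.Factor). The returned count shows how many entries actually contributed, which makes skipped empty entries visible.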

Anyways, I’m glad it’s working.

Jean-François

Well, I tried t.SignalS2[i] and that didn’t work. Index out of range, it said. Next I tried t.SetBranchAddress("SignalS2", s2), which didn’t help either. getattr() is the only thing that worked (so far), dunno why…

Cheers,
heico