PyROOT: strange list length with file.Get(histo)

I have a ROOT file which contains 3200 histograms, named "Name/NameEvent1Ch1", "Name/NameEvent1Ch31", "Name/NameEvent2Ch0", ..., "Name/NameEvent{x}Ch32".
Half of the histograms come from the x readout of a detector and half from the y readout, so I split them into two groups. I want to be able to access them easily by event number, so I thought the easiest way would be to read in all of the histograms and then reshape the arrays, but I get a strange error from Python.

I define the following arrays:

x_channels = [8,7,9,6,10,5,11,4,12,3,13,2,14,1,15,0]
y_channels = [27,28,26,29,25,30,24,31,23,16,22,17,21,18,20,19]

Then this works fine:

analogues_x = [(f"Target/TargetEvent{Ev}Ch{ch}") for Ev in range(100) for ch in x_channels]
analogues_y = [(f"Target/TargetEvent{Ev}Ch{ch}") for Ev in range(100) for ch in y_channels]

analogues_x=np.reshape(analogues_x,(100,16))
analogues_y=np.reshape(analogues_y,(100,16))

But this gives me an error:

analogues_x = [root_file.Get(f"Target/TargetEvent{Ev}Ch{ch}") for Ev in range(100) for ch in x_channels]
analogues_y = [root_file.Get(f"Target/TargetEvent{Ev}Ch{ch}") for Ev in range(100) for ch in y_channels]

print(len(analogues_x),len(analogues_y))

analogues_x=np.reshape(analogues_x,(100,16))
analogues_y=np.reshape(analogues_y,(100,16))

The error says:

 File "/Users/bethlong/Documents/PADME/230111padme-fw/PadmeReco/AnalogueTargetStudies.py", line 43, in <module>
    analogues_x=np.reshape(analogues_x,(100,16))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/numpy/core/fromnumeric.py", line 285, in reshape
    return _wrapfunc(a, 'reshape', newshape, order=order)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/numpy/core/fromnumeric.py", line 56, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/numpy/core/fromnumeric.py", line 45, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot reshape array of size 1641600 into shape (100,16)

This happens even though print(len(analogues_x),len(analogues_y)) gives 1600 1600.
Can anyone tell me what’s going wrong?

Maybe @vpadulan can help?

I’m bumping this since I’m really stuck!

I get a different error (on linux):

Traceback (most recent call last):
  File "/home/x/b.py", line 11, in <module>
    a_x = np.reshape(h,(1,3))
         ^^^^^^^^^^^^^^^^^^^
  File "<__array_function__ internals>", line 200, in reshape
  File "/usr/lib/python3/dist-packages/numpy/core/fromnumeric.py", line 298, in reshape
    return _wrapfunc(a, 'reshape', newshape, order=order)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/numpy/core/fromnumeric.py", line 54, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
                     ^^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.

This hints that np.reshape needs an array and is not getting one. The following works for me, so maybe you can try it:

import ROOT
import numpy as np

f = ROOT.TFile("hsimple.root","READ")
myvars= ['prof','px','pxpy']
h = [f.Get(f"h{v}") for v in myvars]
#h[1].Draw("hist")
print(h)
print(len(h))
hn = np.array(h, dtype=object)
a_x = np.reshape(hn,(1,3))
a_x[0,1].Draw("hist")
print(a_x)

(note that dtype=object was needed too) giving this output:

[<cppyy.gbl.TProfile object at 0x7bae8f0>, <cppyy.gbl.TH1F object at 0x7d4a2b0>, <cppyy.gbl.TH2F object at 0x5d80f30>]
3
Info in <TCanvas::MakeDefCanvas>:  created default TCanvas with name c1
[[<cppyy.gbl.TProfile object at 0x7bae8f0>
  <cppyy.gbl.TH1F object at 0x7d4a2b0>
  <cppyy.gbl.TH2F object at 0x5d80f30>]]

and it draws hpx as expected.

I tried this but it still doesn’t work: I get the same error, ValueError: cannot reshape array of size 1641600 into shape (100,16) (Intel macOS 14.3).

I could really use some more help on this. I’ve tried defining everything as an np.array(), but I still get the same error telling me the size of the array is wrong.
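One possible reason why dtype=object helped in the hsimple.root example but not here: when NumPy builds an array from a list of sequences, it keeps the elements as opaque objects only if their lengths differ. If every element has the same length, NumPy expands them into an extra dimension, even with dtype=object. The sketch below uses plain Python lists as stand-ins for histograms (an assumption for illustration, no ROOT involved):

```python
import numpy as np

# Stand-ins for sequence-like objects; these are NOT ROOT histograms.
ragged = [[1.0, 2.0, 3.0], [4.0, 5.0]]        # elements of different lengths
uniform = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # elements of equal length

a = np.array(ragged, dtype=object)
b = np.array(uniform, dtype=object)

print(a.shape)  # (2,)   -> one object per element, reshape by element works
print(b.shape)  # (2, 3) -> elements were expanded into a second axis
```

In the earlier example the three objects (TProfile, TH1F, TH2F) presumably report different lengths, which would put them in the first case, whereas histograms with identical binning fall into the second.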

Update: maybe I’ve found the problem:
My histograms each have 1024 bins, plus underflow and overflow makes 1026.

If I do:

x_channels = [8,7,9,6,10,5,11,4,12,3,13,2,14,1,15,0]
y_channels = [27,28,26,29,25,30,24,31,23,16,22,17,21,18,20,19]


analogues_x = [root_file.Get(f"Target/TargetEvent{Ev}Ch{ch}") for Ev in range(100) for ch in x_channels]
analogues_y = [root_file.Get(f"Target/TargetEvent{Ev}Ch{ch}") for Ev in range(100) for ch in y_channels]

print(len(analogues_x),len(analogues_y))
print(len(analogues_x[0]),len(analogues_y[0]))

I get:

1600 1600
1026 1026

So it seems that, although when I print(analogues_x) I get [<cppyy.gbl.TH1D object at 0x7fdf007a8010>, <cppyy.gbl.TH1D object at 0x7fdf0200ba50>,...], the contents of the histograms are actually being read into my lists, instead of creating a list of objects.
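The numbers in the error message are consistent with this: 1600 histograms times 1026 entries each is 1641600. A minimal sketch of the effect, using a hypothetical FakeHist class as a stand-in for TH1D (the only assumption is that the object supports len() and integer indexing, as the output above suggests):

```python
import numpy as np

class FakeHist:
    """Hypothetical stand-in for a TH1D: behaves like a sequence of bin contents."""

    def __init__(self, nbins):
        # nbins regular bins plus one underflow and one overflow bin
        self._contents = [0.0] * (nbins + 2)

    def __len__(self):
        return len(self._contents)

    def __getitem__(self, i):
        # Raises IndexError past the end, which is what makes iteration stop
        return self._contents[i]

hists = [FakeHist(1024) for _ in range(100 * 16)]
print(len(hists), len(hists[0]))  # 1600 1026

# np.reshape calls asarray internally; the sequence-like elements are unpacked
arr = np.asarray(hists)
print(arr.shape, arr.size)  # (1600, 1026) 1641600
```

The Python list itself still holds 1600 objects. It is the conversion to a NumPy array inside np.reshape that unpacks each element, which is also why the conversion is slow.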

If I convert this list to a np.array() then it’s very slow, and the only reason I need an np.array() is so I can reshape my list to have simple access to histograms divided by event (and eventually by channel instead).

Is there a better solution than the workaround of:

analogues_x_Ev=[]
for Ev in range(100):
    analogues_x_Ev.append([root_file.Get(f"Target/TargetEvent{Ev}Ch{ch}") for ch in x_channels])

?

Dear @bethlong06 ,

Thanks for reaching out to the forum! Let me ask you a few questions so I can better understand your use case.

So it seems that, although when I print(analogues_x) I get [<cppyy.gbl.TH1D object at 0x7fdf007a8010>, <cppyy.gbl.TH1D object at 0x7fdf0200ba50>,...], the contents of the histograms are actually being read into my lists, instead of creating a list of objects.

Seeing [<cppyy.gbl.TH1D object at 0x7fdf007a8010>, <cppyy.gbl.TH1D object at 0x7fdf0200ba50>,...] already tells you that the list contains objects, one histogram per element. This should be as expected, unless I am missing something.

Calling len on a single element, as in len(analogues_x[0]), is a different matter. By default that gives you the length of the histogram, i.e. its number of bins including the underflow and overflow bins. The histogram can also be iterated to retrieve the content of each bin, see

>>> import ROOT
>>> h = ROOT.TH1D("h","h",10,-3,3)
>>> h.FillRandom("gaus")
>>> len(h)
12
>>> for bc in h:
...     print(bc)
... 
0.0
39.0
136.0
394.0
796.0
1131.0
1106.0
817.0
397.0
144.0
40.0
0.0

So everything that you show seems like normal behaviour to me.
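For anyone who still wants a NumPy container of histograms, one possible approach is to allocate an empty object array of the final shape and assign the elements one by one. Assignment to a single index stores the object itself without unpacking it. This is only a sketch: FakeHist is a hypothetical stand-in for TH1D, and the same pattern has not been checked here against real PyROOT objects.

```python
import numpy as np

class FakeHist:
    """Hypothetical stand-in for a TH1D: behaves like a sequence of bin contents."""

    def __init__(self, nbins):
        self._contents = [0.0] * (nbins + 2)  # bins + underflow + overflow

    def __len__(self):
        return len(self._contents)

    def __getitem__(self, i):
        return self._contents[i]

n_events, n_channels = 100, 16

# Allocate the final shape first, so NumPy never inspects the elements
grid = np.empty((n_events, n_channels), dtype=object)
for ev in range(n_events):
    for j in range(n_channels):
        # In the real script this would be the root_file.Get(...) call
        grid[ev, j] = FakeHist(1024)

print(grid.shape)                  # (100, 16)
print(type(grid[3, 5]).__name__)   # FakeHist

# Access by channel instead of by event via the transpose (a view, no copy)
by_channel = grid.T
print(by_channel.shape)            # (16, 100)
```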

Now, coming to my understanding of your actual goal:

so I can reshape my list to have simple access to histograms divided by event (and eventually by channel instead).

Wouldn’t this be enough?

analogues_x = [
    [root_file.Get(f"Target/TargetEvent{Ev}Ch{ch}") for ch in x_channels]
    for Ev in range(100)
]
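With a nested list like this, indexing is [event][position in x_channels], and a per-channel view can be obtained with zip. The sketch below uses the histogram names as stand-ins for the objects returned by root_file.Get, so it runs without ROOT; the variable names by_event and by_channel are illustrative only.

```python
x_channels = [8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15, 0]

# Strings stand in for the histograms; in the real script each entry
# would be root_file.Get(name) instead of the name itself.
by_event = [
    [f"Target/TargetEvent{ev}Ch{ch}" for ch in x_channels]
    for ev in range(100)
]

print(len(by_event), len(by_event[0]))  # 100 16
print(by_event[3][0])                   # Target/TargetEvent3Ch8

# Transpose to index as [position in x_channels][event]
by_channel = [list(col) for col in zip(*by_event)]

print(len(by_channel), len(by_channel[0]))  # 16 100
print(by_channel[0][3])                     # Target/TargetEvent3Ch8
```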

Let me know if I am missing something here, thanks!

Cheers,
Vincenzo

Yes, that works fine. I don’t know why I wasn’t able to work it out by myself, so thank you very much for your patience!
