PyROOT: strange list length with file.Get(histo)

I have a ROOT file which contains 3200 histograms. They’re named "Name/NameEvent1Ch1", "Name/NameEvent1Ch31", "Name/NameEvent2Ch0", "Name/NameEvent{x}Ch32".
Half of the histograms come from the x readout of a detector and half from the y readout, so I split them into two lists. I want to be able to access them easily by event number, so I thought the easiest way would be to read in all of the histograms and then reshape the arrays, but I get a strange error from Python.

I define the following arrays:

x_channels = [8,7,9,6,10,5,11,4,12,3,13,2,14,1,15,0]
y_channels = [27,28,26,29,25,30,24,31,23,16,22,17,21,18,20,19]

Then this works fine:

analogues_x = [(f"Target/TargetEvent{Ev}Ch{ch}") for Ev in range(100) for ch in x_channels]
analogues_y = [(f"Target/TargetEvent{Ev}Ch{ch}") for Ev in range(100) for ch in y_channels]

analogues_x=np.reshape(analogues_x,(100,16))
analogues_y=np.reshape(analogues_y,(100,16))

But this gives me an error:

analogues_x = [root_file.Get(f"Target/TargetEvent{Ev}Ch{ch}") for Ev in range(100) for ch in x_channels]
analogues_y = [root_file.Get(f"Target/TargetEvent{Ev}Ch{ch}") for Ev in range(100) for ch in y_channels]

print(len(analogues_x),len(analogues_y))

analogues_x=np.reshape(analogues_x,(100,16))
analogues_y=np.reshape(analogues_y,(100,16))

The error says:

 File "/Users/bethlong/Documents/PADME/230111padme-fw/PadmeReco/AnalogueTargetStudies.py", line 43, in <module>
    analogues_x=np.reshape(analogues_x,(100,16))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/numpy/core/fromnumeric.py", line 285, in reshape
    return _wrapfunc(a, 'reshape', newshape, order=order)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/numpy/core/fromnumeric.py", line 56, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/numpy/core/fromnumeric.py", line 45, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot reshape array of size 1641600 into shape (100,16)

This happens even though print(len(analogues_x),len(analogues_y)) gives 1600 1600.
Can anyone tell me what’s going wrong?

Maybe @vpadulan can help?

I’m bumping this since I’m really stuck!

I get a different error (on Linux):

Traceback (most recent call last):
  File "/home/x/b.py", line 11, in <module>
    a_x = np.reshape(h,(1,3))
         ^^^^^^^^^^^^^^^^^^^
  File "<__array_function__ internals>", line 200, in reshape
  File "/usr/lib/python3/dist-packages/numpy/core/fromnumeric.py", line 298, in reshape
    return _wrapfunc(a, 'reshape', newshape, order=order)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/numpy/core/fromnumeric.py", line 54, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
                     ^^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.

This hints that np.reshape needs an array and is not getting one. The following works for me, so maybe you can try it:

import ROOT
import numpy as np

f = ROOT.TFile("hsimple.root","READ")
myvars= ['prof','px','pxpy']
h = [f.Get(f"h{v}") for v in myvars]
#h[1].Draw("hist")
print(h)
print(len(h))
hn = np.array(h, dtype=object)
a_x = np.reshape(hn,(1,3))
a_x[0,1].Draw("hist")
print(a_x)

(Note that dtype=object was needed too.) This gives the following output:

[<cppyy.gbl.TProfile object at 0x7bae8f0>, <cppyy.gbl.TH1F object at 0x7d4a2b0>, <cppyy.gbl.TH2F object at 0x5d80f30>]
3
Info in <TCanvas::MakeDefCanvas>:  created default TCanvas with name c1
[[<cppyy.gbl.TProfile object at 0x7bae8f0>
  <cppyy.gbl.TH1F object at 0x7d4a2b0>
  <cppyy.gbl.TH2F object at 0x5d80f30>]]

It also draws hpx as expected.
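
One caveat, offered as a sketch and not checked against a real ROOT file: dtype=object probably keeps each object whole in the example above only because the three objects have different lengths. When all elements have the same length, NumPy still expands them into a second axis, even with dtype=object. The class SeqLike below is a hypothetical stand-in for a histogram (anything with len() and integer indexing). Filling a preallocated object array one element at a time keeps one object per slot, so the reshape sees the expected number of elements.

```python
import numpy as np


class SeqLike:
    """Hypothetical stand-in for a histogram: has a length and integer indexing."""

    def __init__(self, n):
        self.values = [0.0] * n

    def __len__(self):
        return len(self.values)

    def __getitem__(self, i):
        # Raises IndexError past the end, which is how iteration stops.
        return self.values[i]


items = [SeqLike(5) for _ in range(6)]

# Equal-length elements are still unpacked into a 2-D array,
# even when dtype=object is requested.
unpacked = np.array(items, dtype=object)
print(unpacked.shape)

# Preallocate a 1-D object array and fill it element by element,
# so that each slot holds the original object itself.
boxed = np.empty(len(items), dtype=object)
for i, item in enumerate(items):
    boxed[i] = item

grid = boxed.reshape(2, 3)
print(grid.shape, grid[1, 2] is items[5])
```

With real histograms, items would be the list returned by the Get calls and the target shape would be (100, 16).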

I tried this but it still doesn’t work; it gives me the same error, ValueError: cannot reshape array of size 1641600 into shape (100,16) (Intel macOS 14.3).

I could really use some more help on this. I’ve tried defining everything as an np.array(), but I still get the same error telling me the size of the array is wrong.

Update: maybe I’ve found the problem:
My histograms each have 1024 bins, which with underflow and overflow makes 1026, and 1600 × 1026 = 1641600, exactly the array size in the error.

If I do:

x_channels = [8,7,9,6,10,5,11,4,12,3,13,2,14,1,15,0]
y_channels = [27,28,26,29,25,30,24,31,23,16,22,17,21,18,20,19]


analogues_x = [root_file.Get(f"Target/TargetEvent{Ev}Ch{ch}") for Ev in range(100) for ch in x_channels]
analogues_y = [root_file.Get(f"Target/TargetEvent{Ev}Ch{ch}") for Ev in range(100) for ch in y_channels]

print(len(analogues_x),len(analogues_y))
print(len(analogues_x[0]),len(analogues_y[0]))

I get:

1600 1600
1026 1026

So it seems that, although when I print(analogues_x) I get [<cppyy.gbl.TH1D object at 0x7fdf007a8010>, <cppyy.gbl.TH1D object at 0x7fdf0200ba50>,...], the contents of the histograms are actually being read into my lists, instead of creating a list of objects.

If I convert this list to an np.array() then it’s very slow, and the only reason I need an np.array() is so I can reshape my list to have simple access to histograms divided by event (and eventually by channel instead).

Is there a better solution than the workaround of:

analogues_x_Ev=[]
for Ev in range(100):
    analogues_x_Ev.append([root_file.Get(f"Target/TargetEvent{Ev}Ch{ch}") for ch in x_channels])

?

Dear @bethlong06 ,

Thanks for reaching out to the forum! Let me ask you a few questions so I can better understand your use case.

So it seems that, although when I print(analogues_x) I get [<cppyy.gbl.TH1D object at 0x7fdf007a8010>, <cppyy.gbl.TH1D object at 0x7fdf0200ba50>,...], the contents of the histograms are actually being read into my lists, instead of creating a list of objects.

Seeing [<cppyy.gbl.TH1D object at 0x7fdf007a8010>, <cppyy.gbl.TH1D object at 0x7fdf0200ba50>,...] already tells you that the list contains objects, one histogram per element. This should be as expected, unless I am missing something.

Printing len(analogues_x[0]) is another story. By default, that gives you the length of the histogram, i.e. its number of bins including the underflow and overflow bins. The histogram can also be iterated over to retrieve the content of each bin, for example:

>>> import ROOT
>>> h = ROOT.TH1D("h","h",10,-3,3)
>>> h.FillRandom("gaus")
>>> len(h)
12
>>> for bc in h:
...     print(bc)
... 
0.0
39.0
136.0
394.0
796.0
1131.0
1106.0
817.0
397.0
144.0
40.0
0.0

So everything that you show seems like normal behaviour to me.
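
That behaviour would also account for the size in the reshape error, assuming NumPy treats each histogram as a sequence of its 1026 cells when np.reshape converts the list to an array. A sketch with a zero-filled stand-in array (no ROOT needed, and the variable names are only illustrative):

```python
import numpy as np

n_events, n_channels = 100, 16
n_cells = 1024 + 2  # bins plus underflow and overflow

# Stand-in for the array built from 1600 histograms of 1026 cells each.
contents = np.zeros((n_events * n_channels, n_cells))
print(contents.size)  # 1641600

# Asking for (100, 16) fails, because the bin axis has nowhere to go.
try:
    contents.reshape(n_events, n_channels)
except ValueError as err:
    print(err)

# Keeping the bin axis makes the requested shape consistent again.
cube = contents.reshape(n_events, n_channels, n_cells)
print(cube.shape)  # (100, 16, 1026)
```

The last reshape is only of interest if an array of bin contents is what you want; it no longer holds the histogram objects themselves.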

Now, coming to my understanding of your actual goal:

so I can reshape my list to have simple access to histograms divided by event (and eventually by channel instead).

Wouldn’t this be enough?

analogues_x = [
    [root_file.Get(f"Target/TargetEvent{Ev}Ch{ch}") for ch in x_channels]
    for Ev in range(100)
]
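
If access by channel is needed later, a nested list like this can be transposed with zip. The sketch below uses the histogram names as stand-ins for the root_file.Get(...) calls so that it runs without a file; by_event and by_channel are just illustrative names.

```python
x_channels = [8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15, 0]

# Strings stand in for root_file.Get(...) so the sketch runs without a file.
by_event = [
    [f"Target/TargetEvent{ev}Ch{ch}" for ch in x_channels]
    for ev in range(100)
]

# by_event[ev][i] is channel x_channels[i] of event ev.
print(by_event[3][0])  # Target/TargetEvent3Ch8

# Transpose: by_channel[i][ev] is the same entry, grouped by channel.
by_channel = [list(column) for column in zip(*by_event)]
print(len(by_channel), len(by_channel[0]))  # 16 100
print(by_channel[0][3])  # Target/TargetEvent3Ch8
```

Both structures refer to the same objects, so no histogram is copied or read twice.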

Let me know if I am missing something here, thanks!

Cheers,
Vincenzo

Yes, that works fine. I don’t know why I wasn’t able to work it out by myself, so thank you very much for your patience!