Speed up conversion from cartesian to polar coordinates, speed up file reading in Python

Chris90 · December 12, 2022, 10:05am

Dear all,

I wrote some Python code to read a ROOT file containing particle information: positions, directions and energies. The directions are in Cartesian coordinates and I want to convert them to spherical coordinates. Here is my code:


```python
import math

import numpy as np
import ROOT

root_file = "Info_particles.root"
df = ROOT.RDataFrame("tree1", root_file)
npyFile = df.AsNumpy()  # dict: branch name -> 1-D NumPy array

posList, eneList, dirList, rThePhiList = [], [], [], []

for i in range(len(npyFile['PosX'])):
    posList.append(np.array([npyFile['PosX'][i], npyFile['PosY'][i], npyFile['PosZ'][i]]))
    eneList.append(npyFile['Ene'][i])
    dirList.append(np.array([npyFile['DirX'][i], npyFile['DirY'][i], npyFile['DirZ'][i]]))

for direct in dirList:
    r = np.linalg.norm(direct)
    # theta: polar angle from the +z axis; phi: azimuth in the x-y plane
    theta = np.arccos(direct[2] / r)
    phi = np.arctan2(direct[1], direct[0])
    rThePhiList.append([math.degrees(theta), math.degrees(phi)])
```

There are 6.7 million particles in total. The first for loop takes about 13 seconds, but the second takes about 1 minute. Since I also have to read 100 or more other files, I would like the reading to be faster:

- Is there a way, in ROOT and Python, to speed up the conversion from Cartesian to spherical coordinates? And, more generally, to speed up the reading?
- Is there a way to parallelize this process on a cluster with tens of cores? I tried several solutions in Python that didn't work: Python makes it possible to parallelize with multiple processes, but there is no straightforward way to use multithreading across different cores.
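A minimal NumPy sketch of a vectorized alternative to the two loops above, assuming `npyFile` is the dict returned by `AsNumpy()` with the branch names used in the question; the helper names `to_arrays` and `cart_to_spherical_deg` are hypothetical. Since `AsNumpy()` already returns one NumPy array per branch, the per-particle lists are not needed, and the conversion becomes a few whole-array operations. The polar angle is computed as atan2(ρ, z) with ρ = √(x² + y²) instead of arccos(z/r), which avoids NaN when rounding pushes z/r slightly past ±1.

```python
import numpy as np


def to_arrays(cols):
    """Build (N, 3) position/direction arrays from an AsNumpy()-style dict.

    `cols` maps branch names (PosX/PosY/PosZ, Ene, DirX/DirY/DirZ, as in the
    question) to 1-D arrays of equal length.
    """
    pos = np.column_stack((cols["PosX"], cols["PosY"], cols["PosZ"]))
    direction = np.column_stack((cols["DirX"], cols["DirY"], cols["DirZ"]))
    return pos, np.asarray(cols["Ene"]), direction


def cart_to_spherical_deg(direction):
    """Vectorized Cartesian -> spherical conversion of an (N, 3) array.

    Returns (r, theta, phi): theta is the polar angle from +z in [0, 180] degrees,
    phi is the azimuth in the x-y plane in [-180, 180] degrees.
    """
    x, y, z = direction[:, 0], direction[:, 1], direction[:, 2]
    rho = np.hypot(x, y)  # distance from the z axis
    r = np.hypot(rho, z)  # full vector length
    theta = np.degrees(np.arctan2(rho, z))
    phi = np.degrees(np.arctan2(y, x))
    return r, theta, phi
```

With the question's variables this would read `pos, ene, dirs = to_arrays(npyFile)` followed by `r, theta, phi = cart_to_spherical_deg(dirs)`, and `np.column_stack((theta, phi))` would take the place of `rThePhiList`.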

Thanks in advance for your time.

Christian

dastudillo · December 12, 2022, 12:22pm

You are already using an RDataFrame, which can run in parallel; see the manual here and the tutorials. I don't have experience with RDataFrame, so I can't give a definitive answer, but it might be all you need. It could be as simple as adding ROOT::EnableImplicitMT() (ROOT.EnableImplicitMT() from Python) and something like

```python
df.Define("newVar", "<C++ expression for newVar>").Snapshot("newTree", "output.root")
```

instead of the for loops. If you define several new variables, it is probably faster to chain all the Define calls and call Snapshot only once at the end, but that's the general idea (check out the links above).
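Applied to this thread, the Define idea might look like the sketch below. It assumes the tree name `tree1` and the `DirX`/`DirY`/`DirZ` branches from the question; the helper names `add_spherical_columns` and `convert_file` and the column names `dir_rho`, `theta_deg` and `phi_deg` are hypothetical. The angles are written as C++ expressions, which RDataFrame JIT-compiles and evaluates per entry, so the work runs in compiled code (spread over ROOT's thread pool once implicit multithreading is enabled) instead of in Python loops.

```python
import math

# Degrees per radian, pasted into the C++ expressions below as a literal.
RAD2DEG = 180.0 / math.pi


def add_spherical_columns(df):
    """Define dir_rho, theta_deg and phi_deg on an RDataFrame node.

    `df` must provide DirX, DirY, DirZ columns. Each expression is C++ that
    RDataFrame JIT-compiles; nothing runs until a result (AsNumpy, Snapshot, ...)
    is requested. theta_deg: polar angle from +z; phi_deg: azimuth in x-y.
    """
    return (
        df.Define("dir_rho", "std::sqrt(DirX*DirX + DirY*DirY)")
        .Define("theta_deg", f"std::atan2(dir_rho, DirZ) * {RAD2DEG!r}")
        .Define("phi_deg", f"std::atan2(DirY, DirX) * {RAD2DEG!r}")
    )


def convert_file(path, tree="tree1", columns=("theta_deg", "phi_deg")):
    """Run the conversion on one file and return {column name: NumPy array}."""
    import ROOT  # PyROOT; imported here so add_spherical_columns needs no ROOT

    if not ROOT.IsImplicitMTEnabled():
        ROOT.EnableImplicitMT()  # spread the event loop over all available cores
    df = add_spherical_columns(ROOT.RDataFrame(tree, path))
    return df.AsNumpy(columns=list(columns))
```

For example, `convert_file("Info_particles.root", columns=("PosX", "PosY", "PosZ", "Ene", "theta_deg", "phi_deg"))`. With implicit multithreading the output rows may not follow the input entry order, so it seems safest to request every column you need in the same AsNumpy call so they stay aligned. To write a new tree instead, end the chain with a single Snapshot as suggested above.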

Chris90 · December 12, 2022, 5:11pm

OK, thanks! I'm looking at the tutorial df001_introduction. I don't understand what they mean by rdfentry_ in the fill_tree function. It seems that the tree is filled with this quantity and then the file is read, but I don't understand how the values of rdfentry_ are defined.
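As far as I understand the RDataFrame documentation, `rdfentry_` is not something the tutorial fills in: it is an implicit column that every RDataFrame provides, holding the number of the entry currently being processed (an unsigned 64-bit integer). fill_tree starts from a data frame built with a number of entries and no columns, and uses `rdfentry_` to generate its branch values before calling Snapshot. A minimal sketch, assuming PyROOT is available; `entry_numbers` is a hypothetical helper name.

```python
def entry_numbers(n_entries):
    """Return the value of rdfentry_ for each entry of an n-entry RDataFrame."""
    import ROOT  # PyROOT, imported here so the definition loads without ROOT

    # RDataFrame(n): a data frame with n empty entries and no columns.
    df = ROOT.RDataFrame(n_entries)
    # rdfentry_ is an implicit column: the index of the entry being processed.
    # It is an unsigned 64-bit integer, so cast it in the C++ expression.
    df = df.Define("b1", "(double) rdfentry_")
    return df.AsNumpy(columns=["b1"])["b1"]
```

In a single-threaded run, `entry_numbers(5)` should give the values 0 to 4; with implicit multithreading the same values appear, but not necessarily in that order.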

Christian

system · December 26, 2022, 5:12pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.