Comparing DataFrame entries to a numpy array

Hi all,

I’m using ROOT from Python in a Jupyter notebook.

I have an RDataFrame with many rows, and each row is split into lines.
There are two relevant columns: hitX and hitY.

I also have a np.array with, say, 50 pairs of (x, y) coordinates: [[3,4],[4,5],[12,1],…]

I’m looking for an efficient way to compare, for each row of the DataFrame, the coordinates (hitX, hitY) with all the pairs in the array, and to store the closest pair from the array in a new column.

Can I do this without converting the whole DataFrame with AsNumpy and looping over it, which is very time-consuming?

For example, for hitX=3, hitY=4 and the array [[1,2],[4,3]], there should be two new columns with values 4 and 3, since (4,3) is the closest pair.
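The expected result in this example can be checked with a small NumPy sketch: compute the squared Euclidean distance from (hitX, hitY) to every pair in the array and take the pair with the smallest one. The choice of metric (squared Euclidean distance) and the helper name `closest_pair` are assumptions for illustration only.

```python
import numpy as np


def closest_pair(hit_x, hit_y, coords):
    """Hypothetical helper: return the row of coords closest to (hit_x, hit_y).

    Uses squared Euclidean distance; np.argmin returns the first minimum on ties.
    """
    coords = np.asarray(coords)
    d2 = (coords[:, 0] - hit_x) ** 2 + (coords[:, 1] - hit_y) ** 2
    return coords[np.argmin(d2)]


# Example from the question: (3, 4) against [[1, 2], [4, 3]]
# Distances are 8 and 2, so the second pair wins.
print(closest_pair(3, 4, [[1, 2], [4, 3]]))  # → [4 3]
```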

ROOT Version: 6.30/04
Platform: Jupyter

Hello,

Thanks for the interesting post!
If I understand correctly (please correct me if I am wrong), your problem would be solved by a Define that, for each event, compares hitX and hitY against the 50 (x, y) coordinates and returns the pair from the array that is closest to (hitX, hitY) with respect to some metric.

There are many ways to achieve this. For example, you could convert the np array into two vectors, or into one vector of pairs, and then write the matching function. A skeleton could be:

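import ROOT

# Declare the C++ container for the coordinates, a helper to fill it and the function used in the Define
ROOT.gInterpreter.Declare('''
#include <limits>
#include <utility>
#include <vector>
std::vector<std::pair<int, int>> coordinates;
void AddCoordinate(int x, int y) { coordinates.emplace_back(x, y); }
// Return the stored pair closest to (x0, y0) in squared Euclidean distance (assumes coordinates is not empty)
std::pair<int, int> GetClosest(double x0, double y0) {
   auto minDist = std::numeric_limits<double>::max();
   std::size_t minIdx = 0;
   // std algorithms could be used here
   for (std::size_t i = 0; i < coordinates.size(); ++i) {
      const double dx = x0 - coordinates[i].first;
      const double dy = y0 - coordinates[i].second;
      const double dist = dx * dx + dy * dy;
      if (dist < minDist) {
         minDist = dist; // remember the best distance so far
         minIdx = i;
      }
   }
   return coordinates[minIdx];
}
''')

# Fill the C++ vector from the numpy array of (x, y) pairs
for x, y in myPythonArrayOfCoordinates:
    ROOT.AddCoordinate(int(x), int(y))

# here all the DF definition
# ...
df = df.Define("closestXY", "GetClosest(hitX, hitY)")
# Split the pair into the two requested columns
df = df.Define("closestX", "closestXY.first").Define("closestY", "closestXY.second")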
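To cross-check the output of the Define on a small sample (for example, a few rows extracted with AsNumpy), the same matching can be done for all rows at once in NumPy with broadcasting, with no Python loop. This is a sketch assuming 1D arrays `hit_x`, `hit_y` and an (M, 2) coordinate array; `closest_pairs` is a hypothetical name. It builds an N×M distance matrix, so memory grows with both sizes, which is why the C++ Define above remains the better fit for the full dataset.

```python
import numpy as np


def closest_pairs(hit_x, hit_y, coords):
    """Hypothetical helper: for each hit, return the closest (x, y) row of coords."""
    hits = np.column_stack([hit_x, hit_y]).astype(float)  # shape (N, 2)
    coords = np.asarray(coords)                           # shape (M, 2)
    # (N, 1, 2) - (1, M, 2) broadcasts to (N, M, 2); summing squares gives (N, M)
    d2 = ((hits[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    idx = np.argmin(d2, axis=1)                           # best coordinate per hit, shape (N,)
    return coords[idx]                                    # shape (N, 2)


coords = np.array([[3, 4], [4, 5], [12, 1]])
result = closest_pairs(np.array([3, 11, 4]), np.array([4, 0, 6]), coords)
closest_x, closest_y = result[:, 0], result[:, 1]
print(closest_x, closest_y)  # → [ 3 12  4] [4 1 5]
```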

I hope the above helps you get started.

Cheers,
D


Thank you so much! I’m not very familiar with C++, so I was struggling with the definition. With your help I managed to do exactly what I wanted. Thank you again!

Glad it helped. Thanks for the confirmation!
We are working to make the RDF Python experience even more rewarding. In the next few releases, we’ll make sure things are more Pythonic than that.

Cheers,
Danilo