I have an `RDataFrame`, which consists of three columns (`A`, `B`, and `run`). Here’s what it might look like in tabular form:

```
+-----+-----+-----+
| run |  A  |  B  |
+-----+-----+-----+
| 001 |  35 |   5 |
| 001 |  40 |  10 |
| 001 |  77 |  60 |
| ... | ... | ... |
| 002 |  42 |  40 |
| 002 |  30 |  28 |
| 002 |  50 |   1 |
| ... | ... | ... |
+-----+-----+-----+
```

*(where the … dots indicate continuation)*

For each run, I want to determine the most common difference of `A` and `B` (i.e., the mode of the set of entry-wise differences of `A` and `B` for each run). So, for example, for run `002` we have `42 - 40 = 2`, `30 - 28 = 2`, and `50 - 1 = 49`. The mode of `2`, `2`, and `49` is `2`, so the result for run `002` is `2`.

*A couple of important notes: `A-B` is guaranteed to be a positive integer for all entries, and each run is guaranteed to have a single unique mode.*

Currently, what I’m doing is this:

```
// Some code above
std::map<int, std::map<int, int>> offset;  // run -> (A-B -> count)
// Capture offset by reference so the lambda can update it
df.Foreach([&offset](int A, int B, int run) {
    ++offset[run][A - B];
}, {"A", "B", "run"});
// Some code that, for each map in the map, determines the key of the
// max value (i.e. the mode), and stores this in a vector
```

So, we have a map of maps. The “outer” map maps run numbers to “interior” maps. Each “interior” map maps an offset (`A-B`) to its frequency. Then, for each “interior” map, I get the key of the largest value (i.e. the mode) and store it in a new map, which maps each run number to the mode of `A-B` for that run.
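For concreteness, the counting and mode-extraction steps look roughly like the sketch below. This is not my exact code: `Entry`, `countOffsets`, and `modesPerRun` are hypothetical names made up for illustration, and a plain `std::vector` stands in for the `RDataFrame` so the snippet compiles without ROOT.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical row type standing in for one RDataFrame entry.
struct Entry {
    int run;
    int A;
    int B;
};

// run -> (A-B -> count)
using OffsetCounts = std::map<int, std::map<int, int>>;

// Build the nested frequency map, mirroring what the Foreach lambda does.
OffsetCounts countOffsets(const std::vector<Entry>& entries) {
    OffsetCounts offset;
    for (const Entry& e : entries) {
        ++offset[e.run][e.A - e.B];
    }
    return offset;
}

// For each run, pick the offset with the highest count (the mode).
std::map<int, int> modesPerRun(const OffsetCounts& offset) {
    std::map<int, int> modes;  // run -> mode of A-B
    for (const auto& runAndCounts : offset) {
        const std::map<int, int>& counts = runAndCounts.second;
        assert(!counts.empty());  // every run seen has at least one entry
        auto best = std::max_element(
            counts.begin(), counts.end(),
            [](const std::pair<const int, int>& lhs,
               const std::pair<const int, int>& rhs) {
                return lhs.second < rhs.second;  // compare by count
            });
        modes[runAndCounts.first] = best->first;
    }
    return modes;
}
```

Feeding in only the three rows shown for run `002` in the table above yields a mode of `2` for that run, matching the worked example.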

This works, but it’s quite slow (impractically so for my purposes). My `RDataFrame` contains on the order of tens of millions of entries or more.

I imagine there must be a better way. I’ve spent some time with the `RDataFrame` docs and, unfortunately, I can’t seem to piece together anything measurably more efficient.

Is there a better way? Please, let me know if the issue at hand isn’t clear. Thanks in advance!