nth function/method for groupby

It would be useful to have an `nth` function, similar to [pandas groupby nth](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.nth.html), where the selected row or rows stay alongside the other aggregations in the presence of `by`:

```
from datatable import dt, f, by

data = {'x': [1, 1, 1, 2, 2, 3],
        'y': [1, 2, 3, 1, 2, 1],
        'n': [3, 2, 1, 1, 2, 1]}

DT = dt.Frame(data)

DT

   |     x      y      n
   | int32  int32  int32
-- + -----  -----  -----
 0 |     1      1      3
 1 |     1      2      2
 2 |     1      3      1
 3 |     2      1      1
 4 |     2      2      2
 5 |     3      1      1
[6 rows x 3 columns]
```

The `nth` function/method should return the row or rows along with other aggregations:

```
DT[:, {'sum': f.y.sum(), 'nth' : f.y.nth(1)}, 'x']

   |     x    sum    nth
   | int32  int64  int32
-- + -----  -----  -----
 0 |     1      6      2
 1 |     2      3      2
 2 |     3      1     NA
[3 rows x 3 columns]
```
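To pin down the requested semantics, here is a minimal plain-Python sketch (not datatable code) that produces the table above from the `x` and `y` columns. The helper name `grouped_sum_and_nth` is hypothetical, `None` stands in for `NA`, and support for negative positions is an assumption borrowed from how Python indexing works, not something this request specifies.

```python
def grouped_sum_and_nth(keys, values, n):
    """Return one (key, sum, nth) tuple per group, in first-seen key order."""
    groups = {}
    for k, v in zip(keys, values):
        groups.setdefault(k, []).append(v)

    out = []
    for k, vals in groups.items():
        # A group that has no row at position n yields None (NA)
        # instead of being dropped, so every group keeps its row.
        nth = vals[n] if -len(vals) <= n < len(vals) else None
        out.append((k, sum(vals), nth))
    return out


x = [1, 1, 1, 2, 2, 3]
y = [1, 2, 3, 1, 2, 1]

result = grouped_sum_and_nth(x, y, 1)
for row in result:
    print(row)
```

The key point is that the group `x == 3` still appears in the result, with a missing value in the `nth` column.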
One way to implement this currently is a `cbind`:

 ``dt.cbind(DT[:, {"sum": f.y.sum()}, "x"],
           DT[1, 'n', by('x', add_columns=False)],
           force=True)``

which might not align properly, since the two frames need not have the same number of rows (hence `force=True`); a much safer option would be a left join.
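The alignment concern can be illustrated without datatable. The sketch below uses a small hypothetical dataset and assumes that the per-group row selection simply omits groups that have no row at the requested position. Under that assumption, binding columns by position pairs values with the wrong group, while a key-based lookup (the plain-Python analogue of a left join) does not.

```python
from itertools import zip_longest

# Hypothetical (x, y) rows: only the group x == 2 has a second row.
rows = [(1, 1), (2, 1), (2, 2), (3, 1)]

groups = {}
for x, y in rows:
    groups.setdefault(x, []).append(y)

# One (x, sum) pair per group.
sums = [(x, sum(ys)) for x, ys in groups.items()]

# Second row per group; groups with fewer than two rows are omitted
# (assumed behaviour, see the paragraph above).
second = [(x, ys[1]) for x, ys in groups.items() if len(ys) > 1]

# Positional binding with padding: the value from x == 2 lands on x == 1.
positional = [
    (x, s, nth[1] if nth is not None else None)
    for (x, s), nth in zip_longest(sums, second)
]

# Key-based lookup: each value stays with its own group.
lookup = dict(second)
joined = [(x, s, lookup.get(x)) for x, s in sums]

print(positional)
print(joined)
```

With the data in this issue and position 1, the only short group is the last one, so positional binding happens to give the right answer; that is exactly why the misalignment is easy to miss.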

The implementation in #2176 selects a single row or slice, and since `i` is executed before `j` within datatable, the results won't come out right. Also, sequences are currently not accepted within `i`; it would be nice to be able to select lists with the `nth` function.


nth function/method for groupby #3128
