From d9c884d2e4be299530f50217229ee0bc90bff9c7 Mon Sep 17 00:00:00 2001 From: Jolan Rensen Date: Wed, 28 May 2025 16:34:41 +0200 Subject: [PATCH] updating docs to provide links to column selectors from everywhere we mention selecting columns for operations --- docs/StardustDocs/topics/add.md | 2 +- docs/StardustDocs/topics/convert.md | 3 ++- docs/StardustDocs/topics/corr.md | 4 +++- docs/StardustDocs/topics/cumSum.md | 2 ++ docs/StardustDocs/topics/distinct.md | 4 ++++ docs/StardustDocs/topics/drop.md | 6 ++++++ docs/StardustDocs/topics/explode.md | 2 ++ docs/StardustDocs/topics/fill.md | 6 ++++++ docs/StardustDocs/topics/filter.md | 4 +++- docs/StardustDocs/topics/flatten.md | 5 ++++- docs/StardustDocs/topics/gather.md | 4 ++-- docs/StardustDocs/topics/group.md | 2 ++ docs/StardustDocs/topics/groupBy.md | 3 ++- docs/StardustDocs/topics/implode.md | 2 ++ docs/StardustDocs/topics/inferType.md | 4 +++- docs/StardustDocs/topics/mean.md | 2 ++ docs/StardustDocs/topics/median.md | 2 ++ docs/StardustDocs/topics/merge.md | 2 ++ docs/StardustDocs/topics/minmax.md | 2 ++ docs/StardustDocs/topics/move.md | 4 ++-- docs/StardustDocs/topics/percentile.md | 2 ++ docs/StardustDocs/topics/pivot.md | 13 ++++++++----- docs/StardustDocs/topics/remove.md | 2 +- docs/StardustDocs/topics/rename.md | 2 ++ docs/StardustDocs/topics/reorder.md | 2 ++ docs/StardustDocs/topics/replace.md | 2 +- docs/StardustDocs/topics/reshape.md | 3 --- docs/StardustDocs/topics/sortBy.md | 4 ++++ docs/StardustDocs/topics/split.md | 12 +++++++----- docs/StardustDocs/topics/std.md | 2 ++ docs/StardustDocs/topics/sum.md | 2 ++ docs/StardustDocs/topics/summaryStatistics.md | 4 +++- docs/StardustDocs/topics/unfold.md | 4 +++- docs/StardustDocs/topics/ungroup.md | 2 +- docs/StardustDocs/topics/update.md | 3 ++- docs/StardustDocs/topics/valueCounts.md | 2 ++ docs/StardustDocs/topics/values.md | 4 +++- 37 files changed, 99 insertions(+), 31 deletions(-) delete mode 100644 docs/StardustDocs/topics/reshape.md diff 
--git a/docs/StardustDocs/topics/add.md b/docs/StardustDocs/topics/add.md index d11a64823e..36fdc10be9 100644 --- a/docs/StardustDocs/topics/add.md +++ b/docs/StardustDocs/topics/add.md @@ -2,7 +2,7 @@ -Returns [`DataFrame`](DataFrame.md) which contains all columns from original [`DataFrame`](DataFrame.md) followed by newly added columns. +Returns [`DataFrame`](DataFrame.md) which contains all columns from the original [`DataFrame`](DataFrame.md) followed by newly added columns. Original [`DataFrame`](DataFrame.md) is not modified. `add` appends columns to the end of the dataframe by default. diff --git a/docs/StardustDocs/topics/convert.md b/docs/StardustDocs/topics/convert.md index 8dbb82b7c0..6f2dee98d2 100644 --- a/docs/StardustDocs/topics/convert.md +++ b/docs/StardustDocs/topics/convert.md @@ -13,7 +13,8 @@ colExpression = DataFrame.(DataColumn) -> DataColumn frameExpression: DataFrame.(DataFrame) -> DataFrame ``` -See [column selectors](ColumnSelectors.md) and [row expressions](DataRow.md#row-expressions) +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation and +[row expressions](DataRow.md#row-expressions) for how to provide new values. diff --git a/docs/StardustDocs/topics/corr.md b/docs/StardustDocs/topics/corr.md index bede656c09..ddbbdc44c0 100644 --- a/docs/StardustDocs/topics/corr.md +++ b/docs/StardustDocs/topics/corr.md @@ -9,6 +9,8 @@ corr { columns1 } .with { columns2 } | .withItself() ``` +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + To compute pairwise correlation between all columns in the [`DataFrame`](DataFrame.md) use `corr` without arguments: ```kotlin @@ -19,7 +21,7 @@ The function is available for numeric- and `Boolean` columns. `Boolean` values are converted into `1` for `true` and `0` for `false`. All other columns are ignored. 
-If a [`ColumnGroup`](DataColumn.md#columngroup) instance is passed as target column for correlation, +If a [`ColumnGroup`](DataColumn.md#columngroup) instance is passed as the target column for correlation, it will be unpacked into suitable nested columns. The resulting [`DataFrame`](DataFrame.md) will have `n1` rows and `n2+1` columns, diff --git a/docs/StardustDocs/topics/cumSum.md b/docs/StardustDocs/topics/cumSum.md index 0dcd052334..1464281a14 100644 --- a/docs/StardustDocs/topics/cumSum.md +++ b/docs/StardustDocs/topics/cumSum.md @@ -10,6 +10,8 @@ cumSum(skipNA = true) [ { columns } ] Returns a [`DataFrame`](DataFrame.md) or [`DataColumn`](DataColumn.md) containing the cumulative sum. +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + **Parameters:** * `skipNA` — when `true`, ignores [`NA` values](nanAndNa.md#na) (`null` or `NaN`). When `false`, all values after first `NA` will be `NaN` (for `Double` and `Float` columns) or `null` (for integer columns). diff --git a/docs/StardustDocs/topics/distinct.md b/docs/StardustDocs/topics/distinct.md index 8c3471d804..f55af3d360 100644 --- a/docs/StardustDocs/topics/distinct.md +++ b/docs/StardustDocs/topics/distinct.md @@ -16,6 +16,8 @@ df.distinct() If columns are specified, resulting [`DataFrame`](DataFrame.md) will have only given columns with distinct values. +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + @@ -43,6 +45,8 @@ df.select("age", "name").distinct() Keep only the first row for every group of rows grouped by some condition. +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. 
+ diff --git a/docs/StardustDocs/topics/drop.md b/docs/StardustDocs/topics/drop.md index faceea98aa..dfbb4575ce 100644 --- a/docs/StardustDocs/topics/drop.md +++ b/docs/StardustDocs/topics/drop.md @@ -27,6 +27,8 @@ df.drop { it["weight"] == null || it["city"] == null } Remove rows with `null` values +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + ```kotlin @@ -44,6 +46,8 @@ df.dropNulls(whereAllNull = true) { city and weight } // remove rows with null v Remove rows with [`NaN` values](nanAndNa.md#nan) (`Double.NaN` or `Float.NaN`). +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + ```kotlin @@ -61,6 +65,8 @@ df.dropNaNs(whereAllNaN = true) { age and weight } // remove rows where both 'ag Remove rows with [`NA` values](nanAndNa.md#na) (`null`, `Double.NaN`, or `Float.NaN`). +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + ```kotlin diff --git a/docs/StardustDocs/topics/explode.md b/docs/StardustDocs/topics/explode.md index c3a0ff54f7..e30860696d 100644 --- a/docs/StardustDocs/topics/explode.md +++ b/docs/StardustDocs/topics/explode.md @@ -8,6 +8,8 @@ Splits list-like values in given columns and spreads them vertically. Values in explode(dropEmpty = true) [ { columns } ] ``` +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + **Parameters:** * `dropEmpty` — if `true`, removes rows with empty lists or [`DataFrame`](DataFrame.md) objects. Otherwise, they will be exploded into `null`. diff --git a/docs/StardustDocs/topics/fill.md b/docs/StardustDocs/topics/fill.md index 757b99b0f6..2c0ff634ea 100644 --- a/docs/StardustDocs/topics/fill.md +++ b/docs/StardustDocs/topics/fill.md @@ -8,6 +8,8 @@ Replace missing values. Replaces `null` values with given value or expression. +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. 
+ ```kotlin @@ -23,6 +25,8 @@ df.update { colsOf<Int?>() }.where { it == null }.with { -1 } Replaces [`NaN` values](nanAndNa.md#nan) (`Double.NaN` and `Float.NaN`) with given value or expression. +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + ```kotlin @@ -36,6 +40,8 @@ df.fillNaNs { colsOf<Double>() }.withZero() Replaces [`NA` values](nanAndNa.md#na) (`null`, `Double.NaN`, and `Float.NaN`) with given value or expression. +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + ```kotlin diff --git a/docs/StardustDocs/topics/filter.md b/docs/StardustDocs/topics/filter.md index c284be80c4..e25a15435d 100644 --- a/docs/StardustDocs/topics/filter.md +++ b/docs/StardustDocs/topics/filter.md @@ -25,7 +25,9 @@ df.filter { "age"<Int>() > 18 && "name"["firstName"]<String>().startsWith("A") } ## filterBy -Returns [`DataFrame`](DataFrame.md) with rows that have value `true` in given column of type `Boolean`. +Returns [`DataFrame`](DataFrame.md) with rows that have value `true` in the given column of type `Boolean`. + +See [column selectors](ColumnSelectors.md) for how to select the column for this operation. diff --git a/docs/StardustDocs/topics/flatten.md b/docs/StardustDocs/topics/flatten.md index 0ce6f9c9b4..0b670ad264 100644 --- a/docs/StardustDocs/topics/flatten.md +++ b/docs/StardustDocs/topics/flatten.md @@ -8,7 +8,10 @@ Returns [`DataFrame`](DataFrame.md) without column groupings under selected colu flatten [ { columns } ] ``` -Columns after flattening will keep their original names. Potential column name clashes are resolved by adding minimal possible name prefix from ancestor columns. +Columns will keep their original names after flattening. +Potential column name clashes are resolved by adding the minimal possible name prefix from ancestor columns. + +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
diff --git a/docs/StardustDocs/topics/gather.md b/docs/StardustDocs/topics/gather.md index 4e2638b8f8..dbbba58bde 100644 --- a/docs/StardustDocs/topics/gather.md +++ b/docs/StardustDocs/topics/gather.md @@ -4,7 +4,7 @@ Converts several columns into two columns `key` and `value`. `key` column will contain names of original columns, `value` column will contain values from original columns. -This operation is reverse to [pivot](pivot.md) +This operation is the reverse of [](pivot.md). ```kotlin gather { columns } @@ -21,7 +21,7 @@ keyTransform: (columnName: String) -> K valueTransform: (value) -> R ``` -See [column selectors](ColumnSelectors.md) +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. Configuration options: * `explodeLists` — gathered values of type [`List`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-list/) will be exploded into their elements, so `where`, `cast`, `notNull` and `mapValues` will be applied to list elements instead of lists themselves diff --git a/docs/StardustDocs/topics/group.md b/docs/StardustDocs/topics/group.md index cde4d5ba38..e6c2c7b493 100644 --- a/docs/StardustDocs/topics/group.md +++ b/docs/StardustDocs/topics/group.md @@ -15,6 +15,8 @@ groupNameExpression = DataColumn.(DataColumn) -> String It is a special case of [`move`](move.md) operation. +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
+ ```kotlin diff --git a/docs/StardustDocs/topics/groupBy.md b/docs/StardustDocs/topics/groupBy.md index 2e8c8628ae..3bf5f97226 100644 --- a/docs/StardustDocs/topics/groupBy.md +++ b/docs/StardustDocs/topics/groupBy.md @@ -24,7 +24,8 @@ pivot = .pivot { columns } pivotReducer | pivotAggregator ``` -See [column selectors](ColumnSelectors.md), [groupBy transformations](#transformation), [groupBy aggregations](#aggregation), [pivot+groupBy](pivot.md#pivot-groupby) +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation, +[groupBy transformations](#transformation), [groupBy aggregations](#aggregation), and [pivot+groupBy](pivot.md#pivot-groupby). diff --git a/docs/StardustDocs/topics/implode.md b/docs/StardustDocs/topics/implode.md index a45b30ee49..1c09391299 100644 --- a/docs/StardustDocs/topics/implode.md +++ b/docs/StardustDocs/topics/implode.md @@ -8,6 +8,8 @@ Returns [`DataFrame`](DataFrame.md) where values in given columns are merged int implode(dropNA = false) [ { columns } ] ``` +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + **Parameters:** * `dropNA` — if `true`, removes `NA` values from merged lists. diff --git a/docs/StardustDocs/topics/inferType.md b/docs/StardustDocs/topics/inferType.md index c2e238bce7..4eec2c891c 100644 --- a/docs/StardustDocs/topics/inferType.md +++ b/docs/StardustDocs/topics/inferType.md @@ -2,8 +2,10 @@ -Changes type of the selected columns based on actual values stored in these columns. Resulting type of the column will be a nearest common supertype of all column values. +Changes the type of the selected columns based on the runtime values stored in these columns. +The resulting type of the column will be the nearest common supertype of all column values. ```text inferType [ { columns } ] ``` +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. 
diff --git a/docs/StardustDocs/topics/mean.md b/docs/StardustDocs/topics/mean.md index 28dab158ef..052a017f62 100644 --- a/docs/StardustDocs/topics/mean.md +++ b/docs/StardustDocs/topics/mean.md @@ -43,6 +43,8 @@ df.pivot { city }.groupBy { name.lastName }.mean() See [statistics](summaryStatistics.md#groupby-statistics) for details on complex data aggregations. +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + ### Type Conversion The following automatic type conversions are performed for the `mean` operation: diff --git a/docs/StardustDocs/topics/median.md b/docs/StardustDocs/topics/median.md index 81daf0e40b..97c91333f4 100644 --- a/docs/StardustDocs/topics/median.md +++ b/docs/StardustDocs/topics/median.md @@ -51,6 +51,8 @@ df.pivot { city }.groupBy { name.lastName }.median() See [statistics](summaryStatistics.md#groupby-statistics) for details on complex data aggregations. +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + ### Type Conversion The following automatic type conversions are performed for the `median` operation. diff --git a/docs/StardustDocs/topics/merge.md b/docs/StardustDocs/topics/merge.md index 0bb25b9d59..16373a99a3 100644 --- a/docs/StardustDocs/topics/merge.md +++ b/docs/StardustDocs/topics/merge.md @@ -15,6 +15,8 @@ merge { columns } merger: (DataRow).List -> Any ``` +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + ```kotlin diff --git a/docs/StardustDocs/topics/minmax.md b/docs/StardustDocs/topics/minmax.md index 2034819ab1..e246c1c20e 100644 --- a/docs/StardustDocs/topics/minmax.md +++ b/docs/StardustDocs/topics/minmax.md @@ -42,6 +42,8 @@ df.pivot { city }.groupBy { name.lastName }.min() See [statistics](summaryStatistics.md#groupby-statistics) for details on complex data aggregations. +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. 
+ ### Type Conversion The following automatic type conversions are performed for the `min` and `max` operations. diff --git a/docs/StardustDocs/topics/move.md b/docs/StardustDocs/topics/move.md index fc747c8d83..9086b2f2ba 100644 --- a/docs/StardustDocs/topics/move.md +++ b/docs/StardustDocs/topics/move.md @@ -11,9 +11,9 @@ move { columns } pathSelector: DataFrame.(DataColumn) -> ColumnPath ``` -See [Column Selectors](ColumnSelectors.md) +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. -Can be used to change columns hierarchy by providing `ColumnPath` for every moved column +Can be used to change column hierarchy by providing `ColumnPath` for every moved column. diff --git a/docs/StardustDocs/topics/percentile.md b/docs/StardustDocs/topics/percentile.md index 905953cae9..bef6daa082 100644 --- a/docs/StardustDocs/topics/percentile.md +++ b/docs/StardustDocs/topics/percentile.md @@ -73,6 +73,8 @@ df.pivot { city }.groupBy { name.lastName }.percentile(25.0) See [statistics](summaryStatistics.md#groupby-statistics) for details on complex data aggregations. +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + ### Type Conversion The following automatic type conversions are performed for the `percentile` operation. diff --git a/docs/StardustDocs/topics/pivot.md b/docs/StardustDocs/topics/pivot.md index 69114fd972..afd8337673 100644 --- a/docs/StardustDocs/topics/pivot.md +++ b/docs/StardustDocs/topics/pivot.md @@ -2,7 +2,7 @@ -Splits the rows of [`DataFrame`](DataFrame.md) and groups them horizontally into new columns based on values from one or several columns of original [`DataFrame`](DataFrame.md). +Splits the rows of a [`DataFrame`](DataFrame.md) and groups them horizontally into new columns based on values from one or several columns of the original [`DataFrame`](DataFrame.md). 
```text pivot (inward = true) { pivotColumns } @@ -16,8 +16,10 @@ reducer = .minBy { column } | .maxBy { column } | .first [ { rowCondition } ] | aggregator = .count() | .matches() | .frames() | .with { rowExpression } | .values { valueColumns } | .aggregate { aggregations } | . [ { columns } ] ``` +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + **Parameters:** -* `inward` — if `true` generated columns will be nested inside original column, otherwise they will be top-level +* `inward` — if `true`, generated columns are nested inside the original column; otherwise they are top-level * `pivotColumns` — columns with values for horizontal data grouping and generation of new columns * `indexColumns` — columns with values for vertical data grouping * `defaultValue` — value to fill mismatched pivot-index column pairs @@ -42,7 +44,7 @@ df.pivot("city") -To pivot several columns at once you can combine them using `and` or `then` infix function: +To pivot several columns at once, you can combine them using the `and` or `then` infix functions: * `and` will pivot columns independently * `then` will create column hierarchy from combinations of values from pivoted columns @@ -69,7 +71,8 @@ df.pivot { "city" then "name"["firstName"] } ## pivot + groupBy -To create matrix table that is expanded both horizontally and vertically, apply [`groupBy`](groupBy.md) transformation passing the columns for vertical grouping. +To create a matrix table that is expanded both horizontally and vertically, +apply the [`groupBy`](groupBy.md) transformation, passing the columns for vertical grouping. Reversed order of `pivot` and [`groupBy`](groupBy.md) will produce the same result. @@ -264,7 +267,7 @@ df.pivot("city").groupBy("name").aggregate { ### Pivot inside aggregate pivot transformation can be used inside [`aggregate`](groupBy.md#aggregation) function of [`groupBy`](groupBy.md).
-This allows to combine column pivoting with other [`groupBy`](groupBy.md) aggregations: +This allows combining column pivoting with other [`groupBy`](groupBy.md) aggregations: diff --git a/docs/StardustDocs/topics/remove.md b/docs/StardustDocs/topics/remove.md index 746c302d26..763415cc7b 100644 --- a/docs/StardustDocs/topics/remove.md +++ b/docs/StardustDocs/topics/remove.md @@ -8,7 +8,7 @@ Returns [`DataFrame`](DataFrame.md) without selected columns. remove { columns } ``` -See [Column Selectors](ColumnSelectors.md) +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. diff --git a/docs/StardustDocs/topics/rename.md b/docs/StardustDocs/topics/rename.md index cbe713fb67..214362e8cf 100644 --- a/docs/StardustDocs/topics/rename.md +++ b/docs/StardustDocs/topics/rename.md @@ -11,6 +11,8 @@ df.rename { columns }.into { nameExpression } nameExpression = (DataColumn) -> String ``` +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + diff --git a/docs/StardustDocs/topics/reorder.md b/docs/StardustDocs/topics/reorder.md index dce5644e07..116513d47a 100644 --- a/docs/StardustDocs/topics/reorder.md +++ b/docs/StardustDocs/topics/reorder.md @@ -12,6 +12,8 @@ reorder { columns } columnExpression: DataColumn.(DataColumn) -> Value ``` +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + diff --git a/docs/StardustDocs/topics/replace.md b/docs/StardustDocs/topics/replace.md index ed674e5f62..f691923708 100644 --- a/docs/StardustDocs/topics/replace.md +++ b/docs/StardustDocs/topics/replace.md @@ -10,7 +10,7 @@ replace { columns } columnExpression: DataFrame.(DataColumn) -> DataColumn ``` -See [column selectors](ColumnSelectors.md) +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. 
diff --git a/docs/StardustDocs/topics/reshape.md b/docs/StardustDocs/topics/reshape.md deleted file mode 100644 index a3da1430c3..0000000000 --- a/docs/StardustDocs/topics/reshape.md +++ /dev/null @@ -1,3 +0,0 @@ -[//]: # (title: Reshape) - -Start writing here. diff --git a/docs/StardustDocs/topics/sortBy.md b/docs/StardustDocs/topics/sortBy.md index d0c3f796aa..13d8231ae7 100644 --- a/docs/StardustDocs/topics/sortBy.md +++ b/docs/StardustDocs/topics/sortBy.md @@ -8,6 +8,8 @@ By default, columns are sorted in ascending order with `null` values going first * `.desc` — changes column sort order from ascending to descending * `.nullsLast` — forces `null` values to be placed at the end of the order +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + @@ -35,6 +37,8 @@ df.sortBy { "weight".nullsLast() } Returns [`DataFrame`](DataFrame.md) sorted by one or several columns in descending order. +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + diff --git a/docs/StardustDocs/topics/split.md b/docs/StardustDocs/topics/split.md index c2a88a5d49..ea00867be8 100644 --- a/docs/StardustDocs/topics/split.md +++ b/docs/StardustDocs/topics/split.md @@ -2,7 +2,7 @@ -This operation splits every value in the given columns into several values, +This operation splits every value in the given columns into several values and optionally spreads them horizontally or vertically. ```text @@ -15,10 +15,12 @@ df.split { columns } splitter = DataRow.(T) -> Iterable columnNamesGenerator = DataColumn.(columnIndex: Int) -> String ``` -The following types of columns can be split without any _splitter_ configuration: -* `String`: split by `,` and trim -* `List`: split into elements -* [`DataFrame`](DataFrame.md): split into rows +The following types of columns can be split easily: +* `String`: for instance, by `","` +* `List`: splits into elements, no `by` required! 
+* [`DataFrame`](DataFrame.md): splits into rows, no `by` required! + +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. ## Split in place diff --git a/docs/StardustDocs/topics/std.md b/docs/StardustDocs/topics/std.md index 0478ca2df1..056e8bd023 100644 --- a/docs/StardustDocs/topics/std.md +++ b/docs/StardustDocs/topics/std.md @@ -55,6 +55,8 @@ df.pivot { city }.groupBy { name.lastName }.std() See [statistics](summaryStatistics.md#groupby-statistics) for details on complex data aggregations. +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + ### Type Conversion The following automatic type conversions are performed for the `mean` operation: diff --git a/docs/StardustDocs/topics/sum.md b/docs/StardustDocs/topics/sum.md index 45d86962c5..805024e80d 100644 --- a/docs/StardustDocs/topics/sum.md +++ b/docs/StardustDocs/topics/sum.md @@ -42,6 +42,8 @@ df.pivot { city }.groupBy { name.lastName }.sum() See [statistics](summaryStatistics.md#groupby-statistics) for details on complex data aggregations. +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. 
+ ### Type Conversion The following automatic type conversions are performed for the `sum` operation: diff --git a/docs/StardustDocs/topics/summaryStatistics.md b/docs/StardustDocs/topics/summaryStatistics.md index fdb852423c..4340e34191 100644 --- a/docs/StardustDocs/topics/summaryStatistics.md +++ b/docs/StardustDocs/topics/summaryStatistics.md @@ -54,6 +54,8 @@ When statistics `x` is applied to several columns, it can be computed in several * `xFor { columns }: DataRow` computes separate value per every given column * `xOf { rowExpression }: Value` computes single value across results of [row expression](DataRow.md#row-expressions) evaluated for every row +(See [column selectors](ColumnSelectors.md) for how to select the columns for these operations.) + [min/max](minmax.md), [median](median.md), and [percentile](percentile.md) have additional mode `by`: * `minBy { rowExpression }: DataRow` finds a row with the minimal result of the [rowExpression](DataRow.md#row-expressions) * `medianBy { rowExpression }: DataRow` finds a row where the median lies based on the results of the [rowExpression](DataRow.md#row-expressions) @@ -73,7 +75,7 @@ df.sumOf { (weight ?: 0) / age } // sum of expression evaluated for every row ### groupBy statistics -When statistics is applied to [`GroupBy DataFrame`](groupBy.md#transformation), it is computed for every data group. +When statistics are applied to [`GroupBy DataFrame`](groupBy.md#transformation), they are computed for every data group. If a statistic is applied in a mode that returns a single value for every data group, it will be stored in a single column named according to the statistic name. diff --git a/docs/StardustDocs/topics/unfold.md b/docs/StardustDocs/topics/unfold.md index 161240cf29..21706e35fb 100644 --- a/docs/StardustDocs/topics/unfold.md +++ b/docs/StardustDocs/topics/unfold.md @@ -10,6 +10,8 @@ This operation is useful when: 1. you use a library API that gives you class instances 2.
you do not want to or cannot annotate classes with [`@DataSchema`](schemas.md) +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + ### Library API @@ -38,7 +40,7 @@ val initialData = interestingRepos -Using unfold you can convert `response` to a [`ColumnGroup`](DataColumn.md#columngroup) and use rich [modify](modify.md) capabilities. +Using unfold, you can convert `response` to a [`ColumnGroup`](DataColumn.md#columngroup) and use rich [modify](modify.md) capabilities. diff --git a/docs/StardustDocs/topics/ungroup.md b/docs/StardustDocs/topics/ungroup.md index c3704f9d60..40f36612f6 100644 --- a/docs/StardustDocs/topics/ungroup.md +++ b/docs/StardustDocs/topics/ungroup.md @@ -10,7 +10,7 @@ ungroup { columns } **Reverse operation:** [`group`](group.md) -See [column selectors](ColumnSelectors.md) +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. diff --git a/docs/StardustDocs/topics/update.md b/docs/StardustDocs/topics/update.md index 8368df0691..3779a033b3 100644 --- a/docs/StardustDocs/topics/update.md +++ b/docs/StardustDocs/topics/update.md @@ -17,7 +17,8 @@ rowColExpression: (DataRow, DataColumn) -> NewValue frameExpression: DataFrame.(DataFrame) -> DataFrame ``` -See [column selectors](ColumnSelectors.md) and [row expressions](DataRow.md#row-expressions) +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation and +[row expressions](DataRow.md#row-expressions) for how to specify the new values. diff --git a/docs/StardustDocs/topics/valueCounts.md b/docs/StardustDocs/topics/valueCounts.md index 9c39a27dcf..aae0fc8b66 100644 --- a/docs/StardustDocs/topics/valueCounts.md +++ b/docs/StardustDocs/topics/valueCounts.md @@ -10,6 +10,8 @@ valueCounts(sort = true, ascending = false, dropNA = false) [ { columns } ] ``` +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. 
+ **Parameters:** * `sort: Boolean = true` — sort by count * `ascending: Boolean = false` — sort in ascending order diff --git a/docs/StardustDocs/topics/values.md b/docs/StardustDocs/topics/values.md index b83bd63efb..c956877800 100644 --- a/docs/StardustDocs/topics/values.md +++ b/docs/StardustDocs/topics/values.md @@ -6,9 +6,11 @@ Return [`Sequence`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.sequence ``` values(byRows: Boolean = false) - [ columns ]: Sequence + [ { columns } ]: Sequence ``` +See [column selectors](ColumnSelectors.md) for how to select the columns for this operation. + **Parameters:** * `columns` (optional) — subset of columns for values extraction * `byRows: Boolean = false` — if `true`, data is traversed by rows, not by columns