Commit d8012ed

Merge pull request #1246 from Kotlin/extension_properties_docs
Extension properties docs
2 parents 521840e + 8bc936f commit d8012ed

4 files changed: +174 -23 lines changed

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
name,info
2+
Alice,"{""age"":23,""height"":175.5}"
3+
Bob,"{""age"":27,""height"":160.2}"
Lines changed: 155 additions & 15 deletions
@@ -1,24 +1,164 @@
11
[//]: # (title: Extension Properties API)
22

3-
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.ApiLevels-->
4-
5-
Auto-generated extension properties are the safest and easiest way to access columns in a [`DataFrame`](DataFrame.md).
6-
They are generated based on a [dataframe schema](schemas.md),
3+
When working with a [`DataFrame`](DataFrame.md), the most convenient and reliable way
4+
to access its columns — including for operations and retrieving column values
5+
in row expressions — is through auto-generated extension properties.
6+
They are generated based on a [dataframe schema](schemas.md),
77
with the name and type of properties inferred from the name and type of the corresponding columns.
8+
Extension properties also work for all kinds of hierarchical dataframes.
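To make the idea concrete, here is a minimal, self-contained sketch of what a generated extension property amounts to: a typed wrapper around a lookup by column name. `ToyColumn` and `ToyFrame` are hypothetical stand-ins invented for this illustration. They are not Kotlin DataFrame types, and the code the library actually generates may look different.

```kotlin
// Toy stand-ins for illustration only; these are NOT Kotlin DataFrame types.
class ToyColumn<T>(val name: String, val values: List<T>)

class ToyFrame(private val columns: Map<String, ToyColumn<*>>) {
    // Untyped, string-based access: the caller has to know the name and cast.
    operator fun get(name: String): ToyColumn<*> =
        columns[name] ?: error("No column named '$name'")
}

// Assumed shape of a generated property: the name and the type are fixed
// in one place, so call sites need neither string literals nor casts.
@Suppress("UNCHECKED_CAST")
val ToyFrame.name: ToyColumn<String>
    get() = this["name"] as ToyColumn<String>

@Suppress("UNCHECKED_CAST")
val ToyFrame.age: ToyColumn<Int>
    get() = this["age"] as ToyColumn<Int>

fun main() {
    val frame = ToyFrame(
        mapOf(
            "name" to ToyColumn("name", listOf("Alice", "Bob")),
            "age" to ToyColumn("age", listOf(23, 27)),
        )
    )
    val names: List<String> = frame.name.values // typed access, no cast at the call site
    val totalAge: Int = frame.age.values.sum()  // Int operations are available directly
    println(names)
    println(totalAge)
}
```

In this sketch, a misspelled property name or a wrong type becomes a compile-time error at the call site instead of a runtime failure.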
9+
10+
> The behavior of data schema generation differs between the
11+
> [Compiler Plugin](Compiler-Plugin.md) and [Kotlin Notebook](gettingStartedKotlinNotebook.md).
12+
>
13+
> * In **Kotlin Notebook**, a schema is generated *only after cell execution* for
14+
> `DataFrame` variables defined within that cell.
15+
> * With the **Compiler Plugin**, a new schema is generated *after every operation*
16+
> — but support for all operations is still in progress.
17+
> Retrieving the schema for a `DataFrame` read from a file or URL is *not yet supported* either.
18+
>
19+
> This behavior may change in future releases. See the [example](#example) below that demonstrates these differences.
20+
{style="warning"}
21+
22+
## Example
23+
24+
Consider a simple hierarchical dataframe from
25+
<resource src="example.csv"></resource>.
26+
27+
This table consists of two columns: `name`, which is a `String` column, and `info`,
28+
which is a [**column group**](DataColumn.md#columngroup) containing two nested
29+
[value columns](DataColumn.md#valuecolumn):
30+
`age` of type `Int` and `height` of type `Double`.
31+
32+
<table>
33+
<thead>
34+
<tr>
35+
<th>name</th>
36+
<th colspan="2">info</th>
37+
</tr>
38+
<tr>
39+
<th></th>
40+
<th>age</th>
41+
<th>height</th>
42+
</tr>
43+
</thead>
44+
<tbody>
45+
<tr>
46+
<td>Alice</td>
47+
<td>23</td>
48+
<td>175.5</td>
49+
</tr>
50+
<tr>
51+
<td>Bob</td>
52+
<td>27</td>
53+
<td>160.2</td>
54+
</tr>
55+
</tbody>
56+
</table>
57+
58+
<tabs>
59+
<tab title="Kotlin Notebook">
60+
Read the [`DataFrame`](DataFrame.md) from the CSV file:
61+
62+
```kotlin
63+
val df = DataFrame.readCsv("example.csv")
64+
```
65+
66+
*After the cell is executed*, a data schema and extension properties for this `DataFrame` are generated,
67+
so you can use them to access columns directly
68+
and in operations inside the [Column Selector DSL](ColumnSelectors.md)
69+
and the [DataRow API](DataRow.md):
70+
71+
72+
```kotlin
73+
// Get nested column
74+
df.info.age
75+
// Sort by multiple columns
76+
df.sortBy { name and info.height }
77+
// Filter rows using a row condition.
78+
// These extensions express the exact value in the row
79+
// with the corresponding type:
80+
df.filter { name.startsWith("A") && info.age >= 16 }
81+
```
82+
83+
If you change the dataframe's schema by changing a column's [name](rename.md)
84+
or [type](convert.md), or by [adding](add.md) a new column, you need to
85+
run a cell with the new [`DataFrame`](DataFrame.md) declaration first.
86+
For example, rename the `name` column to "firstName":
87+
88+
```kotlin
89+
val dfRenamed = df.rename { name }.into("firstName")
90+
```
891

9-
Having these, it allows you to work with your dataframe like:
92+
After running the cell with the code above, you can use `firstName` extensions in the following cells:
93+
94+
```kotlin
95+
dfRenamed.firstName
96+
dfRenamed.rename { firstName }.into("name")
97+
dfRenamed.filter { firstName == "Nikita" }
98+
```
99+
100+
See the [](quickstart.md) for basic Extension Properties API examples in Kotlin Notebook.
101+
102+
</tab>
103+
<tab title="Compiler Plugin">
104+
105+
For now, if you read a [`DataFrame`](DataFrame.md) from a file or URL, you need to define its schema manually.
106+
You can do it quickly with [`generate..()` methods](DataSchema-Data-Classes-Generation.md).
107+
108+
Define schemas:
10109
```kotlin
11-
val peopleDf /* : DataFrame<Person> */ = DataFrame.read("people.csv").cast<Person>()
12-
val nameColumn /* : DataColumn<String> */ = peopleDf.name
13-
val ageColumn /* : DataColumn<Int> */ = peopleDf.personData.age
110+
@DataSchema
111+
data class PersonInfo(
112+
val age: Int,
113+
    val height: Double
114+
)
115+
116+
@DataSchema
117+
data class Person(
118+
val info: PersonInfo,
119+
val name: String
120+
)
14121
```
15-
and of course
122+
123+
Read the `DataFrame` from the CSV file and specify the schema with
124+
[`.convertTo()`](convertTo.md) or [`.cast()`](cast.md):
125+
126+
```kotlin
127+
val df = DataFrame.readCsv("example.csv").convertTo<Person>()
128+
```
129+
130+
Extension properties for this `DataFrame` are generated automatically by the plugin,
131+
so you can use them to access columns directly
132+
and in operations inside the [Column Selector DSL](ColumnSelectors.md)
133+
and the [DataRow API](DataRow.md).
134+
135+
136+
```kotlin
137+
// Get nested column
138+
df.info.age
139+
// Sort by multiple columns
140+
df.sortBy { name and info.height }
141+
// Filter rows using a row condition.
142+
// These extensions express the exact value in the row
143+
// with the corresponding type:
144+
df.filter { name.startsWith("A") && info.age >= 16 }
145+
```
146+
147+
Moreover, new extensions are generated on the fly after each schema change:
148+
changing a column's [name](rename.md)
149+
or [type](convert.md), or [adding](add.md) a new column.
150+
For example, after renaming the `name` column to "firstName", you can use the `firstName` extension
151+
in the following operations:
152+
16153
```kotlin
17-
peopleDf.add("lastName") { name.split(",").last() }
18-
.dropNulls { personData.age }
19-
.filter { survived && home.endsWith("NY") && personData.age in 10..20 }
154+
// Rename "name" column into "firstName"
155+
df.rename { name }.into("firstName")
156+
// Can use `firstName` extension in the row condition
157+
// right after renaming
158+
.filter { firstName == "Nikita" }
20159
```
21160

22-
To find out how to use this API in your environment, check out [Working with Data Schemas](schemas.md)
23-
or jump straight to [Data Schemas in Gradle projects](schemasGradle.md),
24-
or [Data Schemas in Jupyter notebooks](schemasJupyter.md).
161+
See the [Compiler Plugin Example](https://github.com/Kotlin/dataframe/tree/plugin_example/examples/kotlin-dataframe-plugin-example),
162+
an IntelliJ IDEA project with basic Extension Properties API examples.
163+
</tab>
164+
</tabs>

docs/StardustDocs/topics/guides/Guides-And-Examples.md

Lines changed: 3 additions & 0 deletions
@@ -24,6 +24,9 @@ Explore our structured, in-depth guides to steadily improve your Kotlin DataFram
2424

2525
<img src="quickstart_preview.png" border-effect="rounded" width="705"/>
2626

27+
* [](extensionPropertiesApi.md) — learn about extension properties for [`DataFrame`](DataFrame.md)
28+
and make working with your data both convenient and type-safe.
29+
2730
* [Enhanced Column Selection DSL](https://blog.jetbrains.com/kotlin/2024/07/enhanced-column-selection-dsl-in-kotlin-dataframe/)
2831
— explore powerful DSL for typesafe and flexible column selection in Kotlin DataFrame.
2932
* [](Kotlin-DataFrame-Features-in-Kotlin-Notebook.md)

docs/StardustDocs/topics/guides/quickstart.md

Lines changed: 13 additions & 8 deletions
@@ -88,8 +88,8 @@ columns.
8888
Column selectors are widely used across operations — one of the simplest examples is `.select { }`, which returns a new
8989
DataFrame with only the columns chosen in Columns Selection expression.
9090

91-
After executing the cell where a `DataFrame` variable is declared, an extension with properties for its columns is
92-
automatically generated.
91+
*After executing the cell* where a `DataFrame` variable is declared,
92+
[extension properties](extensionPropertiesApi.md) for its columns are automatically generated.
9393
These properties can then be used in the Columns Selection DSL expression for typesafe and convenient column access.
9494

9595
Select some columns:
@@ -104,18 +104,20 @@ dfSelected
104104

105105
<!---END-->
106106

107+
<inline-frame src="./resources/notebook_test_quickstart_5.html" width="705px" height="500px"></inline-frame>
108+
107109
> With a [Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md) enabled,
108110
> you can use auto-generated properties in your IntelliJ IDEA projects.
109111
110-
<inline-frame src="./resources/notebook_test_quickstart_5.html" width="705px" height="500px"></inline-frame>
112+
## Row Filtering
111113

112-
## Raw Filtering
113-
114-
Some operations use `RowExpression`, i.e., expression that applies for all `DataFrame` rows. For example `.filter { }`
115-
that returns a new `DataFrame` with rows that satisfy a condition given by row expression.
114+
Some operations use the [DataRow API](DataRow.md), with expressions and conditions
115+
that apply to all `DataFrame` rows.
116+
For example, `.filter { }` returns a new `DataFrame` with the rows that satisfy a condition given by a row expression.
116117

117118
Inside a row expression, you can access the values of the current row by column names through auto-generated properties.
118-
Similar to the Columns Selection DSL, but in this case the properties represent actual values, not column references.
119+
This is similar to the [Columns Selection DSL](ColumnSelectors.md),
120+
but in this case the properties represent actual values, not column references.
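The distinction between values and column references can be sketched with plain Kotlin. The snippet below is a simplified illustration, not the library's implementation: `ToyRow`, `ToyColumnRef`, `ToySelectorScope`, `filterRows` and `selectColumn` are hypothetical names made up for this example. It only shows how the same property name can resolve to a value in one scope and to a column reference in another.

```kotlin
// Toy stand-ins for illustration only; these are NOT Kotlin DataFrame types.
class ToyRow(private val cells: Map<String, Any?>) {
    operator fun get(column: String): Any? = cells[column]
}

class ToyColumnRef<T>(val columnName: String)

object ToySelectorScope

// In a row expression, the property yields the value stored in the current row.
val ToyRow.stargazers_count: Int
    get() = this["stargazers_count"] as Int

// In a column selector, the same name yields a reference to the column itself.
val ToySelectorScope.stargazers_count: ToyColumnRef<Int>
    get() = ToyColumnRef("stargazers_count")

// Hypothetical helper: evaluates the condition once per row.
fun List<ToyRow>.filterRows(condition: ToyRow.() -> Boolean): List<ToyRow> =
    filter { it.condition() }

// Hypothetical helper: resolves a selector to the name of the chosen column.
fun selectColumn(selector: ToySelectorScope.() -> ToyColumnRef<*>): String =
    ToySelectorScope.selector().columnName

fun main() {
    val rows = listOf(
        ToyRow(mapOf("stargazers_count" to 150)),
        ToyRow(mapOf("stargazers_count" to 4200)),
    )
    // Row scope: `stargazers_count` is an Int, so it can be compared directly.
    val popular = rows.filterRows { stargazers_count > 1000 }
    println(popular.size)
    // Selector scope: `stargazers_count` is a column reference, not a value.
    println(selectColumn { stargazers_count })
}
```

The receiver type of the lambda decides which property is picked, which is why the same name can be used in both kinds of expressions without ambiguity.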
119121

120122
Filter rows by "stargazers_count" value:
121123

@@ -349,6 +351,9 @@ Ready to go deeper? Check out what’s next:
349351

350352
- 🧠 **Understand the design** and core concepts in the [library overview](overview.md).
351353

354+
- 🔤 **[Learn more about Extension Properties](extensionPropertiesApi.md)**
355+
and make working with your data both convenient and type-safe.
356+
352357
- 💡 **[Use Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md)**
353358
for auto-generated column access in your IntelliJ IDEA projects.
354359
