Merge pull request #1246 from Kotlin/extension_properties_docs

AndreiKingsley · web-flow · commit d8012ed8ca0f · 2025-06-13T16:40:14.000+04:00
Extension properties docs
diff --git a/docs/StardustDocs/resources/example.csv b/docs/StardustDocs/resources/example.csv
@@ -0,0 +1,3 @@
+name,info
+Alice,"{""age"":23,""height"":175.5}"
+Bob,"{""age"":27,""height"":160.2}"
diff --git a/docs/StardustDocs/topics/extensionPropertiesApi.md b/docs/StardustDocs/topics/extensionPropertiesApi.md
@@ -1,24 +1,164 @@
 [//]: # (title: Extension Properties API)
 
-<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.ApiLevels-->
-
-Auto-generated extension properties are the safest and easiest way to access columns in a [`DataFrame`](DataFrame.md).
-They are generated based on a [dataframe schema](schemas.md), 
+When working with a [`DataFrame`](DataFrame.md), the most convenient and reliable way 
+to access its columns — including for operations and retrieving column values 
+in row expressions — is through auto-generated extension properties.
+They are generated based on a [dataframe schema](schemas.md),
 with the name and type of properties inferred from the name and type of the corresponding columns.
+Extension properties also work for all kinds of hierarchical dataframes.
+
+> The behavior of data schema generation differs between the 
+> [Compiler Plugin](Compiler-Plugin.md) and [Kotlin Notebook](gettingStartedKotlinNotebook.md).
+>
+> * In **Kotlin Notebook**, a schema is generated *only after cell execution* for 
+> `DataFrame` variables defined within that cell.
+> * With the **Compiler Plugin**, a new schema is generated *after every operation*,
+> although not all operations are supported yet.
+> Inferring the schema of a `DataFrame` read from a file or URL is *not yet supported* either.
+>
+> This behavior may change in future releases. See the [example](#example) below that demonstrates these differences.
+{style="warning"}
+
+## Example
+
+Consider a simple hierarchical dataframe from
+<resource src="example.csv"></resource>.
+
+This table consists of two columns: `name`, which is a `String` column, and `info`, 
+which is a [**column group**](DataColumn.md#columngroup) containing two nested 
+[value columns](DataColumn.md#valuecolumn) — 
+`age` of type `Int`, and `height` of type `Double`.
+
+<table>
+  <thead>
+    <tr>
+      <th>name</th>
+      <th colspan="2">info</th>
+    </tr>
+    <tr>
+      <th></th>
+      <th>age</th>
+      <th>height</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>Alice</td>
+      <td>23</td>
+      <td>175.5</td>
+    </tr>
+    <tr>
+      <td>Bob</td>
+      <td>27</td>
+      <td>160.2</td>
+    </tr>
+  </tbody>
+</table>
+
+<tabs>
+<tab title="Kotlin Notebook">
+Read the [`DataFrame`](DataFrame.md) from the CSV file:
+
+```kotlin
+val df = DataFrame.readCsv("example.csv")
+```
+
+*After cell execution*, a data schema and extension properties for this `DataFrame` are generated,
+so you can use them to access columns directly
+and in operations that take the [Column Selector DSL](ColumnSelectors.md) 
+or the [DataRow API](DataRow.md):
+
+
+```kotlin
+// Get nested column
+df.info.age
+// Sort by multiple columns
+df.sortBy { name and info.height }
+// Filter rows using a row condition.
+// Inside it, each extension is the actual value in the current row,
+// with the corresponding type:
+df.filter { name.startsWith("A") && info.age >= 16 }
+```
+
+If you change the dataframe's schema by [renaming](rename.md) a column, 
+[converting](convert.md) its type, or [adding](add.md) a new column, you need to 
+run a cell that declares the new [`DataFrame`](DataFrame.md) first. 
+For example, rename the `name` column to `firstName`:
+
+```kotlin
+val dfRenamed = df.rename { name }.into("firstName")
+```
 
-Having these, it allows you to work with your dataframe like:
+After running the cell with the code above, you can use `firstName` extensions in the following cells:
+
+```kotlin
+dfRenamed.firstName
+dfRenamed.rename { firstName }.into("name")
+dfRenamed.filter { firstName == "Nikita" }
+```
+
+See the [](quickstart.md) in Kotlin Notebook with basic Extension Properties API examples.
+
+</tab>
+<tab title="Compiler Plugin">
+
+For now, if you read a [`DataFrame`](DataFrame.md) from a file or URL, you need to define its schema manually. 
+You can do this quickly with the [`generate..()` methods](DataSchema-Data-Classes-Generation.md).
+
+Define schemas:
 ```kotlin
-val peopleDf /* : DataFrame<Person> */ = DataFrame.read("people.csv").cast<Person>()
-val nameColumn /* : DataColumn<String> */ = peopleDf.name
-val ageColumn /* : DataColumn<Int> */ = peopleDf.personData.age
+@DataSchema
+data class PersonInfo(
+    val age: Int,
+    val height: Double
+)
+
+@DataSchema
+data class Person(
+    val info: PersonInfo,
+    val name: String
+)
 ```
-and of course
+
+Read the `DataFrame` from the CSV file and specify the schema with 
+[`convertTo()`](convertTo.md) or [`cast()`](cast.md):
+
+```kotlin
+val df = DataFrame.readCsv("example.csv").convertTo<Person>()
+```
+
+The plugin generates extension properties for this `DataFrame` automatically, 
+so you can use them to access columns directly
+and in operations that take the [Column Selector DSL](ColumnSelectors.md)
+or the [DataRow API](DataRow.md):
+
+
+```kotlin
+// Get nested column
+df.info.age
+// Sort by multiple columns
+df.sortBy { name and info.height }
+// Filter rows using a row condition.
+// Inside it, each extension is the actual value in the current row,
+// with the corresponding type:
+df.filter { name.startsWith("A") && info.age >= 16 }
+```
+
+Moreover, new extensions are generated on the fly after each schema change, 
+such as [renaming](rename.md) a column,
+[converting](convert.md) its type, or [adding](add.md) a new column.
+For example, rename the `name` column to `firstName`; the `firstName` extension
+is then available in the operations that follow:
+
 ```kotlin
-peopleDf.add("lastName") { name.split(",").last() }
-    .dropNulls { personData.age }
-    .filter { survived && home.endsWith("NY") && personData.age in 10..20 }
+// Rename "name" column into "firstName"
+df.rename { name }.into("firstName")
+    // Can use `firstName` extension in the row condition 
+    // right after renaming
+    .filter { firstName == "Nikita" }
 ```
 
-To find out how to use this API in your environment, check out [Working with Data Schemas](schemas.md)
-or jump straight to [Data Schemas in Gradle projects](schemasGradle.md), 
-or [Data Schemas in Jupyter notebooks](schemasJupyter.md).
+See the [Compiler Plugin Example](https://github.com/Kotlin/dataframe/tree/plugin_example/examples/kotlin-dataframe-plugin-example) 
+IDEA project for basic Extension Properties API examples.
+</tab>
+</tabs>
diff --git a/docs/StardustDocs/topics/guides/Guides-And-Examples.md b/docs/StardustDocs/topics/guides/Guides-And-Examples.md
@@ -24,6 +24,9 @@ Explore our structured, in-depth guides to steadily improve your Kotlin DataFram
 
 <img src="quickstart_preview.png" border-effect="rounded" width="705"/>
 
+* [](extensionPropertiesApi.md) — learn about extension properties for [`DataFrame`](DataFrame.md) 
+and make working with your data both convenient and type-safe.
+
 * [Enhanced Column Selection DSL](https://blog.jetbrains.com/kotlin/2024/07/enhanced-column-selection-dsl-in-kotlin-dataframe/)
   — explore powerful DSL for typesafe and flexible column selection in Kotlin DataFrame.
 * [](Kotlin-DataFrame-Features-in-Kotlin-Notebook.md)
diff --git a/docs/StardustDocs/topics/guides/quickstart.md b/docs/StardustDocs/topics/guides/quickstart.md
@@ -88,8 +88,8 @@ columns.
 Column selectors are widely used across operations — one of the simplest examples is `.select { }`, which returns a new
 DataFrame with only the columns chosen in Columns Selection expression.
 
-After executing the cell where a `DataFrame` variable is declared, an extension with properties for its columns is
-automatically generated.
+*After executing the cell* where a `DataFrame` variable is declared, 
+[extension properties](extensionPropertiesApi.md) for its columns are automatically generated.
 These properties can then be used in the Columns Selection DSL expression for typesafe and convenient column access.
 
 Select some columns:
@@ -104,18 +104,20 @@ dfSelected
 
 <!---END-->
 
+<inline-frame src="./resources/notebook_test_quickstart_5.html" width="705px" height="500px"></inline-frame>
+
 > With a [Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md) enabled,
 > you can use auto-generated properties in your IntelliJ IDEA projects.
 
-<inline-frame src="./resources/notebook_test_quickstart_5.html" width="705px" height="500px"></inline-frame>
+## Row Filtering
 
-## Raw Filtering
-
-Some operations use `RowExpression`, i.e., expression that applies for all `DataFrame` rows. For example `.filter { }`
-that returns a new `DataFrame` with rows that satisfy a condition given by row expression.
+Some operations use the [DataRow API](DataRow.md), with expressions and conditions 
+that are evaluated for every `DataFrame` row.
+For example, `.filter { }` returns a new `DataFrame` with the rows that satisfy a condition given by a row expression.
 
 Inside a row expression, you can access the values of the current row by column names through auto-generated properties.
-Similar to the Columns Selection DSL, but in this case the properties represent actual values, not column references.
+This is similar to the [Columns Selection DSL](ColumnSelectors.md),
+but in this case the properties represent actual values, not column references.
 
 Filter rows by "stargazers_count" value:
 
@@ -349,6 +351,9 @@ Ready to go deeper? Check out what’s next:
 
 - 🧠 **Understand the design** and core concepts in the [library overview](overview.md).
 
+- 🔤 **[Learn more about Extension Properties](extensionPropertiesApi.md)**  
+  and make working with your data both convenient and type-safe.
+
 - 💡 **[Use Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md)**  
   for auto-generated column access in your IntelliJ IDEA projects.
 

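Illustrative note (not part of the diff): the docs above say that an extension property is a column reference on a `DataFrame` but an actual value inside a row expression. The sketch below shows how that distinction can be expressed with plain Kotlin extension properties. `Column`, `Row`, `Frame`, `Person` and `sampleFrame` are hypothetical stand-ins written for this note. They are not the Kotlin DataFrame classes, and the real generated code may look different.

```kotlin
// Hypothetical stand-ins for illustration only; NOT the Kotlin DataFrame classes.
class Column<T>(val name: String, val values: List<T>)

class Row<S>(private val cells: Map<String, Any?>) {
    // Read one cell of this row; the caller supplies the expected type.
    @Suppress("UNCHECKED_CAST")
    fun <T> value(column: String): T = cells.getValue(column) as T
}

class Frame<S>(private val rows: List<Map<String, Any?>>) {
    // Collect a whole column by name; the caller supplies the expected type.
    @Suppress("UNCHECKED_CAST")
    fun <T> column(name: String): Column<T> = Column(name, rows.map { it.getValue(name) as T })

    // Keep only the rows for which the row condition holds.
    fun filter(predicate: Row<S>.() -> Boolean): Frame<S> =
        Frame(rows.filter { Row<S>(it).predicate() })

    fun rowCount(): Int = rows.size
}

// Marker type playing the role of a data schema.
interface Person

// Hand-written equivalents of what a generator could emit for this schema:
// on a frame the property is a typed column, on a row it is a typed value.
val Frame<Person>.name: Column<String> get() = column("name")
val Frame<Person>.age: Column<Int> get() = column("age")
val Row<Person>.name: String get() = value("name")
val Row<Person>.age: Int get() = value("age")

// Assumed sample data, loosely mirroring example.csv without the nesting.
fun sampleFrame(): Frame<Person> = Frame(
    listOf(
        mapOf("name" to "Alice", "age" to 23),
        mapOf("name" to "Bob", "age" to 27),
    )
)

fun main() {
    val df = sampleFrame()
    // Column access: `name` resolves to the Frame extension, a Column<String>.
    println(df.name.values) // prints [Alice, Bob]
    // Row condition: `name` and `age` resolve to the Row extensions, plain values.
    val filtered = df.filter { name.startsWith("A") && age >= 16 }
    println(filtered.name.values) // prints [Alice]
}
```

The same property name resolves differently depending on the receiver in scope, which is why `name.startsWith("A")` compiles inside the `filter` lambda while `df.name` yields a column.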