Commit 7817357

small docs and readme updates regarding main concepts
1 parent 05fd49e commit 7817357

3 files changed: +24 -20 lines changed

README.md

Lines changed: 4 additions & 2 deletions
@@ -11,14 +11,16 @@
 Kotlin DataFrame aims to reconcile Kotlin's static typing with the dynamic nature of data by utilizing both the full power of the Kotlin language and the opportunities provided by intermittent code execution in Jupyter notebooks and REPL.
 
 * **Hierarchical** — represents hierarchical data structures, such as JSON or a tree of JVM objects.
-* **Functional** — data processing pipeline is organized in a chain of `DataFrame` transformation operations. Every operation returns a new instance of `DataFrame` reusing underlying storage wherever it's possible.
+* **Functional** — the data processing pipeline is organized in a chain of `DataFrame` transformation operations.
+* **Immutable** — every operation returns a new instance of `DataFrame` reusing underlying storage wherever it's possible.
 * **Readable** — data transformation operations are defined in DSL close to natural language.
 * **Practical** — provides simple solutions for common problems and the ability to perform complex tasks.
 * **Minimalistic** — simple, yet powerful data model of three column kinds.
-* **Interoperable** — convertable with Kotlin data classes and collections.
+* **Interoperable** — convertable with Kotlin data classes and collections. This also means conversion to/from other libraries' data structures is usually quite straightforward!
 * **Generic** — can store objects of any type, not only numbers or strings.
 * **Typesafe** — on-the-fly generation of extension properties for type safe data access with Kotlin-style care for null safety.
 * **Polymorphic** — type compatibility derives from column schema compatibility. You can define a function that requires a special subset of columns in a dataframe but doesn't care about other columns.
+In notebooks this works out-of-the-box. In ordinary projects this requires casting (for now).
 
 Integrates with [Kotlin kernel for Jupyter](https://github.com/Kotlin/kotlin-jupyter). Inspired by [krangl](https://github.com/holgerbrandl/krangl), Kotlin Collections and [pandas](https://pandas.pydata.org/)
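
Note on the split **Functional** / **Immutable** bullets in the README hunk above: the sketch below shows one way "every operation returns a new instance reusing underlying storage" can be read. `Frame`, `add`, and `remove` are hypothetical stand-ins written for this note using only the Kotlin standard library. They are not the Kotlin DataFrame API, and the library's actual storage sharing may work differently.

```kotlin
// Toy model only: `Frame` is a hypothetical stand-in, not the library's DataFrame.
class Frame(val columns: Map<String, List<Any?>>) {
    // Returns a new Frame; the existing column lists are shared, not copied.
    fun add(name: String, values: List<Any?>): Frame = Frame(columns + (name to values))

    // Returns a new Frame without the named column; the receiver is left untouched.
    fun remove(name: String): Frame = Frame(columns - name)
}

fun main() {
    val original = Frame(mapOf("name" to listOf("Alice", "Bob"), "age" to listOf(15, 20)))

    // A chain of transformations; each step yields a new instance.
    val result = original
        .add("city", listOf("London", "Dubai"))
        .remove("name")

    println(original.columns.keys) // [name, age]  (original is unchanged)
    println(result.columns.keys)   // [age, city]
    println(original.columns["age"] === result.columns["age"]) // true (same list object)
}
```

The point of the toy is only that chaining stays cheap when each new instance holds references to the untouched columns instead of copying them.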

docs/StardustDocs/topics/overview.md

Lines changed: 19 additions & 17 deletions
@@ -36,30 +36,32 @@ The goal of data wrangling is to assure quality and useful data.
 
 ## Main Features and Concepts
 
-* [**Hierarchical**](hierarchical.md) — the Kotlin DataFrame library provides an ability to read and present data from different sources including not only plain **CSV** but also **JSON** or **[SQL databases](readSqlDatabases.md)**.
-That’s why it has been designed hierarchical and allows nesting of columns and cells.
-
-* [**Interoperable**](collectionsInterop.md) — hierarchical data layout also opens a possibility of converting any objects
-structure in application memory to a data frame and vice versa.
-
-* **Safe** — the Kotlin DataFrame library provides a mechanism of on-the-fly [**generation of extension properties**](extensionPropertiesApi.md)
+* [**Hierarchical**](hierarchical.md) — the Kotlin DataFrame library provides an ability to read and present data from different sources,
+including not only plain **CSV** but also **JSON** or **[SQL databases](readSqlDatabases.md)**.
+This is why it was designed to be hierarchical and allows nesting of columns and cells.
+* **Functional** — the data processing pipeline is organized in a chain of [`DataFrame`](DataFrame.md) transformation operations.
+* **Immutable** — every operation returns a new instance of [`DataFrame`](DataFrame.md) reusing underlying storage wherever it's possible.
+* **Readable** — data transformation operations are defined in DSL close to natural language.
+* **Practical** — provides simple solutions for common problems and the ability to perform complex tasks.
+* **Minimalistic** — simple, yet powerful data model of three [column kinds](DataColumn.md#column-kinds).
+* [**Interoperable**](collectionsInterop.md) — convertable with Kotlin data classes and collections.
+This also means conversion to/from other libraries' data structures is usually quite straightforward!
+See our examples for some conversions between DataFrame and [Apache Spark](TODO), [Multik](TODO), and [JetBrains Exposed](TODO).
+* **Generic** — can store objects of any type, not only numbers or strings.
+* **Typesafe** — the Kotlin DataFrame library provides a mechanism of on-the-fly [**generation of extension properties**](extensionPropertiesApi.md)
 that correspond to the columns of a data frame.
 In interactive notebooks like Jupyter or Datalore, the generation runs after each cell execution.
 In IntelliJ IDEA there's a Gradle plugin for generation properties based on CSV file or JSON file.
 Also, we’re working on a compiler plugin that infers and transforms [`DataFrame`](DataFrame.md) schema while typing.
 You can now clone this [project with many examples](https://github.com/koperagen/df-plugin-demo) showcasing how it allows you to reliably use our most convenient extension properties API.
 The generated properties ensure you’ll never misspell column name and don’t mess up with its type, and of course nullability is also preserved.
-
-* **Generic** — columns can store objects of any type, not only numbers or strings.
-
 * [**Polymorphic**](schemas.md)
-if all columns of [`DataFrame`](DataFrame.md) are presented in some other dataframes,
-then the first one could be a superclass for latter.
-Thus,
-one can define a function on an interface with some set of columns
-and then execute it in a safe way on any [`DataFrame`](DataFrame.md) which contains this set of columns.
-
-* **Immutable** — all operations on [`DataFrame`](DataFrame.md) produce new instance, while underlying data is reused wherever it's possible
+if all columns of a [`DataFrame`](DataFrame.md) instance are presented in another dataframe,
+then the first one will be seen as a superclass for the latter.
+This means you can define a function on an interface with some set of columns
+and then execute it safely on any [`DataFrame`](DataFrame.md) which contains this same set of columns.
+In notebooks, this works out-of-the-box.
+In ordinary projects, this requires casting (for now).
 
 ## Syntax
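
Note on the reworded **Polymorphic** bullet in the overview.md hunk above: a rough sketch of how "define a function on an interface with some set of columns" and "requires casting (for now)" might fit together. It uses only the Kotlin standard library. `Frame`, `Column`, `Person`, `Employee`, `averageAge`, and `cast` are hypothetical names invented for this note; only the shape of the extension properties is borrowed from the `this["age"] as DataColumn<Int>` line in the schemasInheritance.md hunk of this commit. The real library types and generated code may differ.

```kotlin
// Toy stand-ins, NOT the Kotlin DataFrame classes.
class Column<T>(val name: String, val values: List<T>)

// `out S` makes Frame<Employee> usable wherever Frame<Person> is expected.
class Frame<out S>(private val columns: Map<String, Column<*>>) {
    operator fun get(name: String): Column<*> = columns.getValue(name)

    // Unchecked re-typing: the caller asserts that the columns match schema R.
    fun <R> cast(): Frame<R> = Frame(columns)
}

// Marker interfaces playing the role of schemas; Employee extends Person.
interface Person
interface Employee : Person

// Hand-written equivalents of generated extension properties.
@Suppress("UNCHECKED_CAST")
val Frame<Person>.age: Column<Int> get() = this["age"] as Column<Int>

@Suppress("UNCHECKED_CAST")
val Frame<Employee>.salary: Column<Int> get() = this["salary"] as Column<Int>

// Requires only the Person columns and ignores any others.
fun averageAge(people: Frame<Person>): Double = people.age.values.average()

fun main() {
    val untyped = Frame<Any>(
        mapOf(
            "age" to Column("age", listOf(30, 40)),
            "salary" to Column("salary", listOf(5000, 6000)),
        )
    )
    // The explicit cast step an ordinary (non-notebook) project would need.
    val employees: Frame<Employee> = untyped.cast()

    println(averageAge(employees))   // 35.0
    println(employees.salary.values) // [5000, 6000]
}
```

In this toy, the cast is the only unchecked step. After it, the compiler accepts `employees` wherever a `Frame<Person>` is required because the schema interfaces form a subtype chain.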

docs/StardustDocs/topics/schemasInheritance.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ New schema interface for `filtered` variable will be derived from previously gen
 interface DataFrameType1 : DataFrameType
 ```
 
-Extension properties for data access are generated only for new and overriden members of `DataFrameType1` interface:
+Extension properties for data access are generated only for new and overridden members of `DataFrameType1` interface:
 
 ```kotlin
 val ColumnsContainer<DataFrameType1>.age: DataColumn<Int> get() = this["age"] as DataColumn<Int>
