ydb-platform
diff --git a/ydb/docs/en/core/concepts/_includes/vector_indexes.md b/ydb/docs/en/core/concepts/_includes/vector_indexes.md
Lines changed: 118 additions & 0 deletions
diff --git a/ydb/docs/en/core/concepts/column-table.md b/ydb/docs/en/core/concepts/column-table.md
Lines changed: 1 addition & 0 deletions
diff --git a/ydb/docs/en/core/concepts/datamodel/_includes/table.md b/ydb/docs/en/core/concepts/datamodel/_includes/table.md
Lines changed: 1 addition & 0 deletions
diff --git a/ydb/docs/en/core/concepts/glossary.md b/ydb/docs/en/core/concepts/glossary.md
Lines changed: 4 additions & 0 deletions
diff --git a/ydb/docs/en/core/concepts/toc_i.yaml b/ydb/docs/en/core/concepts/toc_i.yaml
Lines changed: 2 additions & 0 deletions
diff --git a/ydb/docs/en/core/concepts/vector_indexes.md b/ydb/docs/en/core/concepts/vector_indexes.md
Lines changed: 1 addition & 0 deletions
diff --git a/ydb/docs/en/core/dev/toc_p.yaml b/ydb/docs/en/core/dev/toc_p.yaml
Lines changed: 2 additions & 0 deletions
diff --git a/ydb/docs/en/core/dev/vector-indexes.md b/ydb/docs/en/core/dev/vector-indexes.md
Lines changed: 62 additions & 0 deletions
diff --git a/ydb/docs/en/core/reference/observability/metrics/index.md b/ydb/docs/en/core/reference/observability/metrics/index.md
Lines changed: 1 addition & 1 deletion
diff --git a/ydb/docs/en/core/reference/ydb-cli/commands/_includes/secondary_index.md b/ydb/docs/en/core/reference/ydb-cli/commands/_includes/secondary_index.md
Lines changed: 2 additions & 2 deletions
@@ -0,0 +1,118 @@
+# Vector indexes
+
+{{ ydb-short-name }} supports [vector indexes](https://en.wikipedia.org/wiki/Vector_database) to efficiently find the top k rows with vector values closest to a query vector. Unlike secondary indexes that optimize equality or range queries, vector indexes enable similarity search based on distance or similarity functions.
+
+Vector indexes are particularly useful for:
+
+* recommendation systems (finding similar items/users)
+* semantic search (matching text embeddings)
+* image similarity search
+* anomaly detection (finding outliers)
+* classification systems (finding nearest labeled examples)
+
+## Vector index characteristics {#characteristics}
+
+Vector indexes in {{ ydb-short-name }}:
+
+* Solve nearest neighbor search problems using similarity or distance functions
+* Support several functions: "inner_product" and "cosine" similarity; "cosine", "euclidean", and "manhattan" distance
+* Currently implement a single index type: `vector_kmeans_tree`
+
+### Vector index `vector_kmeans_tree` type {#vector-kmeans-tree-type}
+
+The `vector_kmeans_tree` index implements a hierarchical clustering structure. Its organization includes:
+
+1. Hierarchical clustering:
+
+    - The index builds multiple levels of k-means clusters
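+    - At each level, vectors are partitioned into the specified number of clusters raised to the power of the level number (for example, with `clusters=128`, level 2 holds up to 128² = 16,384 clusters in total)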
+    - First level clusters the entire dataset
+    - Subsequent levels recursively cluster each parent cluster's contents
+
+2. Search process:
+
+    - During queries, the index examines only the most promising clusters
+    - This search space pruning avoids exhaustive search through all vectors
+
+3. Parameters:
+
+    - `levels`: The number of tree levels (typically 1-3). Controls search depth
+    - `clusters`: The number of clusters on each level (typically 64-512). Determines search breadth at each level 
+
+## Vector index types {#types}
+
+### Basic vector index {#basic}
+
+The simplest form indexes vectors without additional filtering capabilities. For example:
+
+```yql
+ALTER TABLE my_table
+  ADD INDEX my_index
+  GLOBAL USING vector_kmeans_tree
+  ON (embedding)
+  WITH (distance=cosine, type="uint8", dimension=512, levels=2, clusters=128);
+```
+
+### Vector index with covered columns {#covering}
+
+Includes additional columns to avoid reading from the main table during queries:
+
+```yql
+ALTER TABLE my_table
+  ADD INDEX my_index
+  GLOBAL USING vector_kmeans_tree
+  ON (embedding) COVER (data)
+  WITH (distance=cosine, type="uint8", dimension=512, levels=2, clusters=128);
+```
+
+### Prefixed vector index {#prefixed}
+
+Allows filtering by prefix columns before performing vector search:
+
+```yql
+ALTER TABLE my_table
+  ADD INDEX my_index
+  GLOBAL USING vector_kmeans_tree
+  ON (user, embedding)
+  WITH (distance=cosine, type="uint8", dimension=512, levels=2, clusters=128);
+```
+
+### Prefixed vector index with covered columns {#prefixed-covering}
+
+Combines prefix filtering with covered columns for optimal performance:
+
+```yql
+ALTER TABLE my_table
+  ADD INDEX my_index
+  GLOBAL USING vector_kmeans_tree
+  ON (user, embedding) COVER (data)
+  WITH (distance=cosine, type="uint8", dimension=512, levels=2, clusters=128);
+```
+
+## Creating vector indexes {#creation}
+
+Vector indexes can be created in two ways:
+
+* When creating a table, with the YQL [`CREATE TABLE` statement](../../yql/reference/syntax/create_table/vector_index.md)
+* On an existing table, with the YQL [`ALTER TABLE` statement](../../yql/reference/syntax/alter_table/indexes.md)
+
+For more information about vector index parameters, see [`CREATE TABLE` statement](../../yql/reference/syntax/create_table/vector_index.md).
+
+## Using vector indexes {#usage}
+
+Query vector indexes using the VIEW syntax in YQL. For prefixed indexes, include the prefix columns in the WHERE clause:
+
+```yql
+SELECT user, data
+FROM my_table VIEW my_index
+WHERE user = "..."
+ORDER BY Knn::CosineSimilarity(embedding, ...) DESC
+LIMIT 10;
+```
+
+
+## Limitations {#limitations}
+
+Currently not supported:
+* modifying rows in indexed tables
+* bit vector type
@@ -22,6 +22,7 @@ What's currently not supported:
 
 * Reading data from replicas
 * Secondary indexes
+* Vector indexes
 * Bloom filters
 * Change Data Capture
 * Renaming tables
 
@@ -197,6 +197,7 @@ At the moment, not all functionality of column-oriented tables is implemented. T
 
 * Reading from replicas.
 * Secondary indexes.
+* Vector indexes.
 * Bloom filters.
 * Change Data Capture.
 * Table renaming.
 
@@ -157,6 +157,10 @@ A **primary index** or **primary key index** is the main data structure used to
 
 A **secondary index** is an additional data structure used to locate rows in a table, typically when it can't be done efficiently using the [primary index](#primary-index). Unlike the primary index, secondary indexes are managed independently from the main table data. Thus, a table might have multiple secondary indexes for different use cases. {{ ydb-short-name }}'s capabilities in terms of secondary indexes are covered in a separate article [{#T}](secondary_indexes.md). Secondary indexes can be either unique or non-unique.
 
+#### Vector Index {#vector-index}
+
+A **vector index** is an additional data structure used to speed up the [nearest neighbor search](https://en.wikipedia.org/wiki/Nearest_neighbor_search), typically when the data is too large for the [index-less approach](../yql/reference/udf/list/knn.md) to handle the load. Unlike the primary index, vector indexes are managed independently of the underlying table data. Thus, a table can have multiple vector indexes for different scenarios. For more information about using vector indexes in {{ ydb-short-name }}, see [{#T}](vector_indexes.md).
+
 #### Column family {#column-family}
 
 A **column family** or **column group** is a feature that allows storing a subset of [row-oriented table](#row-oriented-table) columns separately in a distinct family or group. The primary use case is to store some columns on different kinds of disk drives (offload less important columns to HDD) or with various compression settings. If the workload requires many column families, consider using [column-oriented tables](#column-oriented-table) instead.
 
@@ -14,6 +14,8 @@ items:
   href: transactions.md
 - name: Secondary indexes
   href: secondary_indexes.md
+- name: Vector indexes
+  href: vector_indexes.md
 - name: Change Data Capture (CDC)
   href: cdc.md
   when: feature_changefeed
 
@@ -0,0 +1 @@
+{% include [vector_indexes.md](_includes/vector_indexes.md) %}
@@ -18,6 +18,8 @@ items:
     path: primary-key/toc_p.yaml
 - name: Secondary indexes
   href: secondary-indexes.md
+- name: Vector indexes
+  href: vector-indexes.md
 - name: Query plans optimization
   href: query-plans-optimization.md
 - name: Batch upload
 
@@ -0,0 +1,62 @@
+# Vector indexes
+
+[Vector indexes](https://en.wikipedia.org/wiki/Vector_database) are specialized data structures that enable efficient similarity search in high-dimensional spaces. Unlike traditional indexes that optimize exact lookups, vector indexes allow finding the most similar items to a query vector based on mathematical distance or similarity measures.
+
+Data in a {{ ydb-short-name }} table is stored and sorted by a primary key, enabling efficient point lookups and range scans. Vector indexes provide similar efficiency for nearest neighbor searches in vector spaces, which is particularly valuable for working with embeddings and other high-dimensional data representations.
+
+This article describes practical operations with vector indexes. For conceptual information about vector index types and their characteristics, see [Vector indexes](../concepts/vector_indexes.md) in the Concepts section.
+
+## Creating vector indexes {#create}
+
+A vector index can be created with the following YQL commands:
+* [`CREATE TABLE`](../yql/reference/syntax/create_table/index.md)
+* [`ALTER TABLE`](../yql/reference/syntax/alter_table/index.md)
+
+Example of creating a prefixed vector index with covered columns:
+
+```yql
+ALTER TABLE my_table
+  ADD INDEX my_index
+  GLOBAL USING vector_kmeans_tree
+  ON (user, embedding) COVER (data)
+  WITH (distance=cosine, type="uint8", dimension=512, levels=2, clusters=128);
+```
+
+Key parameters for `vector_kmeans_tree`:
+* `distance`/`similarity`: Metric function ("cosine", "euclidean", etc.)
+* `type`: Data type ("float", "int8", "uint8")
+* `dimension`: Number of dimensions (<= 16384)
+* `levels`: Tree depth
+* `clusters`: Number of clusters per level (values > 1000 may impact performance)
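+
+As an illustration of how these parameters combine, the sketch below builds a basic index with a similarity function instead of a distance function. It assumes that the `embedding` column of `my_table` stores 512-dimensional `float` vectors. The index name `my_ip_index` is hypothetical, and the parameter values are examples rather than recommendations.
+
+```yql
+-- Hypothetical index: inner-product similarity over 512-dimensional float vectors
+ALTER TABLE my_table
+  ADD INDEX my_ip_index
+  GLOBAL USING vector_kmeans_tree
+  ON (embedding)
+  WITH (similarity=inner_product, type="float", dimension=512, levels=2, clusters=128);
+```
+
+Queries against such an index would presumably order by the matching function in descending order (`Knn::InnerProductSimilarity(...) DESC`), because larger similarity values mean closer vectors.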
+
+Since building a vector index requires processing existing data, index creation on populated tables may take significant time. This operation runs in the background, allowing continued table access during construction. The index becomes available automatically when ready.
+
+## Using vector indexes for similarity search {#use}
+
+To perform similarity searches, explicitly specify the index name in the VIEW clause. For prefixed indexes, include prefix column conditions in the WHERE clause:
+
+```yql
+DECLARE $query_vector AS List<Uint8>;
+
+SELECT user, data
+FROM my_table VIEW my_index
+WHERE user = "john_doe"
+ORDER BY Knn::CosineSimilarity(embedding, $query_vector) DESC
+LIMIT 10;
+```
+
+Without the VIEW clause, the query would perform a full table scan with brute-force vector comparison.
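+
+For comparison, the index-less form of the same search can be sketched as follows. This is a minimal sketch rather than a reference query. It assumes that the `embedding` column stores vectors in the serialized string form used by the `Knn` functions, and that a query vector passed as a list is serialized with `Knn::ToBinaryStringUint8` before comparison. The `$target` name is illustrative.
+
+```yql
+DECLARE $query_vector AS List<Uint8>;
+
+-- Assumption: serialize the list into the binary string form that Knn functions compare
+$target = Knn::ToBinaryStringUint8($query_vector);
+
+-- No VIEW clause: every row that passes the filter is compared with the query vector
+SELECT user, data
+FROM my_table
+WHERE user = "john_doe"
+ORDER BY Knn::CosineSimilarity(embedding, $target) DESC
+LIMIT 10;
+```
+
+The cost of this form grows with the number of rows scanned, which is the situation a vector index is meant to avoid.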
+
+## Checking the cost of queries {#cost}
+
+For any query in a transactional application, check how many I/O operations it performs in the database and how much CPU it consumes. Also make sure these figures don't keep growing as the database volume grows. {{ ydb-short-name }} returns the statistics required for this analysis after running each query.
+
+If you use the {{ ydb-short-name }} CLI, pass the `--stats` option to print statistics after the `yql` command completes. All {{ ydb-short-name }} SDKs also provide structures with the statistics returned after running a query. If you run a query in the UI, a statistics tab appears next to the results tab.
+
+{% note warning %}
+
+Vector indexes currently don't support data modification operations. 
+Any attempt to modify rows in indexed tables will fail. 
+This limitation will be removed in future releases.
+
+{% endnote %}
@@ -5,7 +5,7 @@
 | Metric name<br/>Type, units of measurement | Description<br/>Labels |
 | ----- | ----- |
 | `resources.storage.used_bytes`<br/>`IGAUGE`, bytes | The size of user and service data stored in distributed network storage. `resources.storage.used_bytes` = `resources.storage.table.used_bytes` + `resources.storage.topic.used_bytes`. |
-| `resources.storage.table.used_bytes`<br/>`IGAUGE`, bytes | The size of user and service data stored by tables in distributed network storage. Service data includes the data of the primary and [secondary indexes](../../../concepts/secondary_indexes.md). |
+| `resources.storage.table.used_bytes`<br/>`IGAUGE`, bytes | The size of user and service data stored by tables in distributed network storage. Service data includes the data of the primary index, [secondary indexes](../../../concepts/secondary_indexes.md), and [vector indexes](../../../concepts/vector_indexes.md). |
 | `resources.storage.topic.used_bytes`<br/>`IGAUGE`, bytes | The size of storage used by topics. This metric sums the `topic.storage_bytes` values of all topics. |
 | `resources.storage.limit_bytes`<br/>`IGAUGE`, bytes | A limit on the size of user and service data that a database can store in distributed network storage. |
 
 
@@ -89,9 +89,9 @@ Deleting the index-building details (use the actual operation id):
 {{ ydb-cli }} -p quickstart operation forget ydb://buildindex/7?id=2814749869
 ```
 
-## Deleting a secondary index {#drop}
+## Deleting an index {#drop}
 
-Secondary indexes are deleted by the `table index drop` command:
+Indexes are deleted by the `table index drop` command:
 
 ```bash
 {{ ydb-cli }} [connection options] table index drop <table> --index-name STR