Commit 165006f

pg_vector usages
1 parent a7da7dc commit 165006f

7 files changed

+266
-15
lines changed

CN/modules/ROOT/nav.adoc

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -13,6 +13,7 @@
1313
** xref:v3.2/10.adoc[Migration Guide]
1414
* IvorySQL Ecosystem
1515
** xref:v3.2/11.adoc[PostGIS]
16+
** xref:v3.2/12.adoc[pgvector]
1617
* Feature List
1718
** xref:v3.2/14.adoc[1. Framework Design]
1819
** xref:v3.2/15.adoc[2. GUC Framework]

CN/modules/ROOT/pages/v3.2/12.adoc

Lines changed: 124 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,124 @@
1+
:sectnums:
2+
:sectnumlevels: 5
3+
4+
= pgvector
5+
6+
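== Overview
7+
Vector databases are a key component of generative artificial intelligence (GenAI). pgvector is a PostgreSQL extension that supports vectors of up to 16,000 dimensions and provides vector operators and index types, so PostgreSQL itself can serve as a vector database. Because IvorySQL is built on PostgreSQL, it works with the pgvector extension directly, giving users more options for data processing and analysis. pgvector is also available in Oracle compatibility mode, so Oracle users can adopt vector search while migrating and managing their data in a familiar environment.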
8+
9+
10+
== Principles
11+
12+
pgvector provides two index algorithms: IVFFlat and HNSW.
13+
14+
=== IVFFLAT
15+
16+
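IVFFlat clusters similar vectors into regions and builds an inverted index that maps each region to its vectors. A query then searches only a subset of the data, which makes it fast. By tuning the lists and probes parameters, IVFFlat trades speed against accuracy, allowing PostgreSQL to run fast semantic similarity searches over complex data. With a simple query, an application can find the nearest neighbors of a query vector among millions of high-dimensional vectors. IVFFlat is a good fit for tasks such as natural language processing and information retrieval.
17+
When building an IVFFlat index, you decide how many lists the index contains. Each list corresponds to a "center", and the centers are computed with the k-means algorithm. Once all centers are determined, IVFFlat assigns each vector to its closest center and adds it to the index. At query time, the ivfflat.probes parameter determines how many centers are checked. This is the performance/recall trade-off of approximate nearest neighbor (ANN) search: the more centers a query visits, the more accurate the results, at the cost of performance.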
18+
19+
=== HNSW
20+
21+
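HNSW (Hierarchical Navigable Small World) is a graph-based index algorithm made up of multiple layers of proximity graphs, hence the "hierarchical" NSW method. It organizes the vectors into a multi-layer navigation graph: the upper layers are sparser, with nodes farther apart, and the lower layers are denser, with nodes closer together. HNSW is a classic space-for-time trade-off. It achieves high search quality and speed, but at a significant memory cost, because it must keep all vectors in memory and also store the graph structure itself.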
22+
23+
== Installation
24+
[TIP]
25+
====
26+
This guide assumes that IvorySQL 3.0 or later is already installed, with the installation path /usr/local/ivorysql/ivorysql-3
27+
====
28+
29+
=== Source Installation
30+
31+
** Set the PG_CONFIG environment variable
32+
```
33+
export PG_CONFIG=/usr/local/ivorysql/ivorysql-3/bin/pg_config
34+
```
35+
36+
** Fetch the pgvector source code
37+
```
38+
git clone --branch v0.6.2 https://github.com/pgvector/pgvector.git
39+
```
40+
41+
** Build and install pgvector
42+
```
43+
cd pgvector
44+
45+
sudo --preserve-env=PG_CONFIG make
46+
sudo --preserve-env=PG_CONFIG make install
47+
```
48+
49+
** Create the pgvector extension
50+
```
51+
[ivorysql@localhost ivorysql-3]$ psql
52+
psql (16.2)
53+
Type "help" for help.
54+
55+
ivorysql=# create extension vector;
56+
CREATE EXTENSION
57+
```
58+
The pgvector extension is now installed.
59+
For more examples, see the https://github.com/pgvector/pgvector?tab=readme-ov-file#getting-started[pgvector documentation]
60+
61+
== Oracle Compatibility
62+
63+
The pgvector extension also works correctly in IvorySQL's Oracle compatibility mode.
64+
[TIP]
65+
We recommend testing on port 1521: psql -p 1521
66+
67+
=== Data Types
68+
69+
```
70+
ivorysql=# CREATE TABLE items5 (id bigserial PRIMARY KEY, name varchar2(20), num number(20), embedding bit(3));
71+
CREATE TABLE
72+
ivorysql=# INSERT INTO items5 (name, num, embedding) VALUES ('1st oracle data',0, '000'), ('2nd oracle data', 111, '111');
73+
INSERT 0 2
74+
ivorysql=# SELECT * FROM items5 ORDER BY bit_count(embedding # '101') LIMIT 5;
75+
id | name | num | embedding
76+
----+-----------------+-----+-----------
77+
2 | 2nd oracle data | 111 | 111
78+
1 | 1st oracle data | 0 | 000
79+
```
80+
81+
=== Anonymous Block
82+
83+
```
84+
ivorysql=# declare
85+
i vector(3) := '[1,2,3]';
86+
begin
87+
raise notice '%', i;
88+
end;
89+
ivorysql-# /
90+
NOTICE: [1,2,3]
91+
DO
92+
```
93+
94+
=== Stored Procedure (PROCEDURE)
95+
```
96+
ivorysql=# CREATE OR REPLACE PROCEDURE ora_procedure()
97+
AS
98+
p vector(3) := '[4,5,6]';
99+
begin
100+
raise notice '%', p;
101+
end;
102+
/
103+
CREATE PROCEDURE
104+
ivorysql=# call ora_procedure();
105+
NOTICE: [4,5,6]
106+
CALL
107+
```
108+
109+
=== Function (FUNCTION)
110+
```
111+
ivorysql=# CREATE OR REPLACE FUNCTION AddVector(a vector(3), b vector(3))
112+
RETURN vector(3)
113+
IS
114+
BEGIN
115+
RETURN a + b;
116+
END;
117+
/
118+
CREATE FUNCTION
119+
ivorysql=# SELECT AddVector('[1,2,3]','[4,5,6]') FROM DUAL;
120+
addvector
121+
----------------
122+
[5,7,9]
123+
(1 row)
124+
```

CN/modules/ROOT/pages/v3.2/2.adoc

Lines changed: 2 additions & 1 deletion
@@ -77,4 +77,5 @@ IvorySQL is a powerful open source object-relational database management system (ORDBMS)
7777
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/19[Oracle-compatible anonymous blocks]
7878
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/20[Oracle-compatible functions and stored procedures]
7979
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/21[Built-in data types and built-in functions]
80-
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/22[New port and ip for Oracle compatibility mode]
80+
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/22[New port and IP for Oracle compatibility mode]
81+
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/26[XML functions]

CN/modules/ROOT/pages/v3.2/22.adoc

Lines changed: 2 additions & 5 deletions
@@ -3,11 +3,11 @@
33

44
:imagesdir: ./_images
55

6-
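= New Port and Ip for Oracle Compatibility Mode
6+
= New Port and IP for Oracle Compatibility Mode
77

88
== Purpose
99

10-
- To distinguish the Oracle port and Ip from the PG port and Ip, handling for ORAPORT and ORAHOST needs to be added;
10+
- To distinguish the Oracle port and IP from the PG port and IP, handling for ORAPORT and ORAHOST needs to be added;
1111

1212
== Features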
1313

@@ -23,6 +23,3 @@
2323

2424
./pg_ctl -o "-p 5433 -o 1522" -D ../data
2525
```
26-
27-
28-

EN/modules/ROOT/nav.adoc

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@
1313
** xref:v3.2/10.adoc[Migration]
1414
* IvorySQL Ecosystem
1515
** xref:v3.2/11.adoc[PostGIS]
16+
** xref:v3.2/12.adoc[pgvector]
1617
* List of features
1718
** xref:v3.2/14.adoc[1、Ivorysql frame design]
1819
** xref:v3.2/15.adoc[2、GUC Framework]

EN/modules/ROOT/pages/v3.2/12.adoc

Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
1+
:sectnums:
2+
:sectnumlevels: 5
3+
4+
= pgvector
5+
6+
== Overview
7+
8+
Vector databases are a key component of generative artificial intelligence (GenAI). pgvector is a PostgreSQL extension that supports vectors of up to 16,000 dimensions and provides vector operators and index types, so PostgreSQL itself can serve as a vector database. Because IvorySQL is built on PostgreSQL, it works with the pgvector extension directly, giving users more options for data processing and analysis. pgvector is also available in Oracle compatibility mode, so Oracle users can adopt vector search while migrating and managing their data in a familiar environment.
9+
10+
11+
== Principles
12+
13+
pgvector provides two index algorithms: IVFFlat and HNSW.
14+
15+
=== IVFFLAT
16+
17+
IVFFlat clusters similar vectors into regions and builds an inverted index that maps each region to its vectors. A query then searches only a subset of the data, which makes it fast. By tuning the lists and probes parameters, IVFFlat trades speed against accuracy, allowing PostgreSQL to run fast semantic similarity searches over complex data. With a simple query, an application can find the nearest neighbors of a query vector among millions of high-dimensional vectors. IVFFlat is a good fit for tasks such as natural language processing and information retrieval.
18+
19+
When building an IVFFlat index, you decide how many lists the index contains. Each list corresponds to a "center", and the centers are computed with the k-means algorithm. Once all centers are determined, IVFFlat assigns each vector to its closest center and adds it to the index. At query time, the ivfflat.probes parameter determines how many centers are checked. This is the performance/recall trade-off of approximate nearest neighbor (ANN) search: the more centers a query visits, the more accurate the results, at the cost of performance.
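
As a minimal sketch of how the two parameters are used: the number of lists is fixed when the index is created, and ivfflat.probes is set per session at query time. The table `items`, the index name, and the values 100 and 10 are assumptions chosen for illustration, not tuned recommendations.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector(3));

-- Build the index once the table holds data: the k-means centers
-- are computed from the rows that exist at build time.
CREATE INDEX items_embedding_ivfflat_idx
    ON items USING ivfflat (embedding vector_l2_ops)
    WITH (lists = 100);

-- Check 10 lists per query instead of the default of 1.
-- Higher values improve recall and slow the query down.
SET ivfflat.probes = 10;

-- Approximate nearest-neighbor search by L2 distance.
SELECT id FROM items ORDER BY embedding <-> '[1,2,3]' LIMIT 5;
```

Setting ivfflat.probes equal to the number of lists makes the search exhaustive, at which point the planner may stop using the index.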
20+
21+
=== HNSW
22+
23+
24+
HNSW (Hierarchical Navigable Small World) is a graph-based index algorithm made up of multiple layers of proximity graphs, hence the "hierarchical" NSW method. It organizes the vectors into a multi-layer navigation graph: the upper layers are sparser, with nodes farther apart, and the lower layers are denser, with nodes closer together. HNSW is a classic space-for-time trade-off. It achieves high search quality and speed, but at a significant memory cost, because it must keep all vectors in memory and also store the graph structure itself.
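
A comparable sketch for HNSW, again assuming a hypothetical `items` table and index name. In pgvector, `m` is the maximum number of connections per graph node and `ef_construction` is the size of the candidate list used while building the graph; the values shown are pgvector's defaults, written out explicitly.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector(3));

-- Unlike IVFFlat, an HNSW index needs no training step,
-- so it can be created before any rows are inserted.
CREATE INDEX items_embedding_hnsw_idx
    ON items USING hnsw (embedding vector_l2_ops)
    WITH (m = 16, ef_construction = 64);

-- Size of the candidate list during search (default 40).
-- Higher values improve recall and slow the query down.
SET hnsw.ef_search = 100;

SELECT id FROM items ORDER BY embedding <-> '[1,2,3]' LIMIT 5;
```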
25+
26+
== Installation
27+
[TIP]
28+
====
29+
This guide assumes that IvorySQL 3.0 or later is already installed, with the installation path /usr/local/ivorysql/ivorysql-3
30+
====
31+
32+
=== Source Code Installation
33+
34+
** Set the PG_CONFIG environment variable
35+
```
36+
export PG_CONFIG=/usr/local/ivorysql/ivorysql-3/bin/pg_config
37+
```
38+
39+
** Fetch the pgvector source code
40+
```
41+
git clone --branch v0.6.2 https://github.com/pgvector/pgvector.git
42+
```
43+
44+
** Install pgvector
45+
```
46+
cd pgvector
47+
48+
sudo --preserve-env=PG_CONFIG make
49+
sudo --preserve-env=PG_CONFIG make install
50+
```
51+
52+
** Create pgvector extension
53+
```
54+
[ivorysql@localhost ivorysql-3]$ psql
55+
psql (16.2)
56+
Type "help" for help.
57+
58+
ivorysql=# create extension vector;
59+
CREATE EXTENSION
60+
```
61+
The pgvector extension is now installed.
62+
For more examples, see the https://github.com/pgvector/pgvector?tab=readme-ov-file#getting-started[pgvector documentation]
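
As a quick check that the extension works, the following sketch stores a few vectors and runs a nearest-neighbor query. The table name `demo_items` and the sample values are assumptions made for illustration.

```sql
-- Confirm that the extension is registered and see which version was installed.
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';

-- Hypothetical table with a 3-dimensional vector column.
CREATE TABLE IF NOT EXISTS demo_items (id bigserial PRIMARY KEY, embedding vector(3));
INSERT INTO demo_items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');

-- <-> is L2 distance; pgvector also provides <=> (cosine distance)
-- and <#> (negative inner product).
SELECT id, embedding, embedding <-> '[3,1,2]' AS distance
FROM demo_items
ORDER BY distance
LIMIT 5;
```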
63+
64+
== Oracle Compatibility
65+
The pgvector extension also works correctly in IvorySQL's Oracle compatibility mode.
66+
[TIP]
67+
We recommend testing on port 1521, for example with: psql -p 1521
68+
69+
=== Data Type
70+
71+
```
72+
ivorysql=# CREATE TABLE items5 (id bigserial PRIMARY KEY, name varchar2(20), num number(20), embedding bit(3));
73+
CREATE TABLE
74+
ivorysql=# INSERT INTO items5 (name, num, embedding) VALUES ('1st oracle data',0, '000'), ('2nd oracle data', 111, '111');
75+
INSERT 0 2
76+
ivorysql=# SELECT * FROM items5 ORDER BY bit_count(embedding # '101') LIMIT 5;
77+
id | name | num | embedding
78+
----+-----------------+-----+-----------
79+
2 | 2nd oracle data | 111 | 111
80+
1 | 1st oracle data | 0 | 000
81+
```
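
In the query above, `#` is PostgreSQL's bitwise XOR operator for bit strings and `bit_count` counts the bits that are set, so the ORDER BY expression is the Hamming distance between each embedding and `'101'`. A small sketch that exposes that distance as a column (the alias `hamming_distance` is illustrative):

```sql
-- '111' # '101' = '010', which has 1 bit set, so the distance is 1
-- '000' # '101' = '101', which has 2 bits set, so the distance is 2
SELECT id, embedding, bit_count(embedding # '101') AS hamming_distance
FROM items5
ORDER BY hamming_distance
LIMIT 5;
```

This is why the row with embedding `111` is listed before the row with embedding `000`.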
82+
83+
=== Anonymous Block
84+
85+
```
86+
ivorysql=# declare
87+
i vector(3) := '[1,2,3]';
88+
begin
89+
raise notice '%', i;
90+
end;
91+
ivorysql-# /
92+
NOTICE: [1,2,3]
93+
DO
94+
```
95+
96+
=== PROCEDURE
97+
```
98+
ivorysql=# CREATE OR REPLACE PROCEDURE ora_procedure()
99+
AS
100+
p vector(3) := '[4,5,6]';
101+
begin
102+
raise notice '%', p;
103+
end;
104+
/
105+
CREATE PROCEDURE
106+
ivorysql=# call ora_procedure();
107+
NOTICE: [4,5,6]
108+
CALL
109+
```
110+
111+
=== FUNCTION
112+
```
113+
ivorysql=# CREATE OR REPLACE FUNCTION AddVector(a vector(3), b vector(3))
114+
RETURN vector(3)
115+
IS
116+
BEGIN
117+
RETURN a + b;
118+
END;
119+
/
120+
CREATE FUNCTION
121+
ivorysql=# SELECT AddVector('[1,2,3]','[4,5,6]') FROM DUAL;
122+
addvector
123+
----------------
124+
[5,7,9]
125+
(1 row)
126+
```

EN/modules/ROOT/pages/v3.2/2.adoc

Lines changed: 10 additions & 9 deletions
@@ -63,12 +63,13 @@ IvorySQL is a powerful open source object-relational database management system
6363

6464
== Compatibility with Oracle
6565

66-
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/14[1、Ivorysql frame design]
67-
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/15[2、GUC Framework]
68-
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/16[3、Case conversion]
69-
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/17[4、Dual-mode design]
70-
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/18[5、Compatible with Oracle like]
71-
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/19[6、Compatible with Oracle anonymous block]
72-
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/20[7、Compatible with Oracle functions and stored procedures]
73-
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/21[8、Built-in data types and built-in functions]
74-
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/22[9、Added Oracle compatibility mode ports and IP]
66+
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/14[1. Ivorysql frame design]
67+
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/15[2. GUC Framework]
68+
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/16[3. Case conversion]
69+
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/17[4. Dual-mode design]
70+
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/18[5. Compatible with Oracle like]
71+
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/19[6. Compatible with Oracle anonymous block]
72+
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/20[7. Compatible with Oracle functions and stored procedures]
73+
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/21[8. Built-in data types and built-in functions]
74+
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/22[9. Added Oracle compatibility mode ports and IP]
75+
* https://docs.ivorysql.org/cn/ivorysql-doc/v3.2/v3.2/26[10. XML Function]
