
Commit b083782

billy-the-fish, mfreed, and cevian authored
Doc updates (#74)
* First updates. * Update about section. * Updates on review. * First updates. * Update about section. * Updates on review. * chore: first draft. * chore: first draft. * chore: first draft. * chore: align about. * chore: align about. * chore: align about. * First updates. * Updates on review. * First updates. * chore: weird rebase stuff. * chore: put license back. * Fix spelling of PostgreSQL Signed-off-by: Mike Freedman <mike@timescale.com> --------- Signed-off-by: Mike Freedman <mike@timescale.com> Co-authored-by: Mike Freedman <mike@timescale.com> Co-authored-by: Matvey Arye <cevian@gmail.com>
1 parent 5eae3d7 commit b083782


6 files changed

+249
-56
lines changed


.github/CODE_OF_CONDUCT.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
You can find the Timescale Code of Conduct at <https://www.timescale.com/code-of-conduct>.

CONTRIBUTING.md

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
1+
# Contributing to pgvectorscale
2+
3+
We appreciate any help the community can provide to make pgvectorscale better!
4+
5+
You can help in different ways:
6+
7+
* Open an [issue](https://github.com/timescale/pgvectorscale/issues) with a
8+
bug report, build issue, feature request, suggestion, etc.
9+
10+
* Fork this repository and submit a pull request
11+
12+
For any particular improvement you want to make, it can be beneficial to
13+
begin discussion on the GitHub issues page. This is the best place to
14+
discuss your proposed improvement (and its implementation) with the core
15+
development team.
16+
17+
Before we accept any code contributions, pgvectorscale contributors need to
18+
sign the [Contributor License Agreement](https://cla-assistant.io/timescale/pgvectorscale) (CLA). By signing a CLA, we can
19+
ensure that the community is free and confident in its ability to use your
20+
contributions.
21+
22+
## Development
23+
24+
Please follow our DEVELOPMENT doc for [instructions on how to develop and test](https://github.com/timescale/pgvectorscale/blob/main/DEVELOPMENT.md).
25+
26+
## Code review workflow
27+
28+
* Sign the [Contributor License Agreement](https://cla-assistant.io/timescale/pgvectorscale) (CLA) if you're a new contributor.
29+
30+
* Develop on your local branch:
31+
32+
* Fork the repository and create a local feature branch to do work on,
33+
ideally on one thing at a time. Don't mix bug fixes with unrelated
34+
feature enhancements or stylistic changes.
35+
36+
* Hack away. Add tests for non-trivial changes.
37+
38+
* Run the [test suite](#testing) and make sure everything passes.
39+
40+
* When committing, be sure to write good commit messages according to [these
41+
seven rules](https://chris.beams.io/posts/git-commit/#seven-rules). Doing
42+
`git commit` prints a message if any of the rules is violated.
43+
Stylistically,
44+
we use commit message titles in the imperative mood, e.g., `Add
45+
merge-append query optimization for time aggregate`. In the case of
46+
non-trivial changes, include a longer description in the commit message
47+
body explaining and detailing the changes. That is, a commit message
48+
should have a short title, followed by an empty line, and then
49+
followed by the longer description.
50+
51+
* When committing, link which GitHub issue of [this
52+
repository](https://github.com/timescale/pgvectorscale/issues) is fixed or
53+
closed by the commit with a [linking keyword recognised by
54+
GitHub](https://docs.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword).
55+
For example, if the commit fixes bug 123, add a line at the end of the
56+
commit message with `Fixes #123`; if the commit implements feature
57+
request 321, add a line at the end of the commit message `Closes #321`.
58+
This will be recognized by GitHub. It will close the corresponding issue
59+
and place a hyperlink under the number.
60+
61+
* Push your changes to an upstream branch:
62+
63+
* Make sure that each commit in the pull request will represent a
64+
logical change to the code, will compile, and will pass tests.
65+
66+
* Make sure that the pull request message contains all important
67+
information from the commit messages including which issues are
68+
fixed and closed. If a pull request contains one commit only, then
69+
repeating the commit message is preferred, which is done automatically
70+
by GitHub when it creates the pull request.
71+
72+
* Rebase your local feature branch against main (`git fetch origin`,
73+
then `git rebase origin/main`) to make sure you're
74+
submitting your changes on top of the newest version of our code.
75+
76+
* When finalizing your PR (i.e., it has been approved for merging),
77+
aim for the fewest number of commits that
78+
make sense. That is, squash any "fix up" commits into the commit they
79+
fix rather than keep them separate. Each commit should represent a
80+
clean, logical change and include a descriptive commit message.
81+
82+
* Push your commit to your upstream feature branch: `git push -u <yourfork> my-feature-branch`
83+
84+
* Create and manage the pull request:
85+
86+
* [Create a pull request using GitHub](https://help.github.com/articles/creating-a-pull-request).
87+
If you know a core developer well suited to reviewing your pull
88+
request, either mention them (preferably by GitHub name) in the PR's
89+
body or [assign them as a reviewer](https://help.github.com/articles/assigning-issues-and-pull-requests-to-other-github-users/).
90+
91+
* Address feedback by amending your commit(s). If your change contains
92+
multiple commits, address each piece of feedback by amending the
93+
commit that the feedback targets.
94+
95+
* The PR is marked as accepted when the reviewer thinks it's ready to be
96+
merged. Most new contributors aren't allowed to merge themselves; in
97+
that case, we'll do it for you.
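
The commit message conventions in the workflow above can be checked mechanically. The following is a minimal sketch, not the hook this repository actually ships: the function name `check_commit_msg` and the 72-character title limit are assumptions chosen for illustration (the linked rules suggest 50 characters as a target).

```shell
#!/bin/sh
# Hypothetical commit message check; NOT the repository's own hook.
# Usage: check_commit_msg <path-to-commit-message-file>
check_commit_msg() {
    msg_file=$1
    title=$(sed -n '1p' "$msg_file")     # first line: the title
    second=$(sed -n '2p' "$msg_file")    # second line: must be empty
    rc=0

    if [ -z "$title" ]; then
        echo "title is empty"
        return 1
    fi
    # Assumed hard limit of 72 characters for the title.
    if [ "${#title}" -gt 72 ]; then
        echo "title is longer than 72 characters"
        rc=1
    fi
    case $title in
        *.) echo "title ends with a period"; rc=1 ;;
    esac
    case $title in
        [[:upper:]]*) ;;
        *) echo "title does not start with a capital letter"; rc=1 ;;
    esac
    if [ -n "$second" ]; then
        echo "second line is not empty"
        rc=1
    fi
    return "$rc"
}

# Demo: a message with a short title, an empty line, a body, and a
# closing keyword line, written to a temporary file.
demo_msg=$(mktemp)
printf '%s\n' \
    'Add merge-append query optimization for time aggregate' \
    '' \
    'Explain what changed and why in the body.' \
    '' \
    'Fixes #123' > "$demo_msg"
check_commit_msg "$demo_msg" && echo "commit message OK"
rm -f "$demo_msg"
```

A check like this could be called from a `commit-msg` Git hook, which receives the path of the message file as its first argument. It cannot verify the imperative mood of the title, so that rule still needs a human reviewer.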
98+
99+
## Testing
100+
101+
Every non-trivial change to the code base should be accompanied by a
102+
relevant addition to or modification of the test suite.
103+
104+
Please check that the full test suite (including your test additions
105+
or changes) passes successfully on your local machine **before you
106+
open a pull request**.
107+
108+
See our [testing](https://github.com/timescale/pgvectorscale/blob/main/DEVELOPMENT.md#testing)
109+
instructions for help with how to test.

DEVELOPMENT.md

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
1+
# Set up your pgvectorscale developer environment
2+
3+
You build pgvectorscale from source, then integrate the extension into each database in your PostgreSQL environment.
4+
5+
## pgvectorscale prerequisites
6+
7+
To create a pgvectorscale developer environment, you need the following on your local machine:
8+
9+
* [PostgreSQL v16](https://docs.timescale.com/self-hosted/latest/install/installation-linux/#install-and-configure-timescaledb-on-postgresql)
10+
* [pgvector](https://github.com/pgvector/pgvector/blob/master/README.md)
11+
* Development packages:
12+
```shell
13+
sudo apt-get install make gcc pkg-config clang postgresql-server-dev-16 libssl-dev
14+
```
15+
16+
* [Rust][rust-language]:
17+
```shell
18+
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
19+
```
20+
21+
* [Cargo-pgrx][cargo-pgrx]:
22+
```shell
23+
cargo install --locked cargo-pgrx
24+
```
25+
You must reinstall cargo-pgrx whenever you update Rust, because cargo-pgrx must
26+
be built with the same compiler as pgvectorscale.
27+
28+
* The pgrx development environment:
29+
```shell
30+
cargo pgrx init --pg16 pg_config
31+
```
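
Before building, it can help to confirm that the prerequisite tools listed above are on your `PATH`. This is a minimal sketch: the helper name `missing_tools` is hypothetical, and the tool list is an assumption derived from the packages and commands named in this section.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every listed tool that is
# not found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed tool list, taken from the prerequisites in this section.
missing=$(missing_tools git make gcc clang pkg-config cargo pg_config psql)
if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing"
else
    echo "All listed tools found on PATH."
fi
```

The check only confirms that each command exists. It does not verify versions, so it cannot tell you whether cargo-pgrx was built with your current Rust compiler.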
32+
33+
## Build and install pgvectorscale on your database
34+
35+
1. In Terminal, clone this repository and switch to the extension subdirectory:
36+
37+
```shell
38+
git clone https://github.com/timescale/pgvectorscale && \
39+
cd pgvectorscale/pgvectorscale
40+
```
41+
42+
1. Build pgvectorscale:
43+
44+
```shell
45+
cargo pgrx install --release
46+
```
47+
48+
1. Connect to the database:
49+
50+
```bash
51+
psql -d "postgres://<username>:<password>@<host>:<port>/<database-name>"
52+
```
53+
54+
1. Add pgvectorscale to your database:
55+
56+
```postgresql
57+
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;
58+
```
59+
60+
61+
[pgvector]: https://github.com/pgvector/pgvector/blob/master/README.md
62+
[rust-language]: https://www.rust-lang.org/
63+
[cargo-pgrx]: https://lib.rs/crates/cargo-pgrx

LICENSE

Lines changed: 1 addition & 2 deletions
@@ -14,5 +14,4 @@ TIMESCALE SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
1414
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
1515
THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND TIMESCALE HAS NO
1616
OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR
17-
MODIFICATIONS.
18-
17+
MODIFICATIONS.

NOTICE

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ pgvectorscale by Timescale (TM)
22

33
Copyright (c) 2023-2024 Timescale, Inc. All Rights Reserved.
44

5-
Licensed under the PostgeSQL License (the "License");
5+
Licensed under the PostgreSQL License (the "License");
66
you may not use this file except in compliance with the License.
77
You may obtain a copy of the License at
88

README.md

Lines changed: 74 additions & 53 deletions
@@ -1,76 +1,97 @@
1-
# pgvectorscale
21

3-
A vector index for speeding up ANN search in `pgvector`.
2+
<p></p>
3+
<div align=center>
4+
<picture align=center>
5+
<source media="(prefers-color-scheme: dark)" srcset="https://assets.timescale.com/docs/images/timescale-logo-dark-mode.svg">
6+
<source media="(prefers-color-scheme: light)" srcset="https://assets.timescale.com/docs/images/timescale-logo-light-mode.svg">
7+
<img alt="Timescale logo" >
8+
</picture>
49

5-
## 💾 Building and Installing pgvectorscale
10+
<h3>Use pgvectorscale to build scalable AI applications with higher performance,
11+
embedding search and cost-efficient storage. </h3>
612

7-
### From source
13+
[![Docs](https://img.shields.io/badge/Read_the_Timescale_docs-black?style=for-the-badge&logo=readthedocs&logoColor=white)](https://docs.timescale.com/)
14+
[![SLACK](https://img.shields.io/badge/Ask_the_Timescale_community-black?style=for-the-badge&logo=slack&logoColor=white)](https://timescaledb.slack.com/archives/C4GT3N90X)
15+
[![Try Timescale for free](https://img.shields.io/badge/Try_Timescale_for_free-black?style=for-the-badge&logo=timescale&logoColor=white)](https://console.cloud.timescale.com/signup)
16+
</div>
817

9-
#### Prerequisites
1018

11-
Building the extension requires valid rust, along with the postgres headers for whichever version of postgres you are running, and pgrx. We recommend installing rust using the official instructions:
12-
```shell
13-
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
14-
```
15-
16-
You should install the appropriate build tools and postgres headers in the preferred manner for your system. You may also need to install OpenSSL. For Ubuntu you can follow the postgres install instructions then run
19+
pgvectorscale complements [pgvector][pgvector], the open-source vector data extension for PostgreSQL, and introduces the following key innovations:
20+
- A DiskANN index: based on research from Microsoft
21+
- Statistical Binary Quantization: developed by Timescale researchers, this feature improves on standard
22+
Binary Quantization.
1723

18-
```shell
19-
sudo apt-get install make gcc pkg-config clang postgresql-server-dev-16 libssl-dev
20-
```
24+
Timescale’s benchmarks reveal that with pgvectorscale, PostgreSQL achieves **28x lower p95 latency**, and
25+
**16x higher query throughput** for approximate nearest neighbor queries at 99% recall.
2126

22-
Next you need cargo-pgrx, which can be installed with
23-
```shell
24-
cargo install --locked cargo-pgrx
25-
```
27+
<div align=center>
2628

27-
You must reinstall cargo-pgrx whenever you update your Rust compiler, since cargo-pgrx needs to be built with the same compiler as pgvectorscale.
29+
![Benchmarks](https://assets.timescale.com/docs/images/benchmark-comparison-pgvectorscale-pinecone.png)
2830

29-
Finally, setup the pgrx development environment with
30-
```shell
31-
cargo pgrx init --pg16 pg_config
32-
```
31+
PostgreSQL costs are 21% of those of Pinecone s1.
32+
</div>
3333

34-
#### Building and installing the extension
34+
In contrast to pgvector, which is written in C, pgvectorscale is developed in [Rust][rust-language],
35+
offering the PostgreSQL community a new avenue for contributing to vector support.
3536

36-
Download or clone this repository, and switch to the extension subdirectory, e.g.
37-
```shell
38-
git clone https://github.com/timescale/pgvectorscale && \
39-
cd pgvectorscale/pgvectorscale
40-
```
37+
Timescale offers the following high-performance journeys:
4138

42-
Then run
43-
```shell
44-
cargo pgrx install --release
45-
```
39+
* **App developer and DBA**: try out pgvectorscale functionality in Timescale Cloud.
40+
* [Enable pgvectorscale in a Timescale service](#enable-pgvectorscale-in-a-timescale-service)
41+
* **Extension contributor**: contribute to pgvectorscale.
42+
* [Build pgvectorscale from source in a developer environment](./DEVELOPMENT.md)
43+
* **Everyone**: check the benchmark results for yourself.
44+
* [Test pgvectorscale performance](#test-pgvectorscale-performance)
4645

47-
To initialize the extension after installation, enter the following into psql:
46+
## Enable pgvectorscale in a Timescale service
4847

49-
```sql
50-
CREATE EXTENSION vectorscale;
51-
```
48+
To enable pgvectorscale:
5249

53-
## ✏️ Get Involved
50+
1. Create a new [Timescale Service](https://console.cloud.timescale.com/dashboard/create_services).
5451

55-
The pgvectorscale project is still in it's early stage as we decide our priorities and what to implement. As such, now is a great time to help shape the project's direction! Have a look at the list of features we're thinking of working on and feel free to comment on the features, expand the list, or hop on the Discussions forum for more in-depth discussions.
52+
If you want to use an existing service, pgvectorscale is added as an available extension on the first maintenance window
53+
after the pgvectorscale release date.
5654

57-
### 🔨 Testing
58-
See above for prerequisites and installation instructions.
55+
1. Connect to your Timescale service:
56+
```bash
57+
psql -d "postgres://<username>:<password>@<host>:<port>/<database-name>"
58+
```
5959

60-
You can run tests against a postgres version pg16 using
61-
```shell
62-
cargo pgrx test ${postgres_version}
63-
```
60+
1. Create the pgvectorscale extension:
6461

65-
To run all tests run:
66-
```shell
67-
cargo test -- --ignored && cargo pgrx test ${postgres_version}
68-
```
62+
```postgresql
63+
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;
64+
```
6965
70-
### 🐯 About Timescale
66+
The `CASCADE` option automatically installs any extensions that vectorscale depends on.
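
The connection string used in the steps above follows the pattern `postgres://<username>:<password>@<host>:<port>/<database-name>`. As a minimal sketch, the hypothetical helper `pg_uri` below assembles it from its parts. All values in the example are placeholders, and the helper assumes the password contains no characters that need percent-encoding.

```shell
#!/bin/sh
# Hypothetical helper: assemble a PostgreSQL connection URI.
# Arguments: $1 user, $2 password, $3 host, $4 port, $5 database.
# Assumes no argument needs percent-encoding.
pg_uri() {
    printf 'postgres://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Placeholder values for illustration only.
uri=$(pg_uri tsdbadmin example-password host.example.com 5432 tsdb)
echo "$uri"

# With a reachable service, you could then list the installed
# extensions (assuming the dependency is pgvector, whose extension
# name is "vector"):
#   psql -d "$uri" -c "SELECT extname, extversion FROM pg_extension WHERE extname IN ('vector', 'vectorscale');"
```

If both `vector` and `vectorscale` appear in the query result, the `CREATE EXTENSION ... CASCADE` statement installed the extension together with its dependency.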
7167
72-
TimescaleDB is a distributed time-series database built on PostgreSQL that scales to over 10 million of metrics per second, supports native compression, handles high cardinality, and offers native time-series capabilities, such as data retention policies, continuous aggregate views, downsampling, data gap-filling and interpolation.
68+
## Test pgvectorscale performance
7369
74-
TimescaleDB also supports full SQL, a variety of data types (numerics, text, arrays, JSON, booleans), and ACID semantics. Operationally mature capabilities include high availability, streaming backups, upgrades over time, roles and permissions, and security.
70+
To check the Timescale benchmarks in your pgvectorscale environment:
7571
76-
TimescaleDB has a large and active user community (tens of millions of downloads, hundreds of thousands of active deployments, Slack channels with thousands of members).
72+
1. Jonetas, this is for you :-).
73+
74+
## Get involved
75+
76+
pgvectorscale is still at an early stage. Now is a great time to help shape the
77+
direction of this project; we are currently deciding priorities. Have a look at the
78+
list of features we're thinking of working on. Feel free to comment, expand
79+
the list, or hop on the Discussions forum.
80+
81+
## About Timescale
82+
83+
Timescale Cloud is a high-performance, developer-focused cloud that provides PostgreSQL services
84+
enhanced with our blazing fast vector search. Timescale services are built using TimescaleDB and
85+
PostgreSQL extensions, like this one. Timescale Cloud provides high availability, streaming
86+
backups, upgrades over time, roles and permissions, and great security.
87+
88+
TimescaleDB is an open-source time-series database designed for scalability and performance,
89+
built on top of PostgreSQL. It provides SQL support for time-series data, allowing users to
90+
leverage PostgreSQL's rich ecosystem while optimizing for high ingest rates and fast query
91+
performance. TimescaleDB includes features like automated data retention policies, compression
92+
and continuous aggregates, making it ideal for applications like monitoring, IoT, AI and
93+
real-time analytics.
94+
95+
96+
[pgvector]: https://github.com/pgvector/pgvector/blob/master/README.md
97+
[rust-language]: https://www.rust-lang.org/
