
Commit a556ec1

More query router commands; settings last until changed again; docs (#25)
* readme * touch up docs * stuff * refactor query router * remove unused * less verbose * docs * no link * method rename
1 parent bbacb9c commit a556ec1

File tree

5 files changed

+567
-334
lines changed


README.md

Lines changed: 118 additions & 48 deletions
@@ -8,6 +8,26 @@ Meow. PgBouncer rewritten in Rust, with sharding, load balancing and failover su
88

99
**Alpha**: don't use in production just yet.
1010

11+
## Features
12+
13+
| **Feature** | **Status** | **Comments** |
14+
|--------------------------------|--------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|
15+
| Transaction pooling | :heavy_check_mark: | Identical to PgBouncer. |
16+
| Session pooling | :heavy_check_mark: | Identical to PgBouncer. |
17+
| `COPY` support | :heavy_check_mark: | Both `COPY TO` and `COPY FROM` are supported. |
18+
| Query cancellation | :heavy_check_mark: | Supported both in transaction and session pooling modes. |
19+
| Load balancing of read queries | :heavy_check_mark: | Using round-robin between replicas. Primary is included when `primary_reads_enabled` is enabled (default). |
20+
| Sharding | :heavy_check_mark: | Transactions are sharded using `SET SHARD TO` and `SET SHARDING KEY TO` syntax extensions; see examples below. |
21+
| Failover | :heavy_check_mark: | Replicas are tested with a health check. If a health check fails, remaining replicas are attempted; see below for algorithm description and examples. |
22+
| Statistics reporting | :heavy_check_mark: | Statistics similar to PgBouncer's are reported via StatsD. |
23+
| Live configuration reloading | :x: :wrench: | On the roadmap; currently config changes require a restart. |
24+
| Client authentication | :x: :wrench: | On the roadmap; currently all clients are allowed to connect and one user is used to connect to Postgres. |
25+
26+
## Deployment
27+
28+
See `Dockerfile` for an example deployment using Docker. The pooler is configured to spawn 4 workers, so 4 CPUs are recommended for optimal performance.
29+
That setting can be adjusted to spawn as many (or as few) workers as needed.
30+
1131
## Local development
1232

1333
1. Install Rust (latest stable will work great).
@@ -18,7 +38,7 @@ Meow. PgBouncer rewritten in Rust, with sharding, load balancing and failover su
1838

1939
### Tests
2040

21-
You can just PgBench to test your changes:
41+
The quickest way to test your changes is with pgbench:
2242

2343
```
2444
pgbench -i -h 127.0.0.1 -p 6432 && \
@@ -28,80 +48,130 @@ pgbench -t 1000 -p 6432 -h 127.0.0.1 --protocol extended
2848

2949
See [sharding README](./tests/sharding/README.md) for sharding logic testing.
3050

31-
## Features
51+
| **Feature** | **Tested in CI** | **Tested manually** | **Comments** |
52+
|----------------------|--------------------|---------------------|--------------------------------------------------------------------------------------------------------------------------|
53+
| Transaction pooling | :heavy_check_mark: | :heavy_check_mark: | Used by default for all tests. |
54+
| Session pooling | :x: | :heavy_check_mark: | The easiest way to test is to enable it and run pgbench; as expected, results will be better than in transaction pooling mode. |
55+
| `COPY` | :heavy_check_mark: | :heavy_check_mark: | `pgbench -i` uses `COPY`. `COPY FROM` is tested as well. |
56+
| Query cancellation | :heavy_check_mark: | :heavy_check_mark: | `psql -c 'SELECT pg_sleep(1000);'` and press `Ctrl-C`. |
57+
| Load balancing | :x: | :heavy_check_mark: | We could test this by emitting statistics for each replica and comparing them. |
58+
| Failover | :x: | :heavy_check_mark: | Misconfigure a replica in `pgcat.toml` and watch it forward queries to spares. CI testing could include using Toxiproxy. |
59+
| Sharding | :heavy_check_mark: | :heavy_check_mark: | See `tests/sharding` and `tests/ruby` for a Rails/ActiveRecord example. |
60+
| Statistics reporting | :x: | :heavy_check_mark: | Run `nc -l -u 8125` and watch the stats come in every 15 seconds. |
61+
3262

33-
1. Session mode.
34-
2. Transaction mode.
35-
3. `COPY` protocol support.
36-
4. Query cancellation.
37-
5. Round-robin load balancing of replicas.
38-
6. Banlist & failover.
39-
7. Sharding!
40-
8. Explicit query routing to primary or replicas.
63+
## Usage
4164

4265
### Session mode
43-
Each client owns its own server for the duration of the session. Commands like `SET` are allowed.
44-
This is identical to PgBouncer session mode.
66+
In session mode, a client talks to one server for the duration of the connection. Prepared statements, `SET`, and advisory locks are supported. In terms of supported features, there is very little, if any, difference between session mode and talking directly to the server.
67+
68+
To use session mode, change `pool_mode = "session"`.
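As a config sketch (assuming the setting lives in `pgcat.toml`, which this README references elsewhere):

```toml
# Each client keeps a dedicated server connection for the whole session.
pool_mode = "session"
```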
4569

4670
### Transaction mode
47-
The connection is attached to the server for the duration of the transaction. `SET` will pollute the connection,
48-
but `SET LOCAL` works great. Identical to PgBouncer transaction mode.
71+
In transaction mode, a client talks to one server for the duration of a single transaction; once it's over, the server is returned to the pool. Prepared statements, `SET`, and advisory locks are not supported; the alternatives are `SET LOCAL` and `pg_advisory_xact_lock`, which are scoped to the transaction.
4972

50-
### COPY protocol
51-
That one isn't particularly special, but good to mention that you can `COPY` data in and from the server
52-
using this pooler.
73+
This mode is enabled by default.
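The transaction-scoped alternatives mentioned above are standard Postgres features and can be sketched like this (the timeout value and lock key are illustrative):

```sql
BEGIN;
-- SET LOCAL reverts automatically at COMMIT/ROLLBACK,
-- so it cannot pollute a pooled connection.
SET LOCAL statement_timeout = '5s';

-- pg_advisory_xact_lock is released automatically at transaction end.
SELECT pg_advisory_xact_lock(42);

COMMIT;
```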
5374

54-
### Query cancellation
55-
Okay, this is just basic stuff, but we support cancelling queries. If you know the Postgres protocol,
56-
this might be relevant given than this is a transactional pooler but if you're new to Pg, don't worry about it, it works.
75+
### Load balancing of read queries
76+
All queries are load balanced against the configured servers using the round-robin algorithm. The most straightforward configuration example would be to put this pooler in front of several replicas and let it load balance all queries.
5777

58-
### Round-robin load balancing
59-
This is the novel part. PgBouncer doesn't support it and suggests we use DNS or a TCP proxy instead.
60-
We prefer to have everything as part of one package; arguably, it's easier to understand and optimize.
61-
This pooler will round-robin between multiple replicas keeping load reasonably even. If the primary is in
62-
the pool as well, it'll be treated as a replica for read-only queries.
78+
If the configuration includes a primary and replicas, the queries can be separated with the built-in query parser. The query parser will interpret the query and route all `SELECT` queries to a replica, while all other queries, including explicit transactions, are routed to the primary.
6379

64-
### Banlist & failover
65-
This is where it gets even more interesting. If we fail to connect to one of the replicas or it fails a health check,
66-
we add it to a ban list. No more new transactions will be served by that replica for, in our case, 60 seconds. This
67-
gives it the opportunity to recover while clients are happily served by the remaining replicas.
80+
The query parser is disabled by default.
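To turn it on, something like the following in the config (the key name mirrors the `query_parser_enabled` check in `src/client.rs`; the exact config spelling is an assumption):

```toml
# Route SELECTs to replicas, everything else to the primary.
query_parser_enabled = true
```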
6881

69-
This decreases error rates substantially! Worth noting here that on busy systems, if the replicas are running too hot,
70-
failing over could bring even more load and tip over the remaining healthy-ish replicas. In this case, a decision should be made:
71-
either lose 1/x of your traffic or risk losing it all eventually. Ideally you overprovision your system, so you don't necessarily need
72-
to make this choice :-).
82+
#### Query parser
83+
The query parser will do its best to determine where the query should go, but sometimes that's not possible. In that case, the client can select which server it wants using this custom SQL syntax:
7384

74-
### Sharding
75-
We're implemeting Postgres' `PARTITION BY HASH` sharding function for `BIGINT` fields. This works well for tables that use `BIGSERIAL` primary key which I think is common enough these days. We can also add many more functions here, but this is a good start. See `src/sharding.rs` and `tests/sharding/partition_hash_test_setup.sql` for more details on the implementation.
85+
```sql
86+
-- To talk to the primary for the duration of the next transaction:
87+
SET SERVER ROLE TO 'primary';
88+
89+
-- To talk to the replica for the duration of the next transaction:
90+
SET SERVER ROLE TO 'replica';
91+
92+
-- Let the query parser decide
93+
SET SERVER ROLE TO 'auto';
94+
95+
-- Pick any server at random
96+
SET SERVER ROLE TO 'any';
97+
98+
-- Reset to default configured settings
99+
SET SERVER ROLE TO 'default';
100+
```
76101

77-
The biggest advantage of using this sharding function is that anyone can shard the dataset using Postgres partitions
78-
while also access it for both reads and writes using this pooler. No custom obscure sharding function is needed and database sharding can be done entirely in Postgres.
102+
The setting will persist until it's changed again or the client disconnects.
79103

80-
To select the shard we want to talk to, we introduced special syntax:
104+
By default, all queries are routed to all servers; the `default_role` setting controls this behavior.
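A sketch of that setting (the values mirror the `SET SERVER ROLE` options above; treat the exact names as assumptions):

```toml
# "any" load balances across all servers; "primary" or "replica"
# pins queries without an explicit role to that server type.
default_role = "any"
```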
105+
106+
### Failover
107+
All servers are checked with a `SELECT 1` query before being given to a client. If a server is not reachable, it will be banned and cannot serve any more transactions for the duration of the ban. Queries are routed to the remaining servers. If all servers become banned, the ban list is cleared: this is a safety precaution against false positives. The primary can never be banned.
108+
109+
The ban time can be changed with `ban_time`. The default is 60 seconds.
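As a config fragment:

```toml
# Seconds a server stays banned after a failed health check.
ban_time = 60
```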
110+
111+
### Sharding
112+
We use the `PARTITION BY HASH` hashing function, the same one used by Postgres for declarative partitioning. This makes it possible to shard the database using Postgres partitions and place the partitions on different servers (shards). Both read and write queries can be routed to the shards using this pooler.
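On the Postgres side, the matching declarative partitioning can be sketched with standard syntax (the table and the 3-shard layout are illustrative):

```sql
-- Parent table hashed on its BIGINT primary key.
CREATE TABLE users (id BIGSERIAL PRIMARY KEY, email TEXT)
PARTITION BY HASH (id);

-- One partition per shard; the pooler computes the same hash
-- to pick the shard for a given sharding key.
CREATE TABLE users_shard_0 PARTITION OF users FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE users_shard_1 PARTITION OF users FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE users_shard_2 PARTITION OF users FOR VALUES WITH (MODULUS 3, REMAINDER 2);
```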
113+
114+
To route queries to a particular shard, we use this custom SQL syntax:
81115

82116
```sql
117+
-- To talk to a shard explicitly
118+
SET SHARD TO '1';
119+
120+
-- To let the pooler choose based on a value
83121
SET SHARDING KEY TO '1234';
84122
```
85123

86-
This sharding key will be hashed and the pooler will select a shard to use for the next transaction. If the pooler is in session mode, this sharding key has to be set as the first query on startup & cannot be changed until the client re-connects.
124+
The active shard will last until it's changed again or the client disconnects. By default, the queries are routed to shard 0.
87125

88-
### Explicit read/write query routing
126+
For hash function implementation, see `src/sharding.rs` and `tests/sharding/partition_hash_test_setup.sql`.
89127

90-
If you want to have the primary and replicas in the same pooler, you'd probably want to
91-
route queries explicitely to the primary or replicas, depending if they are reads or writes (e.g `SELECT`s or `INSERT`/`UPDATE`, etc). To help with this, we introduce some more custom syntax:
128+
#### ActiveRecord/Rails
92129

93-
```sql
94-
SET SERVER ROLE TO 'primary';
95-
SET SERVER ROLE TO 'replica';
130+
```ruby
131+
class User < ActiveRecord::Base
132+
end
133+
134+
# Metadata will be fetched from shard 0
135+
ActiveRecord::Base.establish_connection
136+
137+
# Grab a bunch of users from shard 1
138+
User.connection.execute "SET SHARD TO '1'"
139+
User.take(10)
140+
141+
# Using id as the sharding key
142+
User.connection.execute "SET SHARDING KEY TO '1234'"
143+
User.find_by_id(1234)
144+
145+
# Using geographical sharding
146+
User.connection.execute "SET SERVER ROLE TO 'primary'"
147+
User.connection.execute "SET SHARDING KEY TO '85'"
148+
User.create(name: "test user", email: "test@example.com", zone_id: 85)
149+
150+
# Let the query parser figure out where the query should go.
151+
# We are still on shard = hash(85) % shards.
152+
User.connection.execute "SET SERVER ROLE TO 'auto'"
153+
User.find_by_email("test@example.com")
96154
```
97155

98-
After executing this, the next transaction will be routed to the primary or replica respectively. By default, all queries will be load-balanced between all servers, so if the client wants to write or talk to the primary, they have to explicitely select it using the syntax above.
156+
#### Raw SQL
99157

158+
```sql
159+
-- Grab a bunch of users from shard 1
160+
SET SHARD TO '1';
161+
SELECT * FROM users LIMIT 10;
100162

163+
-- Find by id
164+
SET SHARDING KEY TO '1234';
165+
SELECT * FROM users WHERE id = 1234;
101166

102-
## Missing
167+
-- Writing in a primary/replicas configuration.
168+
SET SERVER ROLE TO 'primary';
169+
SET SHARDING KEY TO '85';
170+
INSERT INTO users (name, email, zone_id) VALUES ('test user', 'test@example.com', 85);
103171

104-
1. Authentication, ehem, this proxy is letting anyone in at the moment.
172+
SET SERVER ROLE TO 'auto'; -- let the query router figure out where the query should go
173+
SELECT * FROM users WHERE email = 'test@example.com'; -- shard setting lasts until set again; we are still on shard = hash(85) % shards
174+
```
105175

106176
## Benchmarks
107177

src/client.rs

Lines changed: 40 additions & 27 deletions
@@ -14,7 +14,7 @@ use crate::constants::*;
1414
use crate::errors::Error;
1515
use crate::messages::*;
1616
use crate::pool::{ClientServerMap, ConnectionPool};
17-
use crate::query_router::QueryRouter;
17+
use crate::query_router::{Command, QueryRouter};
1818
use crate::server::Server;
1919
use crate::stats::Reporter;
2020

@@ -198,37 +198,57 @@ impl Client {
198198
// SET SHARDING KEY TO 'bigint';
199199
let mut message = read_message(&mut self.read).await?;
200200

201-
// Parse for special select shard command.
202-
// SET SHARDING KEY TO 'bigint';
203-
if query_router.select_shard(message.clone()) {
204-
custom_protocol_response_ok(
201+
// Handle all custom protocol commands here.
202+
match query_router.try_execute_command(message.clone()) {
203+
// Normal query
204+
None => {
205+
if query_router.query_parser_enabled() && query_router.role() == None {
206+
query_router.infer_role(message.clone());
207+
}
208+
}
209+
210+
Some((Command::SetShard, _)) | Some((Command::SetShardingKey, _)) => {
211+
custom_protocol_response_ok(&mut self.write, &format!("SET SHARD")).await?;
212+
continue;
213+
}
214+
215+
Some((Command::SetServerRole, _)) => {
216+
custom_protocol_response_ok(&mut self.write, "SET SERVER ROLE").await?;
217+
continue;
218+
}
219+
220+
Some((Command::ShowServerRole, value)) => {
221+
show_response(&mut self.write, "server role", &value).await?;
222+
continue;
223+
}
224+
225+
Some((Command::ShowShard, value)) => {
226+
show_response(&mut self.write, "shard", &value).await?;
227+
continue;
228+
}
229+
};
230+
231+
// Make sure we selected a valid shard.
232+
if query_router.shard() >= pool.shards() {
233+
error_response(
205234
&mut self.write,
206-
&format!("SET SHARD TO {}", query_router.shard()),
235+
&format!(
236+
"shard '{}' is more than configured '{}'",
237+
query_router.shard(),
238+
pool.shards()
239+
),
207240
)
208241
.await?;
209242
continue;
210243
}
211244

212-
// Parse for special server role selection command.
213-
// SET SERVER ROLE TO '(primary|replica)';
214-
if query_router.select_role(message.clone()) {
215-
custom_protocol_response_ok(&mut self.write, "SET SERVER ROLE").await?;
216-
continue;
217-
}
218-
219-
// Attempt to parse the query to determine where it should go
220-
if query_router.query_parser_enabled() && query_router.role() == None {
221-
query_router.infer_role(message.clone());
222-
}
223-
224245
// Grab a server from the pool: the client issued a regular query.
225246
let connection = match pool.get(query_router.shard(), query_router.role()).await {
226247
Ok(conn) => conn,
227248
Err(err) => {
228249
println!(">> Could not get connection from pool: {:?}", err);
229250
error_response(&mut self.write, "could not get connection from the pool")
230251
.await?;
231-
query_router.reset();
232252
continue;
233253
}
234254
};
@@ -310,9 +330,6 @@ impl Client {
310330
if self.transaction_mode {
311331
// Report this client as idle.
312332
self.stats.client_idle();
313-
314-
query_router.reset();
315-
316333
break;
317334
}
318335
}
@@ -395,9 +412,6 @@ impl Client {
395412

396413
if self.transaction_mode {
397414
self.stats.client_idle();
398-
399-
query_router.reset();
400-
401415
break;
402416
}
403417
}
@@ -431,8 +445,7 @@ impl Client {
431445
self.stats.transaction();
432446

433447
if self.transaction_mode {
434-
query_router.reset();
435-
448+
self.stats.client_idle();
436449
break;
437450
}
438451
}
