Commit 56d24de

Update documentation with new features.
We have a lot of new features to document. This is a first patch about that, some more work is to be done. That said, it's better than nothing already.
1 parent af2995b commit 56d24de

8 files changed

+521
-4
lines changed

docs/index.rst

Lines changed: 4 additions & 0 deletions
@@ -22,6 +22,10 @@ Welcome to pgloader's documentation!
2222
ref/mysql
2323
ref/sqlite
2424
ref/mssql
25+
ref/pgsql
26+
ref/pgsql-citus-target
27+
ref/pgsql-redshift-source
28+
ref/pgsql-redshift-target
2529
ref/transforms
2630
bugreport
2731

docs/intro.rst

Lines changed: 8 additions & 0 deletions
@@ -17,6 +17,14 @@ pgloader knows how to read data from different kind of sources:
1717
* SQLite
1818
* MySQL
1919
* MS SQL Server
20+
* PostgreSQL
21+
* Redshift
22+
23+
pgloader knows how to target different products using the PostgreSQL protocol:
24+
25+
* PostgreSQL
26+
* `Citus <https://www.citusdata.com>`_
27+
* Redshift
2028

2129
The level of automation provided by pgloader depends on the data source
2230
type. In the case of CSV and Fixed Format files, a full description of the

docs/pgloader.rst

Lines changed: 28 additions & 0 deletions
@@ -154,6 +154,18 @@ Those options are meant to tweak `pgloader` behavior when loading data.
154154
machine code) another version of itself, usually a newer one like a very
155155
recent git checkout.
156156

157+
* `--no-ssl-cert-verification`
158+
159+
Uses the OpenSSL option to accept a locally issued server-side
160+
certificate, avoiding the following error message::
161+
162+
SSL verify error: 20 X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY
163+
164+
The right way to fix the SSL issue is to use a trusted certificate, of
165+
course. Sometimes though it's useful to make progress with the pgloader
166+
setup while the certificate chain of trust is being fixed, maybe by
167+
another team. That's when this option is useful.
168+
157169
Command Line Only Operations
158170
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
159171

@@ -552,6 +564,22 @@ queries from a SQL file. Implements support for PostgreSQL dollar-quoting
552564
and the `\i` and `\ir` include facilities as in `psql` batch mode (where
553565
they are the same thing).
554566

567+
AFTER CREATE SCHEMA DO
568+
^^^^^^^^^^^^^^^^^^^^^^
569+
570+
Same format as *BEFORE LOAD DO*: the dollar-quoted queries found in that
571+
section are executed once the schema has been created by pgloader, and
572+
before the data is loaded. It's the right time to ALTER TABLE or do some
573+
custom implementation on top of what pgloader does, such as partitioning.
574+
575+
AFTER CREATE SCHEMA EXECUTE
576+
^^^^^^^^^^^^^^^^^^^^^^^^^^^
577+
578+
Same behaviour as in the *AFTER CREATE SCHEMA DO* clause. Allows you to read
579+
the SQL queries from a SQL file. Implements support for PostgreSQL
580+
dollar-quoting and the `\i` and `\ir` include facilities as in `psql` batch
581+
mode (where they are the same thing).
582+
555583
Connection String
556584
^^^^^^^^^^^^^^^^^
557585

docs/ref/mysql.rst

Lines changed: 11 additions & 4 deletions
@@ -1,10 +1,9 @@
11
Migrating a MySQL Database to PostgreSQL
22
========================================
33

4-
This command instructs pgloader to load data from a database connection. The
5-
only supported database source is currently *MySQL*, and pgloader supports
6-
dynamically converting the schema of the source database and the indexes
7-
building.
4+
This command instructs pgloader to load data from a database connection.
5+
pgloader supports dynamically converting the schema of the source database
6+
and building the indexes.
87

98
A default set of casting rules are provided and might be overloaded and
109
appended to by the command.
@@ -609,6 +608,14 @@ Date::
609608
to timestamptz drop default
610609
using zero-dates-to-null
611610

611+
type datetime with extra on update current timestamp when not null
612+
to timestamptz drop not null drop default
613+
using zero-dates-to-null
614+
615+
type datetime with extra on update current timestamp
616+
to timestamptz drop default
617+
using zero-dates-to-null
618+
612619
type timestamp when default "0000-00-00 00:00:00" and not null
613620
to timestamptz drop not null drop default
614621
using zero-dates-to-null

docs/ref/pgsql-citus-target.rst

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
1+
Migrating a PostgreSQL Database to Citus
2+
========================================
3+
4+
This command instructs pgloader to load data from a database connection.
5+
Automatic discovery of the schema is supported, including building the
6+
indexes and the primary and foreign key constraints. A default set of casting
7+
rules is provided and might be overloaded and appended to by the command.
8+
9+
Automatic distribution column backfilling is supported, either from commands
10+
that specify the distribution column for every table, or from a command for
11+
the main table only, in which case pgloader relies on foreign key constraints
12+
to discover the other distribution keys.
13+
14+
Here's a short example of migrating a database from a PostgreSQL server to
15+
another:
16+
17+
::
18+
19+
load database
20+
from pgsql:///hackathon
21+
into pgsql://localhost:9700/dim
22+
23+
with include drop, reset no sequences
24+
25+
cast column impressions.seen_at to "timestamp with time zone"
26+
27+
distribute companies using id
28+
-- distribute campaigns using company_id
29+
-- distribute ads using company_id from campaigns
30+
-- distribute clicks using company_id from ads, campaigns
31+
-- distribute impressions using company_id from ads, campaigns
32+
;
33+
34+
Everything works exactly the same way as when doing a PostgreSQL to
35+
PostgreSQL migration, with the added functionality of this new `distribute`
36+
command.
37+
38+
Distribute Command
39+
^^^^^^^^^^^^^^^^^^
40+
41+
The distribute command syntax is as follows::
42+
43+
distribute <table name> using <column name>
44+
distribute <table name> using <column name> from <table> [, <table>, ...]
45+
distribute <table name> as reference table
46+
47+
When using the distribute command, the following steps are added to pgloader
48+
operations when migrating the schema:
49+
50+
- if the distribution column does not exist in the table, it is added as
51+
the first column of the table
52+
53+
- if the distribution column does not exist in the primary key of the
54+
table, it is added as the first column of the primary key of the table
55+
56+
- the distribution key is also added automatically to all the foreign keys
57+
that point to the table, including the source tables of the foreign key
58+
constraints
59+
60+
- once the schema has been created on the target database, pgloader then
61+
issues the Citus-specific commands `create_reference_table()
62+
<http://docs.citusdata.com/en/v8.0/develop/api_udf.html?highlight=create_reference_table#create-reference-table>`_
63+
and `create_distributed_table()
64+
<http://docs.citusdata.com/en/v8.0/develop/api_udf.html?highlight=create_reference_table#create-distributed-table>`_
65+
to make the tables distributed
66+
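The schema steps listed above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not pgloader's implementation (pgloader is not written in Python): the `Table` class and both function names are hypothetical, and leaving a table without a primary key untouched is an assumption the text does not cover. The generated statements use the Citus functions `create_distributed_table()` and `create_reference_table()` named in the list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Table:
    """Hypothetical, minimal description of a table's schema."""
    name: str
    columns: List[str]
    primary_key: List[str] = field(default_factory=list)


def prepare_for_distribution(table: Table, dist_col: str) -> Table:
    """Return a copy of the table with the distribution column in place."""
    columns = list(table.columns)
    if dist_col not in columns:
        # Step 1: a missing distribution column becomes the first column.
        columns.insert(0, dist_col)

    primary_key = list(table.primary_key)
    # Step 2: the distribution column leads the primary key.
    # Assumption: a table without a primary key is left as-is.
    if primary_key and dist_col not in primary_key:
        primary_key.insert(0, dist_col)

    return Table(table.name, columns, primary_key)


def citus_statement(table: Table, dist_col: Optional[str] = None) -> str:
    """Build the Citus call issued once the schema exists on the target."""
    if dist_col is None:
        return f"SELECT create_reference_table('{table.name}');"
    return f"SELECT create_distributed_table('{table.name}', '{dist_col}');"
```

For instance, applying `prepare_for_distribution` to an `ads` table with columns `id, campaign_id, name` and primary key `(id)` yields columns `company_id, id, campaign_id, name` and primary key `(company_id, id)`.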
67+
Those operations are done in the schema section of pgloader, before the data
68+
is loaded. When the data is loaded, the newly added columns need to be
69+
backfilled from referenced data. pgloader knows how to do that by generating
70+
a query that joins the source table to the tables it references, and
71+
importing the result set of that query rather than the source table's raw data.
72+
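As a rough idea of what such a backfilling query might look like, here is a small Python sketch that builds one for a single foreign key hop. The function name, the table aliases, and the `campaign_id` and `id` join columns are assumptions made for the example; the query pgloader actually generates may differ.

```python
def backfill_query(table: str, dist_col: str, parent: str,
                   fk_col: str, parent_key: str = "id") -> str:
    """Build a SELECT that prepends the parent's distribution column.

    The distribution column comes first in the result set, matching a
    target table where it was added as the first column.
    """
    return (
        f"SELECT p.{dist_col}, t.* "
        f"FROM {table} AS t "
        f"JOIN {parent} AS p ON t.{fk_col} = p.{parent_key}"
    )


# "distribute ads using company_id from campaigns", assuming that
# ads.campaign_id references campaigns.id (hypothetical column names).
print(backfill_query("ads", "company_id", "campaigns", "campaign_id"))
```

A table that is several foreign keys away from the main table, such as `clicks` in the example command, would need one join per referenced table in the `from` list.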
73+
Citus Migration: Limitations
74+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
75+
76+
The way pgloader implements *reset sequence* does not work with Citus at
77+
this point, so sequences need to be taken care of separately.

docs/ref/pgsql-redshift-source.rst

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
1+
Migrating a Redshift Database to PostgreSQL
2+
===========================================
3+
4+
This command instructs pgloader to load data from a database connection.
5+
Automatic discovery of the schema is supported, including building the
6+
indexes and the primary and foreign key constraints. A default set of casting
7+
rules is provided and might be overloaded and appended to by the command.
8+
9+
The command and behavior are the same as when migrating from a PostgreSQL
10+
database source. pgloader automatically discovers that it's talking to a
11+
Redshift database by parsing the output of the `SELECT version()` SQL query.
12+
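The detection idea can be illustrated with a minimal Python sketch. It assumes that the text returned by `SELECT version()` mentions "Redshift" on a Redshift server, as in the sample strings below; the `is_redshift` helper is hypothetical and pgloader's own parsing may be stricter.

```python
def is_redshift(version_output: str) -> bool:
    """Guess whether the server is Redshift from its version() output."""
    # Assumption: a case-insensitive substring match is enough here.
    return "redshift" in version_output.lower()


# Illustrative version() outputs; real strings vary by release and platform.
redshift_version = (
    "PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) "
    "3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.12103"
)
postgres_version = (
    "PostgreSQL 10.6 on x86_64-pc-linux-gnu, compiled by gcc "
    "(Debian 6.3.0-18) 6.3.0, 64-bit"
)

print(is_redshift(redshift_version))
print(is_redshift(postgres_version))
```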

docs/ref/pgsql-redshift-target.rst

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
Migrating a PostgreSQL Database to Redshift
2+
===========================================
3+
4+
This command instructs pgloader to load data from a database connection.
5+
Automatic discovery of the schema is supported, including building the
6+
indexes and the primary and foreign key constraints. A default set of casting
7+
rules is provided and might be overloaded and appended to by the command.
8+
9+
10+
TODO: add details about S3 credentials and bucket configuration.
