
Commit 0070036

Improve Redshift support documentation.
1 parent f72afee commit 0070036

5 files changed

+74
-24
lines changed

docs/index.rst

Lines changed: 1 addition & 2 deletions
@@ -24,8 +24,7 @@ Welcome to pgloader's documentation!
2424
ref/mssql
2525
ref/pgsql
2626
ref/pgsql-citus-target
27-
ref/pgsql-redshift-source
28-
ref/pgsql-redshift-target
27+
ref/pgsql-redshift
2928
ref/transforms
3029
bugreport
3130

docs/intro.rst

Lines changed: 3 additions & 0 deletions
@@ -10,10 +10,13 @@ the data into the server, and manages errors by filling a pair of
1010
pgloader knows how to read data from different kinds of sources:
1111

1212
* Files
13+
1314
* CSV
1415
* Fixed Format
1516
* DBF
17+
1618
* Databases
19+
1720
* SQLite
1821
* MySQL
1922
* MS SQL Server

docs/ref/pgsql-redshift-source.rst

Lines changed: 0 additions & 12 deletions
This file was deleted.

docs/ref/pgsql-redshift-target.rst

Lines changed: 0 additions & 10 deletions
This file was deleted.

docs/ref/pgsql-redshift.rst

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
1+
Support for Redshift in pgloader
2+
================================
3+
4+
The command and behavior are the same as when migrating from a PostgreSQL
5+
database source. pgloader automatically discovers that it's talking to a
6+
Redshift database by parsing the output of the `SELECT version()` SQL query.
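
The detection step described above can be sketched as follows. This is only an illustration in Python, not pgloader's own implementation: the function name ``is_redshift`` is hypothetical, the sample version strings are illustrative, and the sketch assumes that a Redshift server mentions "Redshift" somewhere in its ``SELECT version()`` output.

```python
def is_redshift(version_string: str) -> bool:
    """Return True when a `SELECT version()` result mentions Redshift.

    Hypothetical helper: the check is a plain case-insensitive substring
    test, which is an assumption about what the version output looks like.
    """
    return "redshift" in version_string.lower()


# Illustrative sample outputs (not captured from real servers).
sample_postgres = "PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc, 64-bit"
sample_redshift = (
    "PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC, Redshift 1.0.12103"
)
```

A caller would run ``SELECT version()`` over the open connection and pass the single returned string to the helper to choose between the PostgreSQL and Redshift code paths.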
7+
8+
Redshift as a data source
9+
^^^^^^^^^^^^^^^^^^^^^^^^^
10+
11+
Redshift is a variant of PostgreSQL version 8.0.2, which allows pgloader to
12+
work with only a very small amount of adaptation in the catalog queries
13+
used. In other words, migrating from Redshift to PostgreSQL works just the
14+
same as when migrating from a PostgreSQL data source, including the
15+
connection string specification.
16+
17+
Redshift as a data destination
18+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19+
20+
The Redshift variant of PostgreSQL 8.0.2 does not have support for the
21+
``COPY FROM STDIN`` feature that pgloader normally relies upon. To use COPY
22+
with Redshift, the data must first be made available in an S3 bucket.
23+
24+
First, pgloader must authenticate to Amazon S3. pgloader uses the following
25+
setup for that:
26+
27+
- ``~/.aws/config``
28+
29+
This INI formatted file contains sections with your default region and
30+
other global values relevant to using the S3 API. pgloader parses it to
31+
get the region when it is set in the ``default`` INI section.
32+
33+
The environment variable ``AWS_DEFAULT_REGION`` can be used to override
34+
the configuration file value.
35+
36+
- ``~/.aws/credentials``
37+
38+
The INI formatted file contains your authentication setup to Amazon,
39+
with the properties ``aws_access_key_id`` and ``aws_secret_access_key``
40+
in the section ``default``. pgloader parses this file for those keys,
41+
and uses their values when communicating with Amazon S3.
42+
43+
The environment variables ``AWS_ACCESS_KEY_ID`` and
44+
``AWS_SECRET_ACCESS_KEY`` can be used to override the configuration file values.
45+
46+
- ``AWS_S3_BUCKET_NAME``
47+
48+
Finally, pgloader uses the value of the environment variable
49+
``AWS_S3_BUCKET_NAME`` as the name of the S3 bucket to which it uploads the
50+
files to COPY into the Redshift database. The bucket name defaults to
51+
``pgloader``.
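
The lookup order described in the list above (environment variable first, then the ``default`` section of the INI file, and ``pgloader`` as the default bucket name) can be sketched in Python. This is a minimal sketch under stated assumptions, not pgloader's code: the helper names ``read_ini_value`` and ``resolve_aws_settings`` are hypothetical, and the ``region`` key name in ``~/.aws/config`` is assumed from the usual AWS configuration layout.

```python
import configparser
import os
from pathlib import Path


def read_ini_value(path, section, key):
    """Return `key` from `section` of the INI file at `path`, or None."""
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently ignored
    if parser.has_section(section):
        return parser.get(section, key, fallback=None)
    return None


def resolve_aws_settings(env=None, home=None):
    """Resolve region, credentials and bucket name (hypothetical helper).

    Environment variables take precedence over the INI files; the bucket
    name comes from the environment only and defaults to "pgloader".
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    config = home / ".aws" / "config"
    credentials = home / ".aws" / "credentials"
    return {
        "region": env.get("AWS_DEFAULT_REGION")
        or read_ini_value(config, "default", "region"),
        "access_key_id": env.get("AWS_ACCESS_KEY_ID")
        or read_ini_value(credentials, "default", "aws_access_key_id"),
        "secret_access_key": env.get("AWS_SECRET_ACCESS_KEY")
        or read_ini_value(credentials, "default", "aws_secret_access_key"),
        "bucket": env.get("AWS_S3_BUCKET_NAME", "pgloader"),
    }
```

Passing ``env`` and ``home`` explicitly keeps the sketch easy to exercise against a temporary directory; calling ``resolve_aws_settings()`` with no arguments reads the real environment and home directory.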
52+
53+
Then pgloader works as usual; see the other sections of the documentation
54+
for the details, depending on the data source (files, other databases, etc.).
55+
When preparing the data for PostgreSQL, pgloader now uploads each batch into
56+
a single CSV file, and then issues a command such as the following for each batch:
57+
58+
::
59+
60+
COPY <target_table_name>
61+
FROM 's3://<s3 bucket>/<s3-filename-just-uploaded>'
62+
FORMAT CSV
63+
TIMEFORMAT 'auto'
64+
REGION '<aws-region>'
65+
ACCESS_KEY_ID '<aws-access-key-id>'
66+
SECRET_ACCESS_KEY '<aws-secret-access-key>';
67+
68+
This is the only difference from a core PostgreSQL version, where pgloader
69+
can rely on the classic ``COPY FROM STDIN`` command, which allows sending
70+
data through the already established connection to PostgreSQL.
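
As an illustration of the per-batch command shown in the new page, the statement could be assembled as in the Python sketch below. The function names ``sql_literal`` and ``build_redshift_copy`` are hypothetical and are not part of pgloader; the sketch only reproduces the clauses listed in the documentation and assumes the table name is already a valid, properly quoted identifier.

```python
def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_redshift_copy(table, bucket, filename, region,
                        access_key_id, secret_access_key):
    """Build the COPY statement for one uploaded batch (hypothetical helper).

    `table` is inserted as-is, so the caller must pass a trusted,
    already-quoted identifier.
    """
    lines = [
        f"COPY {table}",
        f"FROM {sql_literal(f's3://{bucket}/{filename}')}",
        "FORMAT CSV",
        "TIMEFORMAT 'auto'",
        f"REGION {sql_literal(region)}",
        f"ACCESS_KEY_ID {sql_literal(access_key_id)}",
        f"SECRET_ACCESS_KEY {sql_literal(secret_access_key)};",
    ]
    return "\n".join(lines)
```

Because the statement embeds the secret access key in clear text, a real implementation would need to keep it out of logs and error messages.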
