Support for Redshift in pgloader
================================

The command and behavior are the same as when migrating from a PostgreSQL
database source. pgloader automatically discovers that it's talking to a
Redshift database by parsing the output of the `SELECT version()` SQL query.

Redshift as a data source
^^^^^^^^^^^^^^^^^^^^^^^^^

Redshift is a variant of PostgreSQL version 8.0.2, which allows pgloader to
work with only a very small amount of adaptation in the catalog queries
used. In other words, migrating from Redshift to PostgreSQL works just the
same as when migrating from a PostgreSQL data source, including the
connection string specification.

Redshift as a data destination
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Redshift variant of PostgreSQL 8.0.2 does not have support for the
``COPY FROM STDIN`` feature that pgloader normally relies upon. To use COPY
with Redshift, the data must first be made available in an S3 bucket.

First, pgloader must authenticate to Amazon S3. pgloader uses the following
setup for that:

  - ``~/.aws/config``

    This INI formatted file contains sections with your default region and
    other global values relevant to using the S3 API. pgloader parses it to
    get the region when it's set up in the ``default`` INI section.

    The environment variable ``AWS_DEFAULT_REGION`` can be used to override
    the configuration file value.

  - ``~/.aws/credentials``

    This INI formatted file contains your authentication setup for Amazon,
    with the properties ``aws_access_key_id`` and ``aws_secret_access_key``
    in the section ``default``. pgloader parses this file for those keys,
    and uses their values when communicating with Amazon S3.

    The environment variables ``AWS_ACCESS_KEY_ID`` and
    ``AWS_SECRET_ACCESS_KEY`` can be used to override the configuration
    file values.

  - ``AWS_S3_BUCKET_NAME``

    Finally, the value of the environment variable ``AWS_S3_BUCKET_NAME``
    is used by pgloader as the name of the S3 bucket where the files to
    COPY into the Redshift database are uploaded. The bucket name defaults
    to ``pgloader``.

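The resolution order described above (environment variables win over the
INI files, and the bucket name defaults to ``pgloader``) can be sketched in
Python. This is only an illustration of the documented behavior; the
function name and the returned dictionary are hypothetical, not pgloader's
actual internals (pgloader itself is written in Common Lisp).

```python
import configparser

def resolve_aws_settings(env, config_ini, credentials_ini):
    """Sketch of the documented lookup order: environment variables
    override the ``default`` sections of the two INI files, and the
    bucket name falls back to "pgloader". Illustrative only."""
    config = configparser.ConfigParser()
    config.read_string(config_ini)
    credentials = configparser.ConfigParser()
    credentials.read_string(credentials_ini)

    def ini_get(parser, option):
        return parser.get("default", option, fallback=None)

    return {
        # Environment variables override the configuration files.
        "region": env.get("AWS_DEFAULT_REGION")
                  or ini_get(config, "region"),
        "access_key_id": env.get("AWS_ACCESS_KEY_ID")
                         or ini_get(credentials, "aws_access_key_id"),
        "secret_access_key": env.get("AWS_SECRET_ACCESS_KEY")
                             or ini_get(credentials, "aws_secret_access_key"),
        # The bucket name defaults to "pgloader".
        "bucket": env.get("AWS_S3_BUCKET_NAME", "pgloader"),
    }

# The region from the environment wins; the keys come from the
# credentials file; the bucket falls back to the default.
settings = resolve_aws_settings(
    env={"AWS_DEFAULT_REGION": "eu-west-1"},
    config_ini="[default]\nregion = us-east-1\n",
    credentials_ini=(
        "[default]\n"
        "aws_access_key_id = AKIAEXAMPLE\n"
        "aws_secret_access_key = secretexample\n"
    ),
)
```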
Then pgloader works as usual; see the other sections of the documentation
for the details, depending on the data source (files, other databases,
etc). When preparing the data for PostgreSQL, pgloader now uploads each
batch into a single CSV file, and then issues a command such as the
following, for each batch:

::

    COPY <target_table_name>
    FROM 's3://<s3 bucket>/<s3-filename-just-uploaded>'
    FORMAT CSV
    TIMEFORMAT 'auto'
    REGION '<aws-region>'
    ACCESS_KEY_ID '<aws-access-key-id>'
    SECRET_ACCESS_KEY '<aws-secret-access-key>';

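As a sketch, the per-batch statement above could be assembled like this;
the helper name, its parameters, and the example values are hypothetical
and only illustrate the statement template, not pgloader internals.

```python
def redshift_copy_statement(table, bucket, s3_key, region,
                            access_key_id, secret_access_key):
    """Build the per-batch Redshift COPY command for a CSV file that
    was just uploaded to S3 (illustrative helper, not pgloader code)."""
    return (
        f"COPY {table}\n"
        f"FROM 's3://{bucket}/{s3_key}'\n"
        f"FORMAT CSV\n"
        f"TIMEFORMAT 'auto'\n"
        f"REGION '{region}'\n"
        f"ACCESS_KEY_ID '{access_key_id}'\n"
        f"SECRET_ACCESS_KEY '{secret_access_key}';"
    )

# Example with placeholder values: one uploaded batch file, one statement.
sql = redshift_copy_statement(
    "public.my_table", "pgloader", "batch-0001.csv",
    "us-east-1", "AKIAEXAMPLE", "secretexample")
```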
This is the only difference from a PostgreSQL core version, where pgloader
can rely on the classic ``COPY FROM STDIN`` command, which allows sending
data through the already established connection to PostgreSQL.