@@ -70,6 +70,125 @@ backfilled from referenced data. pgloader knows how to do that by generating
70
70
a query like the following and importing the result set of such a query
71
71
rather than the raw data from the source table.
72
72
73
+ Citus Migration Example
74
+ ^^^^^^^^^^^^^^^^^^^^^^^
75
+
76
+ With the migration command as above, pgloader adds the column ``company_id ``
77
+ to the tables that have a direct or indirect foreign key reference to the
78
+ ``companies `` table.
79
+
80
+ We run pgloader using the following command, where the file
81
+ `./test/citus/company.load
82
+ <https://github.com/dimitri/pgloader/blob/master/test/citus/company.load> `_
83
+ contains the pgloader command as shown above.
84
+
85
+ ::
86
+
87
+ $ pgloader --client-min-messages sql ./test/citus/company.load
88
+
89
+ The following SQL statements are all extracted from the log messages that
90
+ the pgloader command outputs. We are going to have a look at the
91
+ `impressions ` table. It gets created with a new column `company_id ` in the
92
+ first position, as follows:
93
+
94
+ ::
95
+
96
+ CREATE TABLE "public"."impressions"
97
+ (
98
+ company_id bigint,
99
+ "id" bigserial,
100
+ "ad_id" bigint default NULL,
101
+ "seen_at" timestamp with time zone default NULL,
102
+ "site_url" text default NULL,
103
+ "cost_per_impression_usd" numeric(20,10) default NULL,
104
+ "user_ip" inet default NULL,
105
+ "user_data" jsonb default NULL
106
+ );
107
+
108
+ The original schema for this table does not have the `company_id ` column,
109
+ which means pgloader now needs to change the primary key definition, the
110
+ foreign keys constraints definitions from and to this table, and also to
111
+ *backfill * the `company_id ` data to this table when doing the COPY phase of
112
+ the migration.
113
+
114
+ Then once the tables have been created, pgloader executes the following SQL
115
+ statements::
116
+
117
+ SELECT create_distributed_table('"public"."companies"', 'id');
118
+ SELECT create_distributed_table('"public"."campaigns"', 'company_id');
119
+ SELECT create_distributed_table('"public"."ads"', 'company_id');
120
+ SELECT create_distributed_table('"public"."clicks"', 'company_id');
121
+ SELECT create_distributed_table('"public"."impressions"', 'company_id');
122
+
123
+ Then when copying the data from the source PostgreSQL database to the new
124
+ Citus tables, the new column (here ``company_id ``) needs to be backfilled
125
+ from the source tables. Here's the SQL query that pgloader uses as a data
126
+ source for the ``ads `` table in our example:
127
+
128
+ ::
129
+
130
+ SELECT "campaigns".company_id::text, "ads".id::text, "ads".campaign_id::text,
131
+ "ads".name::text, "ads".image_url::text, "ads".target_url::text,
132
+ "ads".impressions_count::text, "ads".clicks_count::text,
133
+ "ads".created_at::text, "ads".updated_at::text
134
+
135
+ FROM "public"."ads"
136
+ JOIN "public"."campaigns"
137
+ ON ads.campaign_id = campaigns.id
138
+
139
+ The ``impressions `` table has an indirect foreign key reference to the
140
+ ``company `` table, which is the table where the distribution key is
141
+ specified. pgloader will discover that itself from walking the PostgreSQL
142
+ catalogs, and you may also use the following specification in the pgloader
143
+ command to explicitely add the indirect dependency:
144
+
145
+ ::
146
+
147
+ distribute impressions using company_id from ads, campaigns
148
+
149
+ Given this schema, the SQL query used by pgloader to fetch the data for the
150
+ `impressions ` table is the following, implementing online backfilling of the
151
+ data:
152
+
153
+ ::
154
+
155
+ SELECT "campaigns".company_id::text, "impressions".id::text,
156
+ "impressions".ad_id::text, "impressions".seen_at::text,
157
+ "impressions".site_url::text,
158
+ "impressions".cost_per_impression_usd::text,
159
+ "impressions".user_ip::text,
160
+ "impressions".user_data::text
161
+
162
+ FROM "public"."impressions"
163
+
164
+ JOIN "public"."ads"
165
+ ON impressions.ad_id = ads.id
166
+
167
+ JOIN "public"."campaigns"
168
+ ON ads.campaign_id = campaigns.id
169
+
170
+ When the data copying is done, then pgloader also has to install the indexes
171
+ supporting the primary keys, and add the foreign key definitions to the
172
+ schema. Those definitions are not the same as in the source schema, because
173
+ of the adding of the distribution column to the table: we need to also add
174
+ the column to the primary key and the foreign key constraints.
175
+
176
+ Here's the commands issued by pgloader for the ``impressions `` table:
177
+
178
+ ::
179
+
180
+ CREATE UNIQUE INDEX "impressions_pkey"
181
+ ON "public"."impressions" (company_id, id);
182
+
183
+ ALTER TABLE "public"."impressions"
184
+ ADD CONSTRAINT "impressions_ad_id_fkey"
185
+ FOREIGN KEY(company_id,ad_id)
186
+ REFERENCES "public"."ads"(company_id,id)
187
+
188
+ Given a single line of specification ``distribute companies using id `` then
189
+ pgloader implements all the necessary schema changes on the fly when
190
+ migrating to Citus, and also dynamically backfills the data.
191
+
73
192
Citus Migration: Limitations
74
193
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
75
194
0 commit comments