Commit e895637

Merge pull request #330 from FirebasePrivate/rachelsaunders-importscript-bigquery
BigQuery- add import script install/run instructions
2 parents 26dd086 + 17d0567 commit e895637

1 file changed: 49 additions & 2 deletions

@@ -1,3 +1,50 @@
1-
This is a place holder.
1+
### Overview
22

3-
Its content is developed in go/firestore-bigquery-export-docs
3+
The import script (`fs-bq-import-collection`) reads all existing documents in a Cloud Firestore collection and inserts them into the raw changelog table created by the Export Collections to BigQuery extension. For each document, the script adds a special changelog record with the operation set to `IMPORT` and the timestamp set to the Unix epoch. Because every real change carries a later timestamp, any operation on an imported document supersedes the import record.
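
To illustrate why an epoch timestamp lets later operations win, here is a minimal sketch that picks the most recent changelog row per document. The row fields (`document_name`, `operation`, `timestamp`, `data`) and the helper `latest_state` are assumptions made for this example, not the extension's documented schema or API.

```python
from datetime import datetime, timezone

# The import script stamps every IMPORT record with the Unix epoch.
EPOCH = datetime.fromtimestamp(0, tz=timezone.utc)


def latest_state(rows):
    """Return the most recent changelog row for each document (hypothetical helper)."""
    latest = {}
    for row in rows:
        name = row["document_name"]
        if name not in latest or row["timestamp"] > latest[name]["timestamp"]:
            latest[name] = row
    return latest


# Illustrative rows; field names are assumed for this sketch.
rows = [
    {"document_name": "users/alice", "operation": "UPDATE",
     "timestamp": datetime(2020, 1, 1, tzinfo=timezone.utc), "data": {"age": 31}},
    {"document_name": "users/alice", "operation": "IMPORT",
     "timestamp": EPOCH, "data": {"age": 30}},
    {"document_name": "users/bob", "operation": "IMPORT",
     "timestamp": EPOCH, "data": {"age": 25}},
]

state = latest_state(rows)
```

In this sketch the `UPDATE` row for `users/alice` wins over its `IMPORT` row regardless of insertion order, while `users/bob`, which has no later change, is still represented by its import record.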
4+
5+
You can pause the script at any point and resume it later from the last imported batch.
6+
7+
#### Important notes
8+
9+
+ Run the script over the entire collection **_after_** installing the Export Collections to BigQuery extension; otherwise the writes to your database during the import might not be exported to the dataset.
10+
+ The import script's running time scales linearly with the collection, that is, _O(collection size)_. If your collection is large, consider [loading data from a Cloud Firestore export into BigQuery](https://cloud.google.com/bigquery/docs/loading-data-cloud-firestore) instead.
11+
+ You will see redundant rows in your raw changelog table in either of the following cases:
12+
13+
+ If document changes occur in the time between installing the extension and running this import script.
14+
+ If you run the import script multiple times over the same collection.
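
As a rough illustration of the redundant rows mentioned above, the following sketch compares the number of `IMPORT` rows with the number of distinct imported documents after two runs over the same collection. The field names `document_name` and `operation` and both helper functions are assumptions for this example only.

```python
def count_import_rows(rows):
    """Count every row whose operation is IMPORT (redundant rows included)."""
    return sum(1 for row in rows if row["operation"] == "IMPORT")


def count_imported_documents(rows):
    """Count distinct documents that have at least one IMPORT row."""
    return len({row["document_name"] for row in rows if row["operation"] == "IMPORT"})


# Two import runs over a two-document collection, plus one real update.
changelog = [
    {"document_name": "users/alice", "operation": "IMPORT"},
    {"document_name": "users/bob", "operation": "IMPORT"},
    {"document_name": "users/alice", "operation": "IMPORT"},  # second run
    {"document_name": "users/bob", "operation": "IMPORT"},    # second run
    {"document_name": "users/alice", "operation": "UPDATE"},
]

import_rows = count_import_rows(changelog)
imported_documents = count_imported_documents(changelog)
```

Here the row count doubles after the second run while the number of distinct documents stays the same, so a plain row count only matches the collection size when the import ran once.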
15+
16+
### Install and run the script
17+
18+
This import script uses several values from your installation of the extension:
19+
20+
+ `${PROJECT_ID}`: the project ID for the Firebase project in which you installed the extension
21+
+ `${COLLECTION_PATH}`: the collection path that you specified during extension installation
22+
+ `${DATASET_ID}`: the ID that you specified for your dataset during extension installation
23+
24+
1. Run `npx @firebaseextensions/fs-bq-import-collection`.
25+
26+
1. When prompted, enter the Cloud Firestore collection path that you specified during extension installation, `${COLLECTION_PATH}`.
27+
28+
1. _(Optional)_ You can pause and resume the import at any time:
29+
30+
+ **Pause the import:** press `CTRL+C`
31+
The import script records the name of the last successfully imported document in a cursor file called:
32+
`from-${COLLECTION_PATH}-to-${PROJECT_ID}:${DATASET_ID}:${rawChangeLogName}`,
33+
which lives in the directory from which you invoked the import script.
34+
35+
+ **Resume the import from where you left off:** re-run `npx @firebaseextensions/fs-bq-import-collection`
36+
_from the same directory from which you previously invoked the script_
37+
38+
Note that when an import completes successfully, the import script automatically cleans up the cursor file it was using to keep track of its progress.
39+
40+
1. In the [BigQuery web UI](https://console.cloud.google.com/bigquery), navigate to the dataset created by the extension. The extension named your dataset using the Dataset ID that you specified during extension installation, `${DATASET_ID}`.
41+
42+
1. Run the following query against your raw changelog table:
43+
44+
```
45+
SELECT COUNT(*) FROM
46+
`${PROJECT_ID}.${DATASET_ID}.${COLLECTION_PATH}_raw_changelog`
47+
WHERE operation = "IMPORT"
48+
```
49+
50+
The result is the number of documents in your source collection, provided that you ran the import script only once; repeated runs add redundant `IMPORT` rows.
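
The pause-and-resume behavior from the steps above can be sketched as follows. This is a simplified model under stated assumptions, not the script's actual implementation: `cursor_file_name` and `import_documents` are hypothetical helpers, the cursor is written after every document rather than every batch, documents are processed in sorted name order, and `stop_after` stands in for pressing `CTRL+C`. The project, dataset, and table names are placeholders.

```python
import os
import tempfile


def cursor_file_name(collection_path, project_id, dataset_id, raw_changelog_name):
    """Build the cursor file name in the format described in the steps above."""
    return f"from-{collection_path}-to-{project_id}:{dataset_id}:{raw_changelog_name}"


def import_documents(doc_names, cursor_path, insert, stop_after=None):
    """Import documents in name order, resuming after the name stored in the cursor file.

    Returns True if the import completed, False if it was paused.
    """
    last = None
    if os.path.exists(cursor_path):
        with open(cursor_path) as f:
            last = f.read().strip() or None
    pending = [name for name in sorted(doc_names) if last is None or name > last]
    for index, name in enumerate(pending):
        if stop_after is not None and index >= stop_after:
            return False  # simulated CTRL+C: the cursor file is left in place
        insert(name)
        with open(cursor_path, "w") as f:
            f.write(name)  # record the last successfully imported document
    if os.path.exists(cursor_path):
        os.remove(cursor_path)  # clean up once the import completes
    return True


docs = ["users/a", "users/b", "users/c", "users/d"]
imported = []
with tempfile.TemporaryDirectory() as workdir:
    # Placeholder values; the colon-separated name assumes a POSIX file system.
    cursor = os.path.join(
        workdir,
        cursor_file_name("users", "my-project", "my_dataset", "users_raw_changelog"),
    )
    first_run_done = import_documents(docs, cursor, imported.append, stop_after=2)
    cursor_exists_after_pause = os.path.exists(cursor)
    second_run_done = import_documents(docs, cursor, imported.append)
    cursor_exists_after_finish = os.path.exists(cursor)
```

The second call picks up after the document named in the cursor file, so each document is inserted exactly once, and the cursor file is gone after the run that finishes the import. Running the second call from a different directory would not find the cursor file and would start over, which is why the resume step must be run from the original directory.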
