This project provisions core infrastructure for a simple data warehouse setup in Google Cloud using Terraform.
- A BigQuery dataset named `google_trends`
- A Google Cloud Storage bucket (with a randomized suffix) for staging data
- (Temporarily decommissioned) A Compute Engine VM (`linux-admin-vm`) configured with:
  - Debian 11
  - UFW firewall enabled (allowing SSH and the Dataiku port)
  - A dedicated `dataiku` user for installing and running Dataiku DSS
- IAM roles for secure access management
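
As an illustration of how the dataset and the randomly suffixed bucket might be declared, here is a minimal sketch. It is not the project's actual configuration: the resource labels (`bucket_suffix`, `google_trends`, `staging`) and the `staging-` bucket name prefix are assumptions, and the provider is assumed to be configured elsewhere.

```hcl
# Sketch only: resource labels and the bucket name prefix are assumptions.

# Random hex suffix so the bucket name does not collide with existing buckets.
resource "random_id" "bucket_suffix" {
  byte_length = 4
}

# BigQuery dataset with the ID used by this project.
resource "google_bigquery_dataset" "google_trends" {
  dataset_id = "google_trends"
  location   = "europe-west2"
}

# Staging bucket; "staging-" is a hypothetical prefix.
resource "google_storage_bucket" "staging" {
  name     = "staging-${random_id.bucket_suffix.hex}"
  location = "europe-west2"
}
```

With `byte_length = 4`, `random_id` yields an eight-character hex suffix, which is usually enough to avoid clashes in the global bucket namespace.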

- Region: `europe-west2` (London)
- Designed to run with `gcloud` Application Default Credentials
- Terraform manages infrastructure provisioning and IAM bindings
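
A provider block along the following lines would match the settings above. This is a sketch under assumptions: the variable names `project_id` and `region` are hypothetical and may differ from the ones this project defines. Leaving out the `credentials` argument lets the Google provider fall back to Application Default Credentials.

```hcl
# Sketch only: variable names are hypothetical.
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
    }
  }
}

variable "project_id" {
  type        = string
  description = "ID of the GCP project to deploy into"
}

variable "region" {
  type    = string
  default = "europe-west2" # London
}

# No "credentials" argument: the provider uses Application Default Credentials.
provider "google" {
  project = var.project_id
  region  = var.region
}
```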

- Authenticate with GCP: `gcloud auth application-default login`
- Initialize Terraform: `terraform init`
- Review the plan: `terraform plan`
- Apply the deployment: `terraform apply`
- View outputs: `terraform output`

- The bucket is uniquely named using a random suffix to avoid naming conflicts.
- This project uses a simple dataset ID, `google_trends`, for demonstration purposes.
- UFW firewall is configured on the VM to restrict access, allowing only SSH by default.
- The VM is intended to be used for installing Dataiku DSS and related workloads.
- The VM has been torn down to minimize cloud costs. You can re-provision it using Terraform when needed.
- You can modify variables in `terraform.tfvars`.
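
For example, a `terraform.tfvars` file might look like the sketch below. The variable names and the project ID value are placeholders, not taken from this project; they need to match the variables the configuration actually declares.

```hcl
# terraform.tfvars (sketch: variable names and values are placeholders)
project_id = "my-gcp-project"
region     = "europe-west2"
```

Terraform loads `terraform.tfvars` automatically during `terraform plan` and `terraform apply`, so no extra flag is needed.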
This infrastructure forms the foundation for GCP-based analytics workflows and can be extended with BigQuery tables, scheduled queries, Dataiku integration, and more.

```mermaid
graph TD
    Terraform[Terraform Configuration]
    BigQuery[BigQuery Dataset\ngoogle_trends]
    GCS[Cloud Storage Bucket\nStaging Bucket]
    ComputeVM[Compute Engine VM\nlinux-admin-vm]
    Terraform --> BigQuery
    Terraform --> GCS
    Terraform --> ComputeVM
```

This project includes a GitHub Actions workflow that performs the following on every pull request to `master`:

- Checks Terraform formatting (`terraform fmt`)
- Initializes the Terraform working directory (`terraform init`)
- Validates configuration syntax (`terraform validate`)
- Generates a plan to show what changes will be made (`terraform plan`)

The workflow authenticates with GCP using service account credentials stored as a GitHub secret (`GCP_CREDENTIALS`), so infrastructure code is validated continuously without exposing sensitive credentials.