Infrastructure-as-Code (IaC) project that provisions a foundational data warehouse environment on Google Cloud Platform using Terraform. Includes a BigQuery dataset and a Cloud Storage bucket, ready for integration with analytics tools like Dataiku or custom ETL pipelines.

Peippo1/gcp-datawarehouse-terraform

GCP Data Warehouse Terraform Project

This project provisions core infrastructure for a simple data warehouse setup in Google Cloud using Terraform.

✅ What it Deploys

  • A BigQuery dataset named google_trends
  • A Google Cloud Storage bucket (with randomized suffix) for staging data
  • (Temporarily decommissioned) A Compute Engine VM (linux-admin-vm) configured with:
    • Debian 11
    • UFW firewall enabled (allowing SSH and Dataiku port)
    • A dedicated dataiku user for installing and running Dataiku DSS
  • IAM roles for secure access management
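The active resources above could be declared roughly as follows. This is a minimal sketch, not the repository's actual configuration: the resource labels, the `dwh-staging-` bucket prefix, the chosen IAM role, and the example member are all assumptions made for illustration.

```hcl
# Sketch only: resource labels, the bucket name prefix, and the IAM member
# are hypothetical, not taken from this repository.
# Assumes the google provider is already configured with a project.

# Random suffix so the bucket name does not collide with existing buckets.
resource "random_id" "bucket_suffix" {
  byte_length = 4
}

# Staging bucket for data on its way into BigQuery.
resource "google_storage_bucket" "staging" {
  name                        = "dwh-staging-${random_id.bucket_suffix.hex}"
  location                    = "EUROPE-WEST2"
  uniform_bucket_level_access = true
}

# The google_trends dataset named in the list above.
resource "google_bigquery_dataset" "google_trends" {
  dataset_id = "google_trends"
  location   = "europe-west2"
}

# Example IAM binding: read-only access to the dataset for one principal.
resource "google_bigquery_dataset_iam_member" "viewer" {
  dataset_id = google_bigquery_dataset.google_trends.dataset_id
  role       = "roles/bigquery.dataViewer"
  member     = "user:analyst@example.com" # hypothetical member
}
```

Using an `iam_member` resource adds a single binding without taking ownership of the dataset's whole policy, which is usually the safer choice when other tools also manage access.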

🌍 Deployment Context

  • Region: europe-west2 (London)
  • Designed to run with gcloud Application Default Credentials
  • Terraform manages infrastructure provisioning and IAM bindings
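A provider configuration matching this context might look like the sketch below. The variable names `project_id` and `region` are assumptions; the repository may declare them differently. Leaving out the `credentials` argument is what lets the provider fall back to Application Default Credentials.

```hcl
# Sketch only: variable names are hypothetical.
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
    }
  }
}

variable "project_id" {
  type        = string
  description = "ID of the GCP project to deploy into"
}

variable "region" {
  type        = string
  description = "Default region for regional resources"
  default     = "europe-west2" # London
}

provider "google" {
  project = var.project_id
  region  = var.region
  # No credentials argument: the provider uses Application Default
  # Credentials, e.g. those created by `gcloud auth application-default login`.
}
```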

🚀 Getting Started

  1. Authenticate with GCP:

    gcloud auth application-default login

  2. Initialize Terraform:

    terraform init

  3. Review the plan:

    terraform plan

  4. Apply the deployment:

    terraform apply

  5. View outputs:

    terraform output

🗃️ Notes

  • The bucket is uniquely named using a random suffix to avoid naming conflicts.
  • The dataset ID google_trends is a simple placeholder chosen for demonstration purposes.
  • UFW firewall is configured on the VM to restrict access, allowing only SSH by default.
  • The VM is intended to be used for installing Dataiku DSS and related workloads.
  • The VM has been torn down to minimize cloud costs. You can re-provision it using Terraform when needed.
  • To customize the deployment, change the variable values in terraform.tfvars.
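As an illustration, a terraform.tfvars file could look like the sketch below. The variable names and values here are hypothetical; each name must match a `variable` block that the configuration actually declares, otherwise Terraform warns about an undeclared variable.

```hcl
# terraform.tfvars (sketch only: variable names and values are hypothetical)
project_id = "my-gcp-project" # replace with your own project ID
region     = "europe-west2"
```

Terraform loads terraform.tfvars automatically from the working directory, so no extra `-var-file` flag is needed for `terraform plan` or `terraform apply`.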

This infrastructure forms the foundation for GCP-based analytics workflows and can be extended with BigQuery tables, scheduled queries, Dataiku integration, and more.

Architecture Overview

graph TD
    Terraform[Terraform Configuration]
    BigQuery[BigQuery Dataset\ngoogle_trends]
    GCS[Cloud Storage Bucket\nStaging Bucket]
    ComputeVM[Compute Engine VM\nlinux-admin-vm]

    Terraform --> BigQuery
    Terraform --> GCS
    Terraform --> ComputeVM

⚙️ CI/CD Automation

This project includes a GitHub Actions workflow that performs the following on every pull request to master:

  • Checks Terraform formatting (terraform fmt)
  • Initializes the Terraform working directory (terraform init)
  • Validates configuration syntax (terraform validate)
  • Generates a plan showing the changes that would be made (terraform plan)

The workflow authenticates to GCP with service account credentials stored as a GitHub secret (GCP_CREDENTIALS), so every pull request is validated without committing sensitive credentials to the repository.
