We added experimental code in 376df06 to create the GCP Carbon Footprint Export dataset and a data transfer job that pulls the data into a BigQuery dataset.
Unfortunately, the permission model used by GCP BigQuery Data Transfers is a bit convoluted. Here's what I could gather from GCP's documentation:
- GCP uses a managed service SA to run BigQuery Data Transfers, e.g. `service-${projectnumber}@gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com`
- this managed service SA does not have permission to access the target dataset by default, so it needs to be authorized to act either as a user or as a service account that has access
- by default the GCP Console chooses user authorization: users are prompted to grant BigQuery access to their account (which only works in the browser), and the transfer then runs as this user
- some data sources support running as a service account; the Cloud Carbon Footprint Export, however, does not (see the sketch after this list)
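
For reference, a minimal Terraform sketch of how this managed service agent's identity can be derived for a project (the data source and local names here are illustrative):

```hcl
# Look up the current project to get its numeric project number.
data "google_project" "current" {}

locals {
  # GCP-managed service agent that executes BigQuery Data Transfer runs,
  # following the naming pattern from GCP's documentation.
  bq_transfer_service_agent = "service-${data.google_project.current.number}@gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com"
}
```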
We can control with Terraform how the transfer config gets created:
- by default the `google_bigquery_data_transfer_config` resource uses the authenticated principal of the google provider
- the `google_bigquery_data_transfer_config` resource also offers a `service_account_name` attribute; when it is set, the transfer config uses the explicitly configured service account (see the sketch after this list)
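
A minimal sketch of the two modes, assuming a Carbon Footprint transfer config; the data source ID, params, and referenced resources are illustrative placeholders, not verified values:

```hcl
resource "google_bigquery_data_transfer_config" "carbon_footprint" {
  display_name           = "carbon-footprint-export"
  data_source_id         = "..." # Carbon Footprint data source ID from the GCP docs
  destination_dataset_id = google_bigquery_dataset.carbon.dataset_id

  params = {
    # Illustrative; actual parameter names depend on the data source.
    billing_accounts = var.billing_account_id
  }

  # Omitted: the transfer runs as the principal authenticated in the
  # google provider (i.e. a user's OAuth grant).
  # Set: the transfer runs as this service account. However, the Carbon
  # Footprint data source does not support service accounts.
  # service_account_name = google_service_account.transfer.email
}
```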
Now, when we tested this module we always ran it as users who had already performed a manual setup of a BigQuery data transfer, i.e. who had already authorized the BigQuery service to access their account. In this case, provisioning from Terraform succeeded. hashicorp/terraform-provider-google#4449 describes a similar case.
However, the module now has two important failure modes that are bad for the first-time experience, as they add unpredictability:
- as an operator of the module, I am not made explicitly aware that the provisioned code will keep using my user account as part of the transfer config
- provisioning the transfer config may fail with nondescript errors like:

      module.meshplatform.module.carbon_export[0].google_bigquery_data_transfer_config.carbon_footprint_transfer_config: Creating...
      │ Error: Error creating Config: googleapi: Error 400: Request contains an invalid argument.
- we cannot deploy terraform-gcp-meshplatform when the google provider is configured to use a service account (see the sketch after this list), which is common for customers deploying this module from CI/CD
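
This is the kind of provider configuration, typical for CI/CD pipelines, under which creating the transfer config fails; the service account name is a hypothetical example:

```hcl
provider "google" {
  project = var.project_id

  # Typical CI/CD setup: the provider impersonates a service account
  # rather than authenticating as an interactive user. The Carbon
  # Footprint transfer config cannot be created under this principal.
  impersonate_service_account = "terraform-deployer@my-project.iam.gserviceaccount.com"
}
```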
I looked into the alternative of providing a separate google provider via `configuration_aliases` (sketched below), but ultimately that adds complexity for every consumer of terraform-gcp-meshplatform, even those not using the carbon footprint export. (Sidenote: I could not figure out how to make an "optional provider", which could solve this problem.) Furthermore, there is already no Terraform support for setting up the GCP billing export, and operators have to set it up manually anyway. I thus feel it's best to keep this consistent and require manual steps to set up both the billing and carbon footprint exports.
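
For context, a rough sketch of what the rejected `configuration_aliases` approach would have looked like inside the module; the alias name is made up:

```hcl
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
      # Declares a second, aliased google provider that every caller
      # of the module must pass in explicitly.
      configuration_aliases = [google.user_auth]
    }
  }
}
```

Every consumer would then have to wire up the alias in their module block, e.g. `providers = { google.user_auth = google.user_auth }`, even if they never enable the carbon footprint export.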