The purpose of this framework is to provide you with a complete end-to-end DevOps accelerator for Snowflake.
Please note that this Snowflake DevOps Framework is a community-developed tool, not an official Snowflake offering. It comes with no support or warranty.
TODO: Need a good breakdown and diagram here.
A workspace is the combination of two things: an identifier (component, business subject area, etc.) and an SDLC environment (dev, test, prod, etc.). This corresponds to how Terraform understands workspaces, see One Workspace Per Environment Per Terraform Configuration for more details.
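For example, a component identifier of sales (a hypothetical name) paired with each SDLC environment would map to Terraform workspaces like these:

terraform workspace new sales-dev     # hypothetical <identifier>-<environment> naming
terraform workspace new sales-test
terraform workspace new sales-prod
terraform workspace select sales-dev  # switch to the workspace you want to target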
This framework is built around using a single Snowflake account for all environments, as opposed to having a separate Snowflake account per environment. We decided to start here since it's the simplest approach and is what most Snowflake customers are doing today. But there are situations where having separate Snowflake accounts per environment is required or beneficial, and in those cases this framework should be easy to adjust to support them.
TODO: Add diagram
There are many different branching strategies to choose from, but we have decided to adopt Microsoft's Release Flow branching strategy. The other popular approach today is GitHub's GitHub Flow branching strategy. Both strategies are great, and both follow a Trunk Based Development approach which addresses the challenges of long-lived branches and the resulting merge hell. But we feel that Microsoft's Release Flow provides a little more control over release deployment (by not continuously deploying master to production) and aligns a little better with data engineering style releases. At a high level, here are the steps involved (a sketch of the corresponding Git commands follows the list):
- The trunk (main or master) is buildable at all times
- All development happens in either topic or hotfix short-lived branches
- All changes in topic or hotfix branches go through code review in a Pull Request (PR)
- Once a topic or hotfix branch is ready to go and has passed all tests, the code is immediately merged to the trunk and the branch can be deleted
- Deployments are made to production at the end of each sprint (always from a release branch)
- At the end of each sprint a release branch is created, which only lives during the lifetime of the following sprint
- All deployments to production happen from the release branch, not the trunk
- Hotfixes, like everything else, are made first to a short-lived branch, then merged to the trunk, then cherry-picked to the release branch and deployed
- Old release branches can be deleted
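To make that concrete, here is a rough sketch of the Git commands behind the flow (branch and sprint names are illustrative):

# Topic work happens on a short-lived branch cut from the trunk
git checkout -b topic/add-orders-model main
# ...commit, push, open a PR, pass tests, merge to main, delete the branch...

# At the end of the sprint, cut a release branch from the trunk
git checkout -b release/sprint-42 main
git push origin release/sprint-42

# A hotfix lands on the trunk first, then is cherry-picked to the release branch
git checkout release/sprint-42
git cherry-pick <commit-sha-of-the-hotfix-on-main>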
And here is a graphical representation from Microsoft's Release Flow page:
Here are a few additional resources to help you understand the Release Flow approach:
- Release Flow: How We Do Branching on the VSTS Team
- How Microsoft develops modern software with DevOps
- Adopt a Git branching strategy
This section will help you get things set up for the Framework.
- Snowflake
- A Snowflake Account.
- A Snowflake User with `ACCOUNTADMIN` privileges.
- GitHub
- A GitHub Account. If you don’t already have a GitHub account you can create one for free. Visit the Join GitHub page to get started.
- A GitHub Repository. If you don't already have a repository created, or would like to create a new one, then Create a new repository. For the type, select `Public` (although you could use either). And you can skip adding the README, .gitignore and license for now.
- Azure
- An Azure Account with privileges to create resources.
- The az CLI installed on your computer. If you don't already have the Azure CLI installed, see the details in their How to install the Azure CLI page.
- Terraform
- Terraform CLI installed on your computer. If you don't already have the Terraform CLI installed, see the details on their Install Terraform page.
- Integrated Development Environment (IDE)
- Your favorite IDE with Git integration. If you don’t already have a favorite IDE that integrates with Git I would recommend the great, free, open-source Visual Studio Code.
- Your project repository cloned to your computer. For connection details about your Git repository, open the Repository and copy the `HTTPS` link provided near the top of the page. If you have at least one file in your repository, then click on the green `Code` icon near the top of the page and copy the `HTTPS` link. Use that link in VS Code or your favorite IDE to clone the repo to your computer (see the example below).
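For example, cloning from a terminal looks like this (replace the URL with the `HTTPS` link you copied):

git clone https://github.com/<your-account>/<your-repository>.git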
For this Framework you will need to create a service account for Terraform with the appropriate permissions (named `SDF_TERRAFORM_SVC`). We will follow the principles of separation of duties and least privilege here. The Terraform service account will also be configured to use Key Pair Authentication with encrypted keys.
Let's begin by creating a role in Snowflake for our Terraform service account along with the required permissions. To start, log in to your Snowflake account and run the following commands as the `ACCOUNTADMIN` role:
USE ROLE ACCOUNTADMIN;
CREATE ROLE SDF_TERRAFORM_ROLE;
-- Privileges to manage roles and grants
GRANT CREATE ROLE ON ACCOUNT TO ROLE SDF_TERRAFORM_ROLE;
GRANT MANAGE GRANTS ON ACCOUNT TO ROLE SDF_TERRAFORM_ROLE;
-- Privileges to create the account-level objects Terraform will manage
GRANT CREATE INTEGRATION ON ACCOUNT TO ROLE SDF_TERRAFORM_ROLE;
GRANT CREATE WAREHOUSE ON ACCOUNT TO ROLE SDF_TERRAFORM_ROLE;
GRANT CREATE DATABASE ON ACCOUNT TO ROLE SDF_TERRAFORM_ROLE;
Next we will generate the encrypted private key and corresponding public key for our service account. Open a terminal window and run the following:
openssl genrsa -out snowflake_key 4096   # generate a 4096-bit RSA private key
openssl rsa -in snowflake_key -pubout -out snowflake_key.pub   # extract the public key
openssl pkcs8 -topk8 -inform pem -in snowflake_key -outform PEM -v2 aes-256-cbc -out snowflake_key.p8   # encrypt the private key (you will be prompted for a passphrase)
Once completed, you will have an encrypted private key file named `snowflake_key.p8` in the PEM format and a public key file named `snowflake_key.pub`. Copy the public and private key files to a local directory for storage. Record the path to the files. Note that the private key is stored using the PKCS#8 (Public Key Cryptography Standards) format and is encrypted using the passphrase you specified in the previous step.
Next, log in to your Snowflake account and run the following commands as the `ACCOUNTADMIN` role. Please replace the `<RSA PUBLIC KEY HERE>` placeholder with your public key (don't include the BEGIN and END comment lines but do include the newlines in the body of the public key):
CREATE USER SDF_TERRAFORM_SVC RSA_PUBLIC_KEY='<RSA PUBLIC KEY HERE>' DEFAULT_ROLE=SDF_TERRAFORM_ROLE MUST_CHANGE_PASSWORD=FALSE;
GRANT ROLE SDF_TERRAFORM_ROLE TO USER SDF_TERRAFORM_SVC;
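To verify that the public key was registered, you can describe the new user and check that the RSA_PUBLIC_KEY_FP property shows a fingerprint:

DESC USER SDF_TERRAFORM_SVC;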
Finally, test your new service account by connecting to Snowflake via `snowsql` with the following command in your terminal window (replacing the `<>` placeholder values with your real values):
snowsql -a <account_identifier> -u SDF_TERRAFORM_SVC --private-key-path <path>/snowflake_key.p8
Terraform will use Azure storage as a backend to store the Terraform state files (more details below in Terraform Setup). In order to allow Terraform to authenticate while running in an automated CI/CD pipeline, we'll use an Azure service principal. Please follow these steps to create the service principal:
- Authenticate Terraform to Azure. Follow the steps under Create a service principal.
For more details, please also see Terraform's azurerm Azure Provider: Authenticating using a Service Principal with a Client Secret.
Please note that according to Microsoft's documentation above, "the Contributor role is the default role and has full permissions to read and write to an Azure account. For this article, a service principal with a Contributor role is being used. For more information about Role-Based Access Control (RBAC) and roles, see RBAC: Built-in roles."
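As a sketch, creating such a service principal with the az CLI typically looks like the following (the name is a placeholder; the appId, password and tenant values in the output map to the ARM_CLIENT_ID, ARM_CLIENT_SECRET and ARM_TENANT_ID variables used later):

az ad sp create-for-rbac --name "sdf-terraform" --role Contributor --scopes "/subscriptions/<subscription_id>"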
TODO: How to limit access for service principal?
Terraform depends on a state file to track the state of the resources/objects being managed. Many declarative-style tools like this will do a real-time comparison between the objects defined in code and the deployed objects and then figure out what changes are required. But Terraform does not operate in this manner; instead, it maintains a state file to keep track of things. See Terraform's overview of State and in particular their discussion of why they chose to require a state file in Purpose of Terraform State.
State files in Terraform introduce a few challenges, the most significant being that the state file can get out of sync with the actual deployed objects. This will happen if you use a different process/tool than Terraform to update any deployed object (including making manual changes to a deployed object). The state file can also get out of sync (or corrupted) when multiple developers/processes try to access it at the same time. The way to safeguard against those challenges is to always use the same process (Terraform) to manage the same set of objects and to use Terraform's Remote State capability to store the state files in a remote backend. To see a list of which backends are supported check out Terraform's Backends page (and the "Standard Backends" in the left navigation of that page). For this framework we will be using the Azure backend `azurerm`. The steps below will walk you through setting this up, but please also see these pages for more details:
To create the Azure storage account (which will be used to store the Terraform state files), we will use Terraform! The resource definitions can be found in the `setup/azure-remote-storage.tf` script in this repository. From a terminal window, run the following:
az login          # authenticate the CLI with your Azure account
cd setup
terraform init    # download the providers and initialize the working directory
terraform apply   # review the plan, then confirm to create the storage account resources
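Once the storage account exists, the main Terraform configuration can point at it with an `azurerm` backend block along these lines (all values are placeholders; use the actual names created by the setup script):

terraform {
  backend "azurerm" {
    resource_group_name  = "<resource_group_name>"
    storage_account_name = "<storage_account_name>"  # includes the random 5 character suffix
    container_name       = "<container_name>"
    key                  = "terraform.tfstate"
  }
}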
TODO: Incorporate best practices for locking down these new Azure resources (storage account)
Since the `setup/azure-remote-storage.tf` script created a storage account with a random 5 character suffix, you will need to look up the storage account name in the Azure Portal.
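Alternatively, you can list the storage accounts from the az CLI (the resource group name is a placeholder):

az storage account list --resource-group <resource_group_name> --query "[].name" --output tsv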
In order to run the framework locally during development, please follow these steps. First, set the following environment variables:
export ARM_CLIENT_ID="client-id"                       # service principal appId
export ARM_CLIENT_SECRET="client-secret"               # service principal password
export ARM_SUBSCRIPTION_ID="subscription-id"
export ARM_TENANT_ID="tenant-id"
export SNOWFLAKE_ACCOUNT="account"
export SNOWFLAKE_REGION="region"
export SNOWFLAKE_USER="user"                           # e.g. SDF_TERRAFORM_SVC
export SNOWFLAKE_PRIVATE_KEY_PASSPHRASE="passphrase"   # passphrase chosen when encrypting the key
export SNOWFLAKE_PRIVATE_KEY_PATH="path-to-file"       # e.g. <path>/snowflake_key.p8
For development you can use `SNOWFLAKE_PRIVATE_KEY_PATH` to store the path to your private key. When running in an automated way through your CI/CD pipeline you'll likely want to use the `SNOWFLAKE_PRIVATE_KEY` environment variable instead (and store the actual key value there).
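For example, a pipeline step could load the key contents into that variable like this (the path is a placeholder):

export SNOWFLAKE_PRIVATE_KEY="$(cat <path>/snowflake_key.p8)"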
To log in with the service account, you can use `snowsql` like this:
snowsql -a <account_identifier> -u SDF_TERRAFORM_SVC --private-key-path <path>/snowflake_key.p8
- Environments
- Create seeding script/tool
- Move Terraform scripts into a module like the Immuta Fast Data Warehouse
- CI/CD Workflows
- Add YAML pipelines for other CI/CD tools
- Scripts
- Other
- Add support for dbt
- Jeremiah Hansen (@sfc-gh-jhansen)
This is a community-developed tool, not an official Snowflake offering. It comes with no support or warranty. However, feel free to raise a GitHub Issue if you find a bug or would like a new feature.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this tool except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.