Snowflake DevOps Framework

Table of Contents

  1. Overview
    1. SDLC Workflow
  2. Workspaces
    1. Relationship to Snowflake Accounts
    2. Creating Workspaces
    3. Seeding Workspaces
  3. Branching Strategy
  4. CI/CD
    1. Pipelines
    2. Settings
  5. Scripts
    1. Scripts Overview
    2. Chatops Bots
  6. Setup
    1. Prerequisites
    2. Snowflake Setup
    3. Azure Setup
    4. Terraform Setup
  7. Development
    1. Local Setup
  8. Todos
  9. Maintainers
  10. Legal

Overview

The purpose of this framework is to provide you with a complete end-to-end DevOps accelerator for Snowflake.

Please note that this Snowflake DevOps Framework is a community-developed tool, not an official Snowflake offering. It comes with no support or warranty.

SDLC Workflow

TODO: Need a good breakdown and diagram here.

Workspaces

A workspace is the combination of two things: an identifier (component, business subject area, etc.) and an SDLC environment (dev, test, prod, etc.). This corresponds to how Terraform understands workspaces; see One Workspace Per Environment Per Terraform Configuration for more details.
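As a rough illustration of that naming, here is what it might look like with the Terraform CLI. This is only a sketch, not part of the framework's scripts, and the sales component identifier is hypothetical:

# Hypothetical example: one workspace per component per environment
terraform workspace new sales-dev
terraform workspace new sales-test
terraform workspace new sales-prod

# Select the workspace to plan/apply against
terraform workspace select sales-dev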

Relationship to Snowflake Accounts

This framework is built around using a single Snowflake account for all environments, as opposed to having a separate Snowflake account per environment. We decided to start here since it's the simplest approach and is what most Snowflake customers are doing today. But there are situations where having separate Snowflake accounts per environment is required or beneficial, and in those cases this framework can be easily adjusted to support them.

TODO: Add diagram

Creating Workspaces

Seeding Workspaces

Branching Strategy

There are many different branching strategies to choose from, but we have decided to adopt Microsoft's Release Flow branching strategy. The other popular approach today is GitHub's GitHub Flow branching strategy. Both strategies are great, and both follow a Trunk Based Development approach, which addresses the challenges of long-lived branches and the resulting merge hell. But we feel that Microsoft's Release Flow provides a little more control over release deployment (by not continuously deploying master to production) and aligns a little better with data engineering-style releases. At a high level, here are the steps involved (a command-level sketch follows the list):

  • The trunk (main or master) is buildable at all times
  • All development happens in either topic or hotfix short-lived branches
  • All changes in topic or hotfix branches go through code review in a Pull Request (PR)
  • Once a topic or hotfix branch is ready to go and has passed all tests, the code is immediately merged to the trunk and the branch can be deleted
  • Deployments are made to production at the end of each sprint (always from a release branch)
  • At the end of each sprint a release branch is created, which only lives during the lifetime of the following sprint
  • All deployments to production happen from the release branch, not the trunk
  • Hotfixes, like everything else, are made first to a short-lived branch, then merged to the trunk, then cherry picked to the release branch and deployed
  • Old release branches can be deleted
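As a rough command-level sketch of that flow (the branch names and commit placeholder are hypothetical, and in practice merges to the trunk happen through your PR tooling):

# Topic branch: short-lived, reviewed in a PR, then merged to the trunk
git checkout -b topic/add-orders-schema main
git push -u origin topic/add-orders-schema

# End of sprint: cut a release branch from the trunk and deploy from it
git checkout -b release/2021.10 main
git push -u origin release/2021.10

# Hotfix: fix on a short-lived branch, merge to the trunk, then cherry-pick that commit to the release branch
git checkout release/2021.10
git cherry-pick -x <commit-sha-from-trunk>
git push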

And here is a graphical representation from Microsoft's Release Flow page:

Here are a few additional resources to help you understand the Release Flow approach:

CI/CD

Pipelines

Settings

Scripts

Scripts Overview

Chatops Bots

Setup

This section will help you get things set up for the Framework.

Prerequisites

  1. Snowflake
    1. A Snowflake Account.
    2. A Snowflake User with ACCOUNTADMIN privileges.
  2. GitHub
    1. A GitHub Account. If you don't already have a GitHub account, you can create one for free. Visit the Join GitHub page to get started.
    2. A GitHub Repository. If you don't already have a repository created, or would like to create a new one, then Create a new repository. For the type, select Public (although you could use either). And you can skip adding the README, .gitignore and license for now.
  3. Azure
    1. An Azure Account with privileges to create resources.
    2. The az CLI installed on your computer. If you don't already have the Azure CLI installed, see the details in their How to install the Azure CLI page.
  4. Terraform
    1. Terraform CLI installed on your computer. If you don't already have the Terraform CLI installed, see the details on their Install Terraform page.
  5. Integrated Development Environment (IDE)
    1. Your favorite IDE with Git integration. If you don't already have a favorite IDE that integrates with Git, I would recommend the great, free, open-source Visual Studio Code.
    2. Your project repository cloned to your computer. To find the connection details for your Git repository, open the repository in GitHub and copy the HTTPS link provided near the top of the page (if the repository already contains at least one file, click the green Code button near the top of the page to reveal the link). Use that link in VS Code or your favorite IDE to clone the repo to your computer, as shown in the example after this list.
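For example, cloning might look like this (replace the URL with the HTTPS link copied from your own repository):

git clone https://github.com/<your-account>/<your-repository>.git
cd <your-repository>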

Snowflake Setup

For this Framework you will need to create a service account for Terraform with the appropriate permissions (named SDF_TERRAFORM_SVC). We will follow the principles of separation of duties and least privilege here. The Terraform service account will also be configured to use Key Pair Authentication with encrypted keys.

Let's begin by creating a role in Snowflake for our Terraform service account along with the required permissions. To start, log in to your Snowflake account and run the following command as the ACCOUNTADMIN role:

USE ROLE ACCOUNTADMIN;

CREATE ROLE SDF_TERRAFORM_ROLE;
GRANT CREATE ROLE ON ACCOUNT TO ROLE SDF_TERRAFORM_ROLE;
GRANT MANAGE GRANTS ON ACCOUNT TO ROLE SDF_TERRAFORM_ROLE;
GRANT CREATE INTEGRATION ON ACCOUNT TO ROLE SDF_TERRAFORM_ROLE;
GRANT CREATE WAREHOUSE ON ACCOUNT TO ROLE SDF_TERRAFORM_ROLE;
GRANT CREATE DATABASE ON ACCOUNT TO ROLE SDF_TERRAFORM_ROLE;

Next we will generate the encrypted private and public keys for our service account. Open a terminal window and run the following:

# Generate a 4096-bit RSA private key, derive its public key, then convert the private key
# to the encrypted PKCS#8 format (the last command prompts you for a passphrase)
openssl genrsa -out snowflake_key 4096
openssl rsa -in snowflake_key -pubout -out snowflake_key.pub
openssl pkcs8 -topk8 -inform pem -in snowflake_key -outform PEM -v2 aes-256-cbc -out snowflake_key.p8

Once completed, you will have an encrypted private key file named snowflake_key.p8 in the PEM format and a public key file named snowflake_key.pub. Copy the public and private key files to a local directory for storage. Record the path to the files. Note that the private key is stored using the PKCS#8 (Public Key Cryptography Standards) format and is encrypted using the passphrase you specified in the previous step.

Next, log in to your Snowflake account and run the following command as the ACCOUNTADMIN role. Please replace the <RSA PUBLIC KEY HERE> placeholder with your public key (don't include the BEGIN and END comment lines, but do include the newlines in the body of the public key):

CREATE USER SDF_TERRAFORM_SVC RSA_PUBLIC_KEY='<RSA PUBLIC KEY HERE>' DEFAULT_ROLE=SDF_TERRAFORM_ROLE MUST_CHANGE_PASSWORD=FALSE;
GRANT ROLE SDF_TERRAFORM_ROLE TO USER SDF_TERRAFORM_SVC;
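If helpful, here is one way to print just the body of the public key (without the BEGIN and END lines) so it can be pasted into the statement above:

grep -v -- "-----" snowflake_key.pub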

Finally, test your new service account by connecting to Snowflake via snowsql with the following command in your terminal window (replacing the <> placeholder values with your real values):

snowsql -a <account_identifier> -u SDF_TERRAFORM_SVC --private-key-path <path>/snowflake_key.p8

Azure Setup

Terraform will use Azure storage as a backend to store the Terraform state files (more details below in Terraform Setup). To allow Terraform to connect while running in an automated CI/CD pipeline, we'll use an Azure service principal. Please follow these steps to create the service principal:

For more details, please also see Terraform's azurerm Azure Provider: Authenticating using a Service Principal with a Client Secret.

Please note that according to Microsoft's documentation above, "the Contributor role is the default role and has full permissions to read and write to an Azure account. For this article, a service principal with a Contributor role is being used. For more information about Role-Based Access Control (RBAC) and roles, see RBAC: Built-in roles."
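As a hedged sketch of what creating the service principal typically looks like with the az CLI (the service principal name is hypothetical; in the command output, the appId, password, and tenant values correspond to the ARM_CLIENT_ID, ARM_CLIENT_SECRET, and ARM_TENANT_ID variables used later in Local Setup):

az login
az account set --subscription "<subscription-id>"
az ad sp create-for-rbac --name "sdf-terraform-sp" --role Contributor --scopes "/subscriptions/<subscription-id>"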

TODO: How to limit access for service principal?

Terraform Setup

Terraform depends on a state file to track the state of the resources/objects being managed. Many declarative-style tools like this will do a real-time comparison between the objects defined in code and the deployed objects and then figure out what changes are required. But Terraform does not operate in this manner; instead, it maintains a state file to keep track of things. See Terraform's overview of State and in particular their discussion of why they chose to require a state file in Purpose of Terraform State.

State files in Terraform introduce a few challenges, the most significant being that the state file can get out of sync with the actual deployed objects. This will happen if you use a different process/tool than Terraform to update any deployed object (including making manual changes to a deployed object). The state file can also get out of sync (or corrupted) when multiple developers/processes try to access it at the same time. The way to safeguard against those challenges is to always use the same process (Terraform) to manage the same set of objects and to use Terraform's Remote State capability to store the state files in a remote backend. To see a list of which backends are supported, check out Terraform's Backends page (and the "Standard Backends" in the left navigation of that page). For this framework we will be using the Azure backend azurerm. The steps below will walk you through setting this up, but please also see these pages for more details:

To create the Azure storage account (which will be used to store the Terraform state files), we will use Terraform! The resource definitions can be found in the setup/azure-remote-storage.tf script in this repository. From a terminal window, run the following:

az login
cd setup
terraform init
terraform apply

TODO: Incorporate best practices for locking down these new Azure resources (storage account)

Since the setup/azure-remote-storage.tf script created a storage account with a random 5-character suffix, you will need to look up the storage account name in the Azure Portal.
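Assuming your root configuration declares an azurerm backend block, one way to point Terraform at the new storage account is to supply the standard azurerm backend settings at init time, filling in the placeholder values you looked up in the Azure Portal:

terraform init \
  -backend-config="resource_group_name=<resource-group>" \
  -backend-config="storage_account_name=<storage-account-name-with-suffix>" \
  -backend-config="container_name=<container-name>" \
  -backend-config="key=terraform.tfstate"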

Development

Local Setup

In order to run the framework locally during development, please follow these steps. First, set the following environment variables:

export ARM_CLIENT_ID="client-id"
export ARM_CLIENT_SECRET="client-secret"
export ARM_SUBSCRIPTION_ID="subscription-id"
export ARM_TENANT_ID="tenant-id"

export SNOWFLAKE_ACCOUNT="account"
export SNOWFLAKE_REGION="region"
export SNOWFLAKE_USER="user"
export SNOWFLAKE_PRIVATE_KEY_PASSPHRASE="passphrase"
export SNOWFLAKE_PRIVATE_KEY_PATH="path-to-file"

For development you can use the SNOWFLAKE_PRIVATE_KEY_PATH environment variable to store the path to your private key. When running in an automated way through your CI/CD pipeline, you'll likely want to use the SNOWFLAKE_PRIVATE_KEY environment variable instead (and store the actual key value there).
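For example, a pipeline step might load the key contents directly into that variable (a sketch, assuming your pipeline makes the key file available at <path>):

export SNOWFLAKE_PRIVATE_KEY="$(cat <path>/snowflake_key.p8)"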

To log in with the service account, you can use snowsql like this:

snowsql -a <account_identifier> -u SDF_TERRAFORM_SVC --private-key-path <path>/snowflake_key.p8
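With the environment variables set, a typical local iteration loop looks something like the following sketch (the exact working directory and workspace name depend on your repository layout and the workspace naming described above):

terraform init
terraform workspace select <identifier>-dev
terraform plan
terraform apply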

Todos

  • Environments
  • CI/CD Workflows
    • Add YAML pipelines for other CI/CD tools
  • Scripts
    • Move the scripts into a Chatops Bot like Hubot, Lita, or Err
  • Other
    • Add support for dbt

Maintainers

  • Jeremiah Hansen (@sfc-gh-jhansen)

This is a community-developed tool, not an official Snowflake offering. It comes with no support or warranty. However, feel free to raise a GitHub Issue if you find a bug or would like a new feature.

Legal

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this tool except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
