Commit ebd59ff

Merge pull request #422 from cagov/365_update_dbt_docs
Added Snowflake OAuth instructions and fixed many case, spelling, and grammar errors
2 parents 1ee6a1e + 4f84947 commit ebd59ff

13 files changed: +89 -94 lines changed

docs/code/local-setup.md

Lines changed: 3 additions & 3 deletions
@@ -48,7 +48,7 @@ We use [Terraform](https://www.terraform.io/) to manage infrastructure.
 Dependencies for Terraform (mostly in the [go ecosystem](https://go.dev/))
 can be installed via a number of different package managers.

-If you are running Mac OS, you can install you can install these dependencies with [Homebrew](https://brew.sh/).
+If you are running Mac OS, you can install these dependencies with [Homebrew](https://brew.sh/).
 First, install Homebrew

 ```bash
@@ -89,7 +89,7 @@ export SNOWFLAKE_WAREHOUSE=TRANSFORMING_XS_DEV
 export SNOWFLAKE_AUTHENTICATOR=ExternalBrowser
 ```

-This will enable you to perform transforming activities which is needed for dbt.
+This will enable you to perform transforming activities which are needed for dbt.
 Open a new terminal and verify that the environment variables are set.

 **Switch to Loader role**
@@ -104,7 +104,7 @@ export SNOWFLAKE_WAREHOUSE=LOADING_XS_DEV
 export SNOWFLAKE_AUTHENTICATOR=ExternalBrowser
 ```

-This will enable you to perform loading activities and is needed to which is needed for Airflow or Fivetran.
+This will enable you to perform loading activities which are needed for Airflow or Fivetran.
 Again, open a new terminal and verify that the environment variables are set.

 ## Configure AWS (optional)

docs/dbt/dbt-performance.md

Lines changed: 28 additions & 38 deletions
Large diffs are not rendered by default.

docs/infra/architecture.md

Lines changed: 3 additions & 3 deletions
@@ -131,15 +131,15 @@ There are six primary functional roles:

 ## Reporting and analysis

-The most prominent consumer of the data products from this project are PowerBI and Tableau dashboards and the CalInnovate team.
+The most prominent consumers of the data products from this project are PowerBI and Tableau dashboards and the CalInnovate team.

 ## Custom schema names

 dbt's default method for generating [custom schema names](https://docs.getdbt.com/docs/build/custom-schemas)
 works well for a single-database setup:

 * It allows development work to occur in a separate schema from production models.
-* It allows analytics engineers to develop side-by-side without stepping on each others toes.
+* It allows analytics engineers to develop side-by-side without stepping on each other's toes.

 A downside of the default is that production models all get a prefix,
 which may not be an ideal naming convention for end-users.
@@ -169,7 +169,7 @@ So your data access looks like the following:

 ![developer](../images/developer.png)

-Now let's consider the nigthly production build. This service account builds the production models
+Now let's consider the nightly production build. This service account builds the production models
 in `TRANSFORM_PRD` and `ANALYTICS_PRD` based on the raw data in `RAW_PRD`.
 The development environment effectively doesn't exist to this account, and data access looks like the following:

docs/infra/cloud-infrastructure.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Cloud Infrastructure
+# Cloud infrastructure

 The DSE team [uses Terraform](../code/terraform-local-setup.md) to manage cloud infrastructure.
 Our stack includes:

docs/infra/snowflake.md

Lines changed: 2 additions & 2 deletions
@@ -82,15 +82,15 @@ X-Large: Powerful for demanding workloads and data-intensive operations.
 3X-Large: Triple the capacity of X-Large.
 4X-Large: Quadruple the capacity of X-Large.

-1. **`LOADING_{size}_{env}`**: These warehouse is for loading data to `RAW`.
+1. **`LOADING_{size}_{env}`**: This warehouse is for loading data to `RAW`.
 1. **`TRANSFORMING_{size}_{env}`**: This warehouse is for transforming data in `TRANSFORM` and `ANALYTICS`.
 1. **`REPORTING_{size}_{env}`**: This warehouse is the role for BI tools and other end-users of the data.

 ### Four roles

 There are four primary functional roles:

-1. **`LOADER_{env}`**: This role is for tooling like Fivetran or Airflow to load raw data in to the `RAW` database.
+1. **`LOADER_{env}`**: This role is for tooling like Fivetran or Airflow to load raw data into the `RAW` database.
 1. **`TRANSFORMER_{env}`**: This is the analytics engineer/dbt role, for transforming raw data into something analysis-ready. It has read/write/control access to both `TRANSFORM` and `ANALYTICS`, and read access to `RAW`.
 1. **`REPORTER_{env}`**: This role read access to `ANALYTICS`, and is intended for BI tools and other end-users of the data.
 1. **`READER_{env}`**: This role has read access to all three databases, and is intended for CI service accounts to generate documentation.

docs/learning/cloud-data-warehouses.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Cloud Data Warehouses
+# Cloud data warehouses

 ## What is a cloud data warehouse?

docs/learning/git.md

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-# Git and GitHub
+# git and GitHub

 ## What are git and GitHub?

@@ -42,7 +42,7 @@ for learning to use its various features, including:
 * [How to review pull requests](https://github.com/skills/review-pull-requests)
 * [How to automate tasks and use CI/CD with GitHub actions](https://github.com/skills/hello-github-actions)

-## Git+GitHub at CalData
+## git and GitHub at CalData

 On the CalData Data Services and Engineering team we make heavy use of git and GitHub for our projects,
 and have our own set of [guidelines and best practices](../code/code-review.md) for code review.

docs/learning/glossary.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Modern Data Stack Glossary
+# Modern data stack glossary

 This glossary is a reference for commonly used acronyms, terms, and tools associated with the modern data stack and data and analytics engineering practices.

docs/learning/naming-conventions.md

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
-# Naming Conventions
+# Naming conventions

 This page documents the Data Services and Engineering (DSE) team's naming conventions for cloud resources.

-## General Approach
+## General approach

 Our approach is adapted from [this blog post](https://stepan.wtf/cloud-naming-convention/).
 The goals of establishing a naming convention are:

docs/learning/security.md

Lines changed: 7 additions & 7 deletions
@@ -1,9 +1,9 @@
-# Security Guidelines
+# Security guidelines

 This document describes security conventions for CalData's Data Services and Engineering team,
 especially as it relates to cloud and SaaS services.

-## Cloud Security and IAM
+## Cloud security and IAM

 The major public clouds (AWS, GCP, Azure) all have a service for Identity and Access Management (IAM).
 This allows us to manage which users or services are able to perform actions on which resources.
@@ -17,7 +17,7 @@ In general, IAM is described by:

 Most of the work of IAM is managing users, permissions, groups, policies, and roles to perform tasks in a secure way.

-### Principle of Least-Privilege
+### Principle of least-privilege

 In general, users and roles should be assigned permissions according to the
 [Principle of Least Privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege),
@@ -48,11 +48,11 @@ Some good practices around the use of service accounts
 and being able to edit or decommission accounts separately from each other is a good idea.
 * Regularly rotate access keys for long-term service accounts.

-### Production and Development Environments
+### Production and development environments

 Production environments should be treated with greater care than development ones.
 In the testing and developing of a service, roles and policies are often crafted
-which do not follow the principal of least privilege (i.e., they have too many permissions).
+which do not follow the principle of least privilege (i.e., they have too many permissions).

 When productionizing a service or application, make sure to review the relevant
 roles and service accounts to ensure they only have the necessary policies,
@@ -72,7 +72,7 @@ GCS buckets in a project, when their application only requires access to one.
 AWS has a nice [user guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html)
 for how to work with IAM, including some [best-practices](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html).

-## Third-party SaaS Integrations
+## Third-party SaaS integrations

 Often a third-party software-as-a-service (SaaS) provider will require service accounts
 to access resources within a cloud account.
@@ -116,6 +116,6 @@ The DSE team both manages CalData projects and onboards clients into Fivetran, a
 Security policies for Snowflake can be found
 [here](../infra/snowflake.md#security-policies).

-## Security Review Standard Practices
+## Security review standard practices

 (TODO, possibly pulling from [Agile Application Security](https://www.amazon.com/Agile-Application-Security-Enabling-Continuous/dp/1491938846/ref=cm_cr_arp_d_product_top))
