This repository was archived by the owner on Sep 26, 2023. It is now read-only.
Alexander Dean edited this page Jun 27, 2017 · 11 revisions

Welcome to the snowplow-snowflake-loader wiki!

Overview

This is a project to load Snowplow enriched events into the Snowflake cloud data warehouse.

Technical architecture

The project consists of two independent applications:

  1. Snowplow Snowflake Transformer, a Spark job which reads Snowplow enriched events from Amazon S3 and writes them back into S3 in a format ready for Snowflake
  2. Snowplow Snowflake Loader, a Scala app which takes the Snowflake-ready data in S3 and loads it into Snowflake

Snowplow Snowflake Transformer

This is a Spark job written in Scala that uses the Snowplow Scala Analytics SDK.

The Transformer:

  • Reads Snowplow enriched events from S3
  • Uses the JSON transformer from the Snowplow Scala Analytics SDK to convert those enriched events into JSONs
  • Writes those enriched event JSONs back to S3 as newline-delimited gzipped files
  • Keeps track of which folders it has processed using the Snowplow Scala Analytics SDK's DynamoDB manifest functionality
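
The steps above can be illustrated with a small, self-contained sketch. It is not the Transformer's code: the real job runs on Spark and relies on the Snowplow Scala Analytics SDK for both the JSON conversion and the DynamoDB manifest. Here a truncated, hypothetical field list (`FIELDS`) stands in for the enriched event schema, an in-memory set stands in for the manifest, and a plain dict of folder names to lines stands in for S3. All function names are assumptions made for illustration.

```python
import gzip
import io
import json

# Hypothetical, truncated field list; real enriched events have many more columns.
FIELDS = ["app_id", "platform", "event_id"]


def to_json(tsv_line):
    """Convert one tab-separated enriched event into a JSON string."""
    values = tsv_line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    # Empty TSV columns become JSON nulls.
    record = {k: (v if v != "" else None) for k, v in zip(FIELDS, values)}
    return json.dumps(record)


def write_ndjson_gz(tsv_lines):
    """Return gzipped bytes holding one JSON object per line."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        for line in tsv_lines:
            gz.write((to_json(line) + "\n").encode("utf-8"))
    return buf.getvalue()


def transform_folders(folders, processed):
    """Transform every folder not yet recorded in `processed`.

    `folders` maps a folder name to its list of TSV lines (stand-in for S3).
    `processed` is a set of folder names (stand-in for the DynamoDB manifest)
    and is updated in place once a folder has been written.
    """
    outputs = {}
    for name, lines in folders.items():
        if name in processed:
            continue  # already handled in an earlier run
        outputs[name] = write_ndjson_gz(lines)
        processed.add(name)
    return outputs
```

Calling `transform_folders` a second time with the same `processed` set returns an empty dict, which is the behaviour the manifest is there to provide: a folder is transformed once, however often the job is run.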

Snowplow Snowflake Loader

This is a Scala app. Description to come.
