This repository was archived by the owner on Sep 26, 2023. It is now read-only.
Home
Alexander Dean edited this page Jun 27, 2017
Welcome to the snowplow-snowflake-loader wiki!
This is a project to load Snowplow enriched events into the Snowflake cloud data warehouse.
The project consists of two independent apps:
- Snowplow Snowflake Transformer, a Spark job which reads Snowplow enriched events from Amazon S3 and writes them back into S3 in a format ready for Snowflake
- Snowplow Snowflake Loader, a Scala app which takes the Snowflake-ready data in S3 and loads it into Snowflake
The Snowplow Snowflake Transformer is a Spark job written in Scala that makes use of the Snowplow Scala Analytics SDK.
The Transformer:
- Reads Snowplow enriched events from S3
- Uses the JSON transformer from the Snowplow Scala Analytics SDK to convert those enriched events into JSONs
- Writes those enriched event JSONs back to S3 as newline-delimited gzipped files
- Keeps track of which folders it has processed using the Snowplow Scala Analytics SDK's DynamoDB manifest functionality
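The four steps above can be sketched in plain Scala. This is only an illustration of the data flow, not the Transformer's actual code: it uses no Spark, S3, or DynamoDB, and every name in it (`TransformerSketch`, `ToJson`, `Manifest`, `run`) is hypothetical. The `ToJson` function stands in for the Analytics SDK's JSON transformer, an in-memory set stands in for the DynamoDB manifest, and a `Map` of folder name to lines stands in for folders on S3. Skipping events that fail to transform is an assumption of the sketch.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import scala.collection.mutable
import scala.io.Source

object TransformerSketch {
  // Hypothetical stand-in for the SDK's JSON transformer:
  // Right(json) on success, Left(error message) on failure.
  type ToJson = String => Either[String, String]

  // Hypothetical stand-in for the DynamoDB manifest:
  // an in-memory record of folders that have already been processed.
  final class Manifest {
    private val processed = mutable.Set.empty[String]
    def contains(folder: String): Boolean = processed.contains(folder)
    def add(folder: String): Unit = processed += folder
  }

  // Serialise JSON strings as newline-delimited, gzipped bytes.
  def gzipLines(lines: Seq[String]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val gzip = new GZIPOutputStream(buffer)
    try lines.foreach(line => gzip.write((line + "\n").getBytes(UTF_8)))
    finally gzip.close()
    buffer.toByteArray
  }

  // Inverse of gzipLines, useful for inspecting the output.
  def gunzipLines(bytes: Array[Byte]): List[String] = {
    val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
    try Source.fromInputStream(in, "UTF-8").getLines().toList
    finally in.close()
  }

  // Transform every folder not yet in the manifest, then record it as processed.
  // Events that fail to transform are skipped (an assumption of this sketch).
  def run(
      input: Map[String, Seq[String]],
      toJson: ToJson,
      manifest: Manifest
  ): Map[String, Array[Byte]] = {
    val pending = input.filter { case (folder, _) => !manifest.contains(folder) }
    pending.map { case (folder, events) =>
      val jsons = events.map(toJson).collect { case Right(json) => json }
      manifest.add(folder)
      folder -> gzipLines(jsons)
    }
  }
}
```

Calling `run` a second time with the same folder name returns an empty map, which mirrors how a manifest prevents a folder from being processed twice.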
The Snowplow Snowflake Loader is a Scala app. Description to come.