FedEx Analytics Engineering Assignment

Overview

This is my submission for the FedEx Analytics Engineering Assignment.

It provides a self-contained environment with a data pipeline that ingests, cleans, and enriches the Amazon E-Commerce Sales Dataset from Kaggle and makes the results available to BI tools.

```mermaid
flowchart LR
    raw["`Raw data
    (.csv file)`"]
    clean["Clean models
    (dbt)"]
    enriched["Enriched models
    (dbt)"]
    kimball["Kimball models
    (dbt)"]
    SemanticLayer["Semantic Layer
    (Cube.dev)"]
    BI["BI layer
    (Apache Superset)"]

    raw --> clean --> enriched --> kimball --> SemanticLayer --> BI
```
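The layering in the diagram can be illustrated with a small, self-contained Python sketch. It is not the project's dbt code: SQLite (from the standard library) stands in for DuckDB, the sample rows and column names are hypothetical simplifications of the Kaggle file, and each layer is a plain SQL view rather than a dbt model.

```python
"""Sketch of the raw -> clean -> enriched -> Kimball layering.

Assumptions (hypothetical, not taken from the repo): SQLite stands in for
DuckDB, the CSV columns are a simplified subset, and the SQL is illustrative
rather than the project's actual dbt models.
"""
import csv
import io
import sqlite3

# Hypothetical sample rows standing in for the Kaggle .csv file.
RAW_CSV = """order_id,order_date,status,category,qty,amount
171-001,04-30-22,Shipped,Kurta,1,647.62
171-002,04-30-22,Cancelled,Set,0,
171-003,05-01-22,Shipped,Set,2,1100.00
"""


def build_layers(conn: sqlite3.Connection) -> None:
    # Raw layer: load the file as-is, every value stored as text.
    rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
    conn.execute(
        "CREATE TABLE raw_sales (order_id, order_date, status, category, qty, amount)"
    )
    conn.executemany(
        "INSERT INTO raw_sales VALUES "
        "(:order_id, :order_date, :status, :category, :qty, :amount)",
        rows,
    )
    # Clean layer: cast types, turn MM-DD-YY into ISO dates, blank amounts -> NULL.
    conn.execute("""
        CREATE VIEW clean_sales AS
        SELECT order_id,
               '20' || substr(order_date, 7, 2) || '-' || substr(order_date, 1, 2)
                    || '-' || substr(order_date, 4, 2) AS order_date,
               lower(status) AS status,
               category,
               CAST(qty AS INTEGER) AS qty,
               CAST(NULLIF(amount, '') AS REAL) AS amount
        FROM raw_sales
    """)
    # Enriched layer: add a derived business flag.
    conn.execute("""
        CREATE VIEW enriched_sales AS
        SELECT *, status = 'cancelled' AS is_cancelled
        FROM clean_sales
    """)
    # Kimball layer: a category dimension and a sales fact keyed to it.
    conn.execute("""
        CREATE VIEW dim_category AS
        SELECT DISTINCT category AS category_key FROM enriched_sales
    """)
    conn.execute("""
        CREATE VIEW fct_sales AS
        SELECT order_id, order_date, category AS category_key, qty,
               COALESCE(amount, 0) AS amount, is_cancelled
        FROM enriched_sales
    """)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_layers(conn)
    totals = conn.execute(
        "SELECT category_key, SUM(amount) FROM fct_sales "
        "WHERE NOT is_cancelled GROUP BY category_key ORDER BY category_key"
    ).fetchall()
    print(totals)  # → [('Kurta', 647.62), ('Set', 1100.0)]
```

Each view reads only from the layer before it, which mirrors how dbt models reference upstream models with `ref()`; in the real project, dbt also records that lineage for dbt docs.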

Components

This project includes a workflow with:

- Data transformations using dbt
- Data storage using DuckDB
- Semantic Layer models using Cube.dev
- BI dashboards using Superset
- A basic data catalog using dbt docs
- A local development environment using a VS Code dev container, linters, and Docker Compose

Out of scope

Due to time constraints, the following areas are incomplete or out of scope:

- Proper security handling for production, such as not committing the .env file and managing credentials as secrets. (The .env file is committed for demo purposes.)
- Superset works and is connected to Cube, so it can be used to create dashboards, but no ready-made dashboards are included in this repo.
- The dev container linters are not configured.
- Data cleansing and testing are limited.
- The PySpark part of this exercise was skipped, as agreed.

Quick reference

- REQUIREMENTS.md: Original requirements.
- transform/models: Data transformation models (dbt).
- cube/schema: Semantic Layer models used by BI dashboard apps (Cube.dev).
- superset: Superset (BI dashboards).
- docker-compose.yml: Local environment definition.
- taskfile.yml: Available actions, for use by maintainers and, eventually, by CI/CD.

How to run this demo

Requirements

- Visual Studio Code
- Docker

Instructions

Open this repo in VS Code. Open the command palette (Shift+Cmd+P on macOS) and select Dev Containers: Rebuild and Reopen in Container. This spins up the environment, including the dev container, Cube, and Superset.

Open a terminal in the dev container and run:
```
task demo:run-full-demo
```
Then:
- To see an overview of the data transformation models and their metadata and lineage, open the local dbt docs instance at http://localhost:8080.
- To view and manage the semantic model cubes and views, open the local Cube instance at http://localhost:4000/.
- To view and manage BI dashboards, open the local Superset instance at http://localhost:8088/login/ and log in with username admin and password admin. It is connected to Cube, so you can create your own dashboards, but no ready-made dashboards are included in this repo yet.
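Before opening these URLs, it can help to confirm that the services are up. Below is a minimal standard-library sketch. It assumes the ports listed above and treats any HTTP response, including a redirect or an error status, as "running"; the `SERVICES` mapping and `is_reachable` helper are hypothetical, not part of the repo.

```python
"""Sketch: check that the local demo services answer over HTTP.

Assumes the ports listed in this README (dbt docs 8080, Cube 4000,
Superset 8088). Any HTTP response counts as "up".
"""
import urllib.error
import urllib.request

# Hypothetical mapping of service names to the URLs listed in this README.
SERVICES = {
    "dbt docs": "http://localhost:8080",
    "Cube": "http://localhost:4000/",
    "Superset": "http://localhost:8088/login/",
}


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with a non-2xx status, so it is running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, name resolution failure, timeout, etc.
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        status = "up" if is_reachable(url) else "DOWN"
        print(f"{name:<10} {status:<5} {url}")
```

`HTTPError` is caught before `URLError` because it is a subclass of it; otherwise a running service that returns, say, 401 would be reported as down.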

Repository: legomb/fedex-assignment

Folders and files

Latest commit

History

Repository files navigation

FedEx Analytics Engineering Assignment

Overview

Components

Out of scope

Quick reference

How to run this demo

Requirements

Instructions

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 2

Uh oh!

Languages