legomb/fedex-assignment

FedEx Analytics Engineering Assignment

Overview

This is my submission for the FedEx Analytics Engineering Assignment.

It provides a self-contained environment with a data pipeline that ingests, cleans, and enriches the Amazon E-Commerce Sales Dataset from Kaggle and makes the results available for BI.

```mermaid
flowchart LR
    raw["`Raw data
    (.csv file)`"]
    clean["Clean models
    (dbt)"]
    enriched["Enriched models
    (dbt)"]
    kimball["Kimball models
    (dbt)"]
    SemanticLayer["Semantic Layer
    (Cube.dev)"]
    BI["BI layer
    (Apache Superset)"]

    raw --> clean --> enriched --> kimball --> SemanticLayer --> BI
```
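
The flow above maps onto layered dbt models. As an illustration only (the source, model, and column names here are hypothetical, not taken from this repo), a "clean" staging model might look like:

```sql
-- Hypothetical staging ("clean") model; all source/column names are
-- illustrative, not taken from this repo.
select
    "Order ID"                       as order_id,
    cast("Date" as date)             as order_date,
    lower(trim("Status"))            as order_status,
    cast("Amount" as decimal(10, 2)) as amount
from {{ source('amazon', 'raw_sales') }}
where "Order ID" is not null
```

Downstream enriched and Kimball models would then build on this via dbt's `{{ ref(...) }}` macro, which is what produces the lineage shown in dbt docs.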

Components

This project includes a workflow with:

  • Data transformations using dbt
  • Data storage using DuckDB
  • Semantic Layer models using cube.dev
  • BI dashboards using Superset
  • A basic data catalog using dbt docs
  • A local development environment using a VS Code devcontainer, linters, and Docker Compose.

Out of scope

Due to time constraints, the following areas are incomplete/out of scope:

  • Proper security handling for production, such as not committing the .env file and using secrets. (The .env file is committed for demo purposes.)
  • Ready-made dashboards. Superset works and is connected to Cube, so it can be used to create dashboards, but no ready-made dashboards are included in this repo.
  • Devcontainer linters are not configured.
  • Limited data cleansing and testing.
  • The PySpark part of this exercise was agreed to be skipped.

Quick reference

  • REQUIREMENTS.md: Original requirements.
  • transform/models: Data transformation models (dbt).
  • cube/schema: Semantic Layer models, to be used by BI dashboard apps (Cube.dev).
  • superset: Superset (BI dashboards).
  • docker-compose.yml: Local environment definition.
  • taskfile.yml: Available actions, to be used by maintainers and eventually the CI/CD.

How to run this demo

Requirements

  • Visual Studio Code
  • Docker

Instructions

  1. Open this repo in VS Code. Open the command palette (Shift+Cmd+P on macOS) and select Dev Containers: Rebuild and Reopen in Container. This will spin up the environment, including the devcontainer, Cube, and Superset.

  2. Open a terminal in the devcontainer and run:

    task demo:run-full-demo
  3. Then:

    • To see an overview of the data transformation models and their metadata & lineage, access the local dbt docs instance by navigating to http://localhost:8080.
    • To view and manage the semantic model data cubes and views, open the local Cube instance by navigating to http://localhost:4000/.
    • To view and manage BI dashboards, open the local Superset instance by navigating to http://localhost:8088/login/ and log in with admin / admin. It has a connection to Cube, so you can create your own dashboards, but at the moment no ready-made dashboards are included in this repo.
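
Superset reaches the data through the Cube connection, so dashboards are ultimately backed by the Kimball models. As a sketch of the kind of mart-level query that layer is meant to support (table and column names are hypothetical, not taken from this repo):

```sql
-- Hypothetical query against a Kimball-style mart; all names are illustrative.
select
    d.year_month,
    sum(f.amount) as total_sales,
    count(*)      as order_count
from fct_sales f
join dim_date d
  on f.order_date_key = d.date_key
group by d.year_month
order by d.year_month;
```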