Data Engineering Challenge

Description

This project demonstrates a basic data pipeline that ingests, transforms, and analyzes user event logs from a JSON source. The pipeline is built using Python (pandas) and stores the normalized data in a relational SQLite database.

Project Structure

mock_event_logs.json: input data with raw event logs
data_eng_challenge.ipynb: the complete ETL pipeline and analytics queries
erd.png: entity-relationship diagram (ERD) of the final relational data model
analytics_queries.sql: file with SQL-only versions of the analytics queries

Requirements

Jupyter Notebook or Google Colab
Python 3.10+
pandas (sqlite3 ships with the Python standard library)

How the project works

I used pandas to process data about users, documents, and events. First, I cleaned and transformed the data so that it is accurate and consistent. Then I loaded the result into a SQLite database, which is lightweight and needs no extra software setup. The database contains three main tables: users, documents, and events. The tables are linked through their key columns, which lets me run queries that connect users with the documents they touched and the actions they took.
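The flow described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the field names (event_id, user_id, user_name, document_id, document_title, event_type, timestamp), the inline sample records, and the table schema are all assumptions made for the example, and the real mock_event_logs.json may be shaped differently.

```python
import sqlite3

import pandas as pd

# Hypothetical raw events; the real mock_event_logs.json may use other fields.
raw_events = [
    {"event_id": 1, "user_id": "u1", "user_name": "Ana", "document_id": "d1",
     "document_title": "Spec", "event_type": "view", "timestamp": "2024-01-01T10:00:00"},
    {"event_id": 2, "user_id": "u1", "user_name": "Ana", "document_id": "d2",
     "document_title": "Notes", "event_type": "edit", "timestamp": "2024-01-01T11:00:00"},
    # Exact duplicate of event 2, dropped during cleaning.
    {"event_id": 2, "user_id": "u1", "user_name": "Ana", "document_id": "d2",
     "document_title": "Notes", "event_type": "edit", "timestamp": "2024-01-01T11:00:00"},
    {"event_id": 3, "user_id": "u2", "user_name": "Ben", "document_id": "d1",
     "document_title": "Spec", "event_type": "view", "timestamp": "2024-01-02T09:30:00"},
]

# Assumed schema: one primary key per table, events referencing the other two.
SCHEMA = """
CREATE TABLE users (user_id TEXT PRIMARY KEY, user_name TEXT);
CREATE TABLE documents (document_id TEXT PRIMARY KEY, document_title TEXT);
CREATE TABLE events (
    event_id INTEGER PRIMARY KEY,
    user_id TEXT REFERENCES users(user_id),
    document_id TEXT REFERENCES documents(document_id),
    event_type TEXT,
    timestamp TEXT
);
"""


def build_tables(records):
    """Clean the raw records and split them into three normalized tables."""
    df = pd.DataFrame(records).drop_duplicates(subset="event_id")
    # Parse timestamps to validate them, then store them as ISO 8601 text.
    df["timestamp"] = pd.to_datetime(df["timestamp"]).dt.strftime("%Y-%m-%dT%H:%M:%S")
    users = df[["user_id", "user_name"]].drop_duplicates(subset="user_id")
    documents = df[["document_id", "document_title"]].drop_duplicates(subset="document_id")
    events = df[["event_id", "user_id", "document_id", "event_type", "timestamp"]]
    return users, documents, events


users, documents, events = build_tables(raw_events)

# An in-memory database keeps the sketch self-contained; a file path works too.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for name, table in [("users", users), ("documents", documents), ("events", events)]:
    table.to_sql(name, conn, index=False, if_exists="append")

# Example analytics query: number of events per user.
query = """
SELECT u.user_name, COUNT(*) AS n_events
FROM events AS e
JOIN users AS u ON u.user_id = e.user_id
GROUP BY u.user_name
ORDER BY u.user_name
"""
result = pd.read_sql_query(query, conn)
print(result)
```

Creating the tables with explicit CREATE TABLE statements before calling to_sql with if_exists="append" is one way to keep the key constraints, since to_sql does not declare primary keys when it creates a table itself.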
