Welcome to the Global Tourism Statistics Data Warehouse repository! 🚀
This project demonstrates the design and implementation of a modern data warehouse using Medallion Architecture (Bronze → Silver → Gold), applied to real-world datasets on global tourism and UN SDG indicators.
It is built to showcase Data Engineering, SQL Development, and Data Analytics skills in a professional, portfolio-ready format.
The architecture follows the Medallion approach:
- Bronze Layer → Raw data ingested from CSV (UN Tourism Statistics + UN SDG datasets).
- Silver Layer → Cleaned, standardized, and normalized data, with null-handling and code mappings.
- Gold Layer → Business-ready data, organized into fact and dimension views (star schema).
The data ingestion and transformation pipeline is structured as follows:
- Sources: UN Tourism Statistics (domestic, inbound, outbound, industries) and SDG indicators (8.9.1, 8.9.2, 12.b.1).
- Bronze Layer: Raw CSV imported into SQL Server with
BULK INSERT. - Silver Layer: Cleaning, transformations, and standardization.
- Gold Layer: Star schema with fact tables (domestic, inbound, outbound, industries, SDG) and dimension tables (country, indicator, year, unit of measure).
- Contains cleaned and standardized tourism and SDG datasets.
- Each dataset retains Country_code, Country, Indicator_code, and Units.
- Provides a unified and analysis-ready structure.
- Star Schema design with Fact and Dimension views:
- Dimensions: Country, Indicator, Year, Unit of Measure.
- Facts: Domestic Tourism, Inbound Tourism, Outbound Tourism, Tourism Industries, SDG indicators.
- Ensures referential integrity and supports analytical queries.
This project involves:
- Data Architecture → Medallion (Bronze, Silver, Gold).
- ETL Pipelines → SQL-based transformations from CSV to Gold.
- Data Modeling → Fact & Dimension views (star schema).
- Data Quality → Foreign key integrity checks and null handling.
- Analytics & Reporting → Business-ready datasets for visualization (Tableau/Power BI).
🎯 Skills demonstrated:
- SQL Server (T-SQL)
- Data Engineering (ETL/ELT)
- Data Modeling (Star Schema)
- Data Quality & Integrity
- Analytics Preparation
- SQL Server (SSMS) → Database & ETL implementation.
- GitHub → Version control & portfolio sharing.
- Draw.io → Diagrams for architecture and modeling.
- Tableau/Power BI (optional) → Visualization and reporting.
- Excel → Preliminary data inspection.
global-tourism-dwh/
│
├── datasets/ # Raw CSV datasets (UNWTO, SDG)
│
├── scripts/ # SQL Scripts (Bronze → Silver → Gold)
│ ├── bronze/ # BULK INSERT raw tables
│ ├── silver/ # Cleaning & transformation scripts
│ ├── gold/ # Fact & Dimension views
│
├── docs/ # Documentation & diagrams
│ ├── Data_Architecture.png
│ ├── Data_flow.png
│ ├── Integration_Model_Silver.png
│ ├── Integration_Model_Gold.png
│
├── tests/ # Foreign key & data integrity checks
│
├── README.md # Project overview (this file)
└── LICENSE # License info
- Conduct Exploratory Data Analysis (EDA) to assess and ensure data quality within the Gold Layer.
The datasets used in this project come from Kaggle:
I have applied minor modifications using Excel (delimiter changes from , to ;) to make them compatible with SQL Server bulk insert operations.
All rights remain with the original dataset authors.
This project is licensed under the MIT License.
Hi, I’m Daniele Amoroso 👋
HR Generalist transitioning into Data Analytics and Data Science, with a focus on SQL, Python, and AI Automation.
Currently transitioning into the data field by building portfolio projects that combine technical skills with analytical thinking.
📌 Connect with me on LinkedIn → Daniele Amoroso]



