Iceberg Table Infrastructure with AWS CDK

This project demonstrates how to define, provision, and manage Apache Iceberg tables in AWS using the AWS Cloud Development Kit (CDK). It creates a scalable, modular data lakehouse foundation with proper infrastructure-as-code practices.

🛠️ This repository was created for demonstration purposes and is part of my engineering portfolio. While it can be adapted for real use cases, it is not actively maintained for production.

🚀 Overview

This repository provisions:

- S3 buckets to store Iceberg table data and schema files.
- Glue Catalog databases and tables compatible with Iceberg.
- Athena queries via custom AWS SDK resources for declarative table creation.
- Automatic schema parsing from JSON to Iceberg-compatible SQL.
- Parameter storage in AWS SSM for safe retrieval of table details.

It supports flexible schema definitions, optional SQL overrides, and SSM-based configuration for secure and reusable environments.

🧱 Project Structure

```
iceberg-tables-example/
├── bin/
│   └── createIcebergTables.ts    # CDK entry point to deploy infrastructure
├── lib/
│   ├── interfaces.ts             # TypeScript interfaces for table configuration
│   ├── bucketStack.ts            # CDK stack for creating secure S3 buckets
│   ├── icebergTableStack.ts      # CDK stack for deploying Iceberg tables
│   ├── utils.ts                  # Utilities for schema parsing, SSM access, and validation
│   └── versionedStack.ts         # Base class for versioned CDK stacks
├── data/
│   └── schemas/                  # JSON schema files for Iceberg tables
├── package.json
└── README.md                     # Project documentation (this file)
```

🔧 Features

- Environment-aware deployments via EnvAwareStackProps
- Custom SQL support with onCreateQuery, onUpdateQuery, and onDeleteQuery
- JSON Schema → SQL column mapping with custom type conversions
- SSM-resolved parameters for runtime bucket config and outputs
- Partitioned table support for efficient querying
- Reusable IAM roles with scoped permissions
- Schema upload to S3 for transparency and auditing
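
To illustrate how an optional SQL override and partitioning could fit together, here is a minimal sketch. The `TableSpec` interface and the `resolveCreateQuery` function are hypothetical simplifications and are not taken from lib/interfaces.ts; only the `onCreateQuery` name comes from the feature list above. The generated statement follows the Athena DDL form for Iceberg tables (`TBLPROPERTIES ('table_type' = 'ICEBERG')`).

```typescript
// Hypothetical, simplified stand-in for the repository's table configuration.
interface TableSpec {
  database: string;
  table: string;
  columns: string[];       // e.g. ["id int", "event_date date"]
  partitionBy?: string[];  // partition columns or transforms
  location: string;        // S3 URI of the table root
  onCreateQuery?: string;  // optional hand-written SQL override
}

// Return the override if one is given, otherwise generate an Athena Iceberg DDL statement.
function resolveCreateQuery(spec: TableSpec): string {
  if (spec.onCreateQuery) return spec.onCreateQuery;
  const parts = [
    `CREATE TABLE ${spec.database}.${spec.table} (${spec.columns.join(", ")})`,
  ];
  if (spec.partitionBy && spec.partitionBy.length > 0) {
    parts.push(`PARTITIONED BY (${spec.partitionBy.join(", ")})`);
  }
  parts.push(`LOCATION '${spec.location}'`);
  parts.push(`TBLPROPERTIES ('table_type' = 'ICEBERG')`);
  return parts.join(" ");
}

// Example values are placeholders, not names used by this project.
const createSql = resolveCreateQuery({
  database: "analytics",
  table: "events",
  columns: ["id int", "event_date date"],
  partitionBy: ["event_date"],
  location: "s3://example-bucket/warehouse/events/",
});
console.log(createSql);
```

Keeping the override check first means a hand-written query always wins over the generated one, which matches the idea of an optional override.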

📦 Prerequisites

- Node.js ≥ 16
- AWS CDK v2
- AWS credentials with permissions for S3, Athena, Glue, SSM, and IAM

Install dependencies:

```
yarn install
```

🚚 Deploying the Stack

Configure your environment: edit stackProps and environment settings in bin/createIcebergTables.ts.
Add your JSON schema: place your Iceberg-compatible schema in data/schemas/your_table.schema.json.
Define your table properties: adjust or add a new TableBuildProps object in createIcebergTables.ts.
Deploy:
```
yarn deploy:dev
```

This will:

- Create the bucket
- Upload the schema to S3
- Create an Iceberg table using Athena
- Store key table metadata in SSM

🧪 Example Schema Mapping

Here’s an example of a JSON schema-to-Iceberg conversion using the mapping feature:

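```
const mapping = {
  // Outer key: column name as it appears in the source JSON schema.
  "json_str": {
    // Inner key: new column name; its value is the type definition to use instead.
    "json_map": {
      "type": "map",
      "properties": {
        "key": { "type": "string" },
        "value": { "type": "integer" }
      }
    }
  }
};
```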

This will rename json_str to json_map and convert it to map<string, int> in the resulting SQL schema.
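
A minimal sketch of how such a mapping could be applied, assuming a flat JSON schema whose columns sit in a `properties` object. The names `JsonType`, `Mapping`, `toSqlType`, and `toSqlColumns`, as well as the primitive type table, are hypothetical and are not taken from lib/utils.ts.

```typescript
// Hypothetical shapes; the repository's real interfaces may differ.
type JsonType = {
  type: string;
  properties?: Record<string, JsonType>;
};
type Mapping = Record<string, Record<string, JsonType>>;

// Assumed JSON-to-SQL primitive conversions.
const PRIMITIVES: Record<string, string> = {
  string: "string",
  integer: "int",
  number: "double",
  boolean: "boolean",
};

// Convert one JSON type definition to a SQL type string.
function toSqlType(def: JsonType): string {
  if (def.type === "map") {
    const key = def.properties?.["key"];
    const value = def.properties?.["value"];
    if (!key || !value) throw new Error("map needs key and value definitions");
    return `map<${toSqlType(key)}, ${toSqlType(value)}>`;
  }
  const sql = PRIMITIVES[def.type];
  if (!sql) throw new Error(`unsupported type: ${def.type}`);
  return sql;
}

// Build "name type" column definitions, applying renames and type overrides from the mapping.
function toSqlColumns(
  properties: Record<string, JsonType>,
  mapping: Mapping = {},
): string[] {
  return Object.entries(properties).map(([name, def]) => {
    const override = mapping[name];
    const entry = override ? Object.entries(override)[0] : undefined;
    if (entry) {
      const [newName, newDef] = entry;
      return `${newName} ${toSqlType(newDef)}`;
    }
    return `${name} ${toSqlType(def)}`;
  });
}

const columns = toSqlColumns(
  { id: { type: "integer" }, json_str: { type: "string" } },
  {
    json_str: {
      json_map: {
        type: "map",
        properties: { key: { type: "string" }, value: { type: "integer" } },
      },
    },
  },
);
console.log(columns.join(", ")); // id int, json_map map<string, int>
```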

🔍 Outputs

After deployment, the following will be saved in AWS Systems Manager Parameter Store:

- Table name
- Table ARN
- Table S3 location
- Output S3 path for Athena
- Path to schema in S3

These can be referenced across your infrastructure for consistency.
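
As a rough illustration of how these outputs could be laid out, the sketch below maps them to parameter names. The `/<env>/iceberg/<table>/...` path scheme, the `TableOutputs` interface, and both function names are assumptions made for this example; the repository may use different names. The ARN helper follows the standard Glue table ARN format in the `aws` partition.

```typescript
// Hypothetical container for the values listed above.
interface TableOutputs {
  tableName: string;
  tableArn: string;
  tableLocation: string;
  athenaOutputLocation: string;
  schemaLocation: string;
}

// Build a Glue table ARN: arn:aws:glue:<region>:<account>:table/<database>/<table>.
function glueTableArn(region: string, account: string, database: string, table: string): string {
  return `arn:aws:glue:${region}:${account}:table/${database}/${table}`;
}

// Map each output to a parameter name under an assumed /<env>/iceberg/<table>/ prefix.
function toSsmParameters(env: string, outputs: TableOutputs): Record<string, string> {
  const prefix = `/${env}/iceberg/${outputs.tableName}`;
  return {
    [`${prefix}/name`]: outputs.tableName,
    [`${prefix}/arn`]: outputs.tableArn,
    [`${prefix}/location`]: outputs.tableLocation,
    [`${prefix}/athena-output`]: outputs.athenaOutputLocation,
    [`${prefix}/schema`]: outputs.schemaLocation,
  };
}

// Placeholder values for illustration only.
const params = toSsmParameters("dev", {
  tableName: "events",
  tableArn: glueTableArn("eu-west-1", "123456789012", "analytics", "events"),
  tableLocation: "s3://example-bucket/warehouse/events/",
  athenaOutputLocation: "s3://example-bucket/athena-results/",
  schemaLocation: "s3://example-bucket/schemas/events.schema.json",
});
console.log(params);
```

In a CDK stack, each entry could then be written with `ssm.StringParameter` from `aws-cdk-lib/aws-ssm`, passing the key as `parameterName` and the value as `stringValue`.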

📋 Local Setup

To spin up a fully local test environment (no real AWS):

Ensure Docker and Docker Compose are installed on your machine.
From the project root, bring up all services:
```
make start
```
This starts two containers:
- localstack: emulates AWS S3, Glue, CloudFormation, IAM, STS
- cdk: runs cdklocal to bootstrap and deploy your CDK stacks into LocalStack
Deploy your CDK stacks locally:
```
make deploy
```
This uses cdklocal to create S3 buckets, Glue databases, and Iceberg tables in LocalStack.

Inspect your bucket contents (optional):

```
awslocal s3 ls s3://<your-warehouse-bucket>/warehouse/ --recursive
```

or from your host:

```
aws s3 ls s3://<your-warehouse-bucket>/warehouse/ --recursive \
    --endpoint-url http://localhost:4566 --region eu-west-1
```

You now have a zero-cost, offline playground for developing and testing your Iceberg CDK stacks.

IMPORTANT:

The deployment may be reported as successful even though the table cannot be accessed. This is because AWS Glue requires LocalStack Pro. With the standard LocalStack you can still use this setup to test whether your CDK stacks deploy.

📖 Learn More

🧑‍💻 Author

Anatol Jurenkow

Cloud Data Engineer | AWS CDK Enthusiast | Iceberg Fan

[GitHub](https://github.com/anatol-ju) · [LinkedIn](https://de.linkedin.com/in/anatol-jurenkow)

📄 License

“This project is for portfolio purposes only. Please contact me if you’d like to reuse or adapt this code.”
