Minimal Terraform project to provision AWS and local resources for running machine learning workloads using SageMaker AI.
- VPC and networking (default: 2 AZs, cost-optimized)
  - VPC with DNS support (`10.0.0.0/16` by default)
  - Public and private subnets across N AZs (default: 2)
  - Internet Gateway, route tables, and route table associations
  - NAT Gateway(s) with Elastic IPs (default: a single NAT Gateway to save cost)
- S3 data bucket (secure by default)
  - Versioning enabled, SSE-S3 (AES256) encryption, public access blocked
  - Optional `force_destroy` to allow deleting a non-empty bucket on teardown (default: `false`)
  - Creates a sample project prefix `<initial_project_name>/` with subfolders `training/`, `validation/`, `test/`, `model/`, `checkpoints/`, and `output/`
- SageMaker IAM
  - Execution role for SageMaker with the managed policies `AmazonSageMakerFullAccess` and `AmazonS3FullAccess`
  - Optional prediction-only IAM user (default: created) with:
    - A custom InvokeEndpoint read policy and the managed `AmazonS3ReadOnlyAccess` policy
- SageMaker Studio Domain (default: created)
  - Domain with a default user profile
  - Default Studio app (`JupyterServer`) for the user profile
- SageMaker JupyterLab Space (default: created when the Domain is enabled)
  - JupyterLab Space with a default instance type and an optional code repository
- Optional SageMaker Notebook Instance (default: off)
  - Configurable instance type and volume size
  - Can attach to the created VPC/subnets and security groups
- Local helper file `terraform_local_info.json` with the region, bucket name, SageMaker role ARN, notebook name (if any), VPC ID, private subnets, Domain ID/user profile, and Space name
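A prediction-only policy like the one described above normally grants little beyond `sagemaker:InvokeEndpoint`. The Python sketch below builds such a policy document. The helper name, the endpoint-name wildcard default, and the placeholder region/account values are assumptions for illustration. The project's actual policy is defined in its Terraform code and may scope resources differently.

```python
import json


def invoke_endpoint_policy(region: str, account_id: str, endpoint_name: str = "*") -> dict:
    """Build an IAM policy document that only allows invoking SageMaker endpoints.

    Hypothetical helper: the project's real policy is defined in Terraform.
    """
    # Endpoint ARN format: arn:aws:sagemaker:<region>:<account>:endpoint/<name>
    endpoint_arn = f"arn:aws:sagemaker:{region}:{account_id}:endpoint/{endpoint_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeEndpointOnly",
                "Effect": "Allow",
                "Action": ["sagemaker:InvokeEndpoint"],
                "Resource": [endpoint_arn],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder region and account ID, for illustration only.
    print(json.dumps(invoke_endpoint_policy("us-east-1", "123456789012"), indent=2))
```

Narrowing `endpoint_name` from `*` to a specific endpoint limits the user to a single model.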
Note on costs: NAT Gateways, Studio Domain apps and Spaces, and Notebook Instances may incur charges. Run `terraform destroy` when you are done, and leave `force_destroy = false` unless you understand the implications.
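The cost trade-off behind the single NAT Gateway default can be made concrete. This Python sketch carves the default `10.0.0.0/16` VPC into one public and one private subnet per AZ and counts the NAT Gateways. The `/24` subnet size and the public-then-private ordering are assumptions. The module's real layout (for example, its `cidrsubnet()` arguments) may differ.

```python
import ipaddress


def plan_network(vpc_cidr="10.0.0.0/16", az_count=2, vpc_single_nat_gateway=True, new_prefix=24):
    """Sketch one public and one private subnet per AZ plus the NAT Gateway count.

    Assumed layout: the first az_count blocks are public, the next az_count private.
    """
    blocks = list(ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix))
    if 2 * az_count > len(blocks):
        raise ValueError("VPC CIDR is too small for the requested subnets")
    public = [str(b) for b in blocks[:az_count]]
    private = [str(b) for b in blocks[az_count:2 * az_count]]
    # One NAT Gateway (and Elastic IP) in total, or one per AZ for AZ-level resilience.
    nat_gateways = 1 if vpc_single_nat_gateway else az_count
    return {"public": public, "private": private, "nat_gateways": nat_gateways}


if __name__ == "__main__":
    print(plan_network())
    print(plan_network(az_count=3, vpc_single_nat_gateway=False)["nat_gateways"])
```

With the defaults, this yields two public and two private `/24` subnets and one NAT Gateway. Turning the single-NAT option off raises the count to one NAT Gateway per AZ, each billed separately.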
Prerequisites:
- Terraform >= 1.5
- An AWS account and credentials, provided in one of two ways:
  - Set `aws_profile` in `terraform.tfvars` to a profile in `~/.aws/credentials`, or
  - Export the `AWS_PROFILE` (or `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`) environment variables
- Optional: AWS CLI installed and configured (helps verify auth and browse outputs)
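Before running Terraform, it can help to confirm which credential source your shell actually provides. This Python sketch (the function name is hypothetical) inspects only the environment variables named above. A profile set via `aws_profile` in `terraform.tfvars` is not visible to it. Running `aws sts get-caller-identity` from the optional AWS CLI remains the authoritative check.

```python
import os


def detect_aws_auth(env=None) -> str:
    """Return which environment-based credential source is set, if any.

    Ignores aws_profile in terraform.tfvars, which is not visible here.
    """
    env = os.environ if env is None else env
    # Explicit keys are checked first; both must be present to be usable.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "access-keys"
    if env.get("AWS_PROFILE"):
        return f"profile:{env['AWS_PROFILE']}"
    return "none"


if __name__ == "__main__":
    print(detect_aws_auth())
```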
Quick start:
- Initialize Terraform:
cd terraform
terraform init
- Create your `terraform.tfvars` (optional but recommended):
Windows PowerShell:
Copy-Item terraform.tfvars.example terraform.tfvars
macOS/Linux (bash):
cp terraform.tfvars.example terraform.tfvars
Then edit `terraform.tfvars` to adjust variables (see the key toggles listed below).
- Choose how to authenticate:
  - Option A: in `terraform.tfvars`, set `aws_profile = "<your_profile>"`
  - Option B: export an environment variable before running commands:
export AWS_PROFILE=<your_profile>
- Plan and apply
terraform plan
terraform apply
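The steps above can also be driven from a script. This sketch uses a hypothetical helper and assumes Terraform is on `PATH` with the configuration in `terraform/`. It uses Terraform's global `-chdir` option and saves the plan with `-out`, so `apply` executes exactly the plan that was reviewed. Applying a saved plan file does not prompt for confirmation, so the script asks before applying.

```python
import shlex
import subprocess


def terraform_commands(workdir="terraform", plan_file="tfplan"):
    """Return init, plan, and apply as argument lists (hypothetical helper)."""
    chdir = f"-chdir={workdir}"  # global option, must precede the subcommand
    return [
        ["terraform", chdir, "init"],
        ["terraform", chdir, "plan", f"-out={plan_file}"],  # save the reviewed plan
        ["terraform", chdir, "apply", plan_file],  # apply exactly that plan
    ]


if __name__ == "__main__":
    init, plan, apply = terraform_commands()
    for cmd in (init, plan):
        print("$", shlex.join(cmd))
        subprocess.run(cmd, check=True)
    # Applying a saved plan does not prompt, so confirm here after reviewing the plan.
    if input("Apply this plan? [y/N] ").strip().lower() == "y":
        subprocess.run(apply, check=True)
```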
After apply, a local file `terraform_local_info.json` is generated with the region, bucket, role ARN, VPC details, and (if enabled) Domain/user/Space/Notebook details, for quick access from scripts or notebooks.
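From a notebook or script, the helper file can be read with the standard library. In this sketch, the file location (`terraform/terraform_local_info.json`) and the `bucket_name` key are assumptions, so check the generated file for the actual path and key names. The prefix layout follows the sample subfolders listed earlier and the `initial_project_name` default of `sample-project`.

```python
import json
from pathlib import Path

# Subfolders created under <initial_project_name>/ in the data bucket.
CHANNELS = ("training", "validation", "test", "model", "checkpoints", "output")


def load_local_info(path="terraform/terraform_local_info.json") -> dict:
    """Read the helper file written by terraform apply (path is an assumption)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def s3_prefixes(bucket: str, project: str = "sample-project") -> dict:
    """Map each sample subfolder to its s3:// URI."""
    return {name: f"s3://{bucket}/{project}/{name}/" for name in CHANNELS}


if __name__ == "__main__":
    info = load_local_info()
    # "bucket_name" is an assumed key; check the generated file for the real one.
    print(s3_prefixes(info["bucket_name"])["training"])
```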
See `terraform/variables.tf` and `terraform.tfvars.example` for all options. Key toggles and defaults:
- `create_sagemaker_domain` (bool, default `true`): create the SageMaker Studio Domain and default JupyterServer app
- `create_jupyterlab_space` (bool, default `true`): create a JupyterLab Space within the Domain
- `create_notebook` (bool, default `false`): create a classic SageMaker Notebook Instance
- `create_prediction_user` (bool, default `true`): create an IAM user for prediction-only access
- `vpc_single_nat_gateway` (bool, default `true`): use a single NAT Gateway to reduce cost
- `bucket_name` (string, default `null`): if null, a unique name is generated
- `force_destroy` (bool, default `false`): allow deleting a non-empty S3 bucket
- `initial_project_name` (string, default `"sample-project"`): root prefix created with the sample subfolders
- `space_code_repository_url` (string): default repository attached to the JupyterLab Space
Network sizes, instance types, and resource names are also configurable; see `variables.tf` for details.
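To keep overrides small and typo-free, the toggles above can be mirrored in a Python dataclass that writes only non-default values to `terraform.tfvars`. This is a hypothetical helper, not part of the project. It covers only the variables listed here. Its one consistency check reflects the earlier statement that the JupyterLab Space is created only when the Domain is enabled.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ProjectVars:
    """Hypothetical mirror of the key toggles, with the documented defaults."""
    create_sagemaker_domain: bool = True
    create_jupyterlab_space: bool = True
    create_notebook: bool = False
    create_prediction_user: bool = True
    vpc_single_nat_gateway: bool = True
    bucket_name: Optional[str] = None
    force_destroy: bool = False
    initial_project_name: str = "sample-project"

    def warnings(self) -> list:
        """Flag flag combinations that will not behave as they read."""
        issues = []
        if self.create_jupyterlab_space and not self.create_sagemaker_domain:
            issues.append("a JupyterLab Space is only created when create_sagemaker_domain is true")
        return issues


def _hcl(value) -> str:
    """Render a scalar as an HCL literal (simple values only)."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    # JSON string quoting is valid HCL for plain strings without ${ or %{ sequences.
    return json.dumps(value)


def render_tfvars(v: ProjectVars) -> str:
    """Write only the values that differ from the defaults."""
    defaults = ProjectVars()
    lines = [
        f"{f.name} = {_hcl(getattr(v, f.name))}"
        for f in fields(v)
        if getattr(v, f.name) != getattr(defaults, f.name)
    ]
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    overrides = ProjectVars(create_notebook=True, bucket_name="my-ml-bucket")
    print(overrides.warnings())
    print(render_tfvars(overrides), end="")
```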
Major outputs after apply (see `terraform/outputs.tf`):
- `s3_bucket_name`, `s3_bucket_arn`
- `sagemaker_role_arn`
- `notebook_name` (if created)
- `prediction_policy_arn`, `prediction_user_name`, `prediction_user_arn` (if the user is created)
- `vpc_id`, `private_subnet_ids`
- `sagemaker_domain_id`, `sagemaker_domain_user_profile` (if the Domain is created)
- `sagemaker_space_name` (if the Space is created)
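The same values are available from `terraform output -json`, which prints each output as an object with `value`, `type`, and `sensitive` fields. This Python sketch flattens that into a name-to-value dict (helper names are hypothetical). Outputs for resources that were not created may be missing or `null`, so prefer `.get()` for the conditional ones. Be aware that `-json` prints sensitive values in plain text.

```python
import json
import subprocess


def flatten_outputs(raw_json: str) -> dict:
    """Turn `terraform output -json` text into {output_name: value}."""
    return {name: meta.get("value") for name, meta in json.loads(raw_json).items()}


def read_outputs(workdir: str = "terraform") -> dict:
    """Run terraform output -json in workdir and flatten the result."""
    result = subprocess.run(
        ["terraform", f"-chdir={workdir}", "output", "-json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return flatten_outputs(result.stdout)


if __name__ == "__main__":
    outputs = read_outputs()
    print(outputs["s3_bucket_name"], outputs.get("sagemaker_domain_id"))
```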
To remove resources:
terraform destroy
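If `force_destroy = false`, empty the S3 bucket manually before running destroy, or set it to `true` if you understand the implications. Because versioning is enabled, emptying the bucket means deleting every object version and delete marker, not just the current objects.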
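In a versioned bucket, deleting only the current objects leaves older versions and delete markers behind, and the bucket still cannot be removed. This Python sketch assumes `boto3` is installed and credentials are configured. It pages through `list_object_versions` and deletes in batches of at most 1,000 keys, the `DeleteObjects` per-request limit. The function names are hypothetical, and the deletion is permanent.

```python
from typing import Iterable, Iterator


def version_batches(pages: Iterable[dict], batch_size: int = 1000) -> Iterator[list]:
    """Yield {"Key", "VersionId"} batches covering every version and delete marker."""
    batch = []
    for page in pages:
        for item in page.get("Versions", []) + page.get("DeleteMarkers", []):
            batch.append({"Key": item["Key"], "VersionId": item["VersionId"]})
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch


def empty_versioned_bucket(s3_client, bucket: str) -> int:
    """Permanently delete all versions and delete markers; return the count deleted."""
    paginator = s3_client.get_paginator("list_object_versions")
    deleted = 0
    for batch in version_batches(paginator.paginate(Bucket=bucket)):
        response = s3_client.delete_objects(Bucket=bucket, Delete={"Objects": batch, "Quiet": True})
        # Quiet mode still reports per-key failures under "Errors".
        if response.get("Errors"):
            raise RuntimeError(f"S3 reported delete errors: {response['Errors'][:3]}")
        deleted += len(batch)
    return deleted


if __name__ == "__main__":
    import boto3

    # Use the s3_bucket_name output here; this cannot be undone.
    print(empty_versioned_bucket(boto3.client("s3"), "<your-bucket-name>"))
```

Once the bucket is empty, `terraform destroy` can remove it with `force_destroy` still set to `false`.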
License: MIT. See `LICENSE`.