Commit 33119d2

feat: EMR on EKS with Karpenter Managed Endpoint (#261)
* adding managed endpoint script
* added doc
* removed output json
* adding optional assignment for cloudwatch logs and streams prefix
1 parent c8e6b5b commit 33119d2

3 files changed: +145 -0 lines changed

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
#!/bin/bash

# Prompt for the values that are substituted into the managed endpoint template
read -r -p "Enter EMR Virtual Cluster Id: " EMR_VIRTUAL_CLUSTER_ID
read -r -p "Provide your EMR on EKS team (emr-data-team-a or emr-data-team-b): " EMR_EKS_TEAM
read -r -p "Enter your AWS Region: " AWS_REGION
read -r -p "Enter a name for your endpoint: " EMR_EKS_MANAGED_ENDPOINT
read -r -p "Provide an S3 bucket location for logging (i.e. s3://my-bucket/logging/): " S3_BUCKET
read -r -p "Provide CloudWatch Logs Group Name: " CLOUDWATCH_LOGS_GROUP_NAME
read -r -p "Provide CloudWatch Logs Prefix: " CLOUDWATCH_LOGS_PREFIX
read -r -p "Enter the EMR Execution Role ARN (i.e. arn:aws:00000000000000000:role/EMR-Execution-Role): " EMR_EXECUTION_ROLE_ARN
read -r -p "Enter the release label (i.e. emr-6.9.0-latest): " EMR_EKS_RELEASE_LABEL


#---------------------------------------------------------------------
# Fill in the Managed Endpoint JSON template with the provided variables
#---------------------------------------------------------------------
# Export the variables so that envsubst can see them
export EMR_VIRTUAL_CLUSTER_ID EMR_EKS_TEAM AWS_REGION EMR_EKS_MANAGED_ENDPOINT
export S3_BUCKET EMR_EXECUTION_ROLE_ARN EMR_EKS_RELEASE_LABEL
export CLOUDWATCH_LOGS_GROUP_NAME CLOUDWATCH_LOGS_PREFIX

envsubst < managed-endpoint.json > managed-endpoint-final.json

#------------------------------------------------------------------------------
# Create managed endpoint and assign Load Balancer information as env variables
#------------------------------------------------------------------------------

# Create the managed endpoint and save its ID, which is part of the tag that identifies the right Load Balancer
MANAGED_ENDPOINT_ID=$(aws emr-containers create-managed-endpoint --cli-input-json file://./managed-endpoint-final.json --query id --output text)
echo -e "The Managed Endpoint ID is $MANAGED_ENDPOINT_ID. \n"

TAG_VALUE="${EMR_EKS_TEAM}/ingress-${MANAGED_ENDPOINT_ID}"

echo -e "Waiting for the managed endpoint load balancer to be active...\n"

sleep 30

# Find the Load Balancer whose ingress.k8s.aws/stack tag matches the endpoint ingress and save its ARN
ARN=""
for i in $(aws elbv2 describe-load-balancers | jq -r '.LoadBalancers[].LoadBalancerArn'); do
  if aws elbv2 describe-tags --resource-arns "$i" \
    | jq -e --arg TAG_VALUE "$TAG_VALUE" '.TagDescriptions[].Tags[] | select(.Key == "ingress.k8s.aws/stack" and .Value == $TAG_VALUE)' > /dev/null; then
    ARN=$i
  fi
done

# Stop here if no matching Load Balancer was found, since every later step needs its ARN
if [ -z "$ARN" ]; then
  echo "No Load Balancer tagged $TAG_VALUE was found." >&2
  exit 1
fi
echo -e "The Load Balancer ARN is $ARN. \n"

aws elbv2 wait load-balancer-available --load-balancer-arns "$ARN"

echo -e "The load balancer is in service. \n"

#------------------------------------------------------------------------
# Add the Karpenter Security Group to the Load Balancer that was created
#------------------------------------------------------------------------

echo "Setting Security Groups for Jupyter Notebook and Karpenter..."
# Security Group for Endpoint with the Jupyter Notebook
NOTEBOOK_SG=$(aws ec2 describe-security-groups \
  --filters Name=group-name,Values="emr-containers-lb-$EMR_VIRTUAL_CLUSTER_ID-$MANAGED_ENDPOINT_ID" \
  --query "SecurityGroups[*].{ID:GroupId}" --output text)

# Karpenter Security Group
KARPENTER_SG=$(aws ec2 describe-security-groups \
  --filters Name=group-name,Values="emr-eks-karpenter-node-*" \
  --query "SecurityGroups[*].{ID:GroupId}" --output text)

echo "Adding the security groups to the load balancer..."
# Add these two Security Groups to the Load Balancer.
# The group variables are left unquoted on purpose: a lookup may return several IDs.
RESULT=$(aws elbv2 set-security-groups --load-balancer-arn "$ARN" --security-groups $NOTEBOOK_SG $KARPENTER_SG)

# Give the managed endpoint 30 seconds to become ready.
sleep 30
echo "The managed endpoint has been created."
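The script above pauses for a fixed 30 seconds before it looks up the load balancer, which may be too short or longer than needed. One alternative is to poll until the lookup succeeds. The sketch below is only an illustration: `retry_until` is a hypothetical helper that is not part of this commit, and the attempt count and delay are assumptions to tune.

```bash
#!/bin/bash

# retry_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper, not part of the original script: runs COMMAND until it
# exits 0 or MAX_ATTEMPTS is reached, sleeping DELAY_SECONDS between tries.
retry_until() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  while true; do
    # Success: stop retrying
    if "$@"; then
      return 0
    fi
    # Give up once the attempt budget is spent
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

For instance, the tag lookup could be moved into a function (say `find_endpoint_lb`, also hypothetical) that returns non-zero while `ARN` is still empty, and then called as `retry_until 20 15 find_endpoint_lb`. Because the function runs in the current shell, the `ARN` it sets stays available afterwards.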
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
{
  "name": "$EMR_EKS_MANAGED_ENDPOINT",
  "virtualClusterId": "$EMR_VIRTUAL_CLUSTER_ID",
  "type": "JUPYTER_ENTERPRISE_GATEWAY",
  "releaseLabel": "$EMR_EKS_RELEASE_LABEL",
  "executionRoleArn": "$EMR_EXECUTION_ROLE_ARN",
  "configurationOverrides":
  {
    "applicationConfiguration":
    [
      {
        "classification": "spark-defaults",
        "properties":
        {
          "spark.driver.memory": "8G"
        }
      }
    ],
    "monitoringConfiguration":
    {
      "persistentAppUI": "ENABLED",
      "cloudWatchMonitoringConfiguration":
      {
        "logGroupName": "$CLOUDWATCH_LOGS_GROUP_NAME",
        "logStreamNamePrefix": "$CLOUDWATCH_LOGS_PREFIX"
      },
      "s3MonitoringConfiguration":
      {
        "logUri": "$S3_BUCKET"
      }
    }
  }
}
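Every `$NAME` placeholder in this template is filled in by `envsubst`, which substitutes an empty string for a variable that is unset or empty, so a skipped prompt would silently produce a blank field. A minimal sketch of a pre-check follows; `require_vars` is a hypothetical helper, not part of this commit, and it assumes bash because it relies on indirect expansion.

```bash
#!/bin/bash

# require_vars NAME...
# Hypothetical helper: reports every named variable that is unset or empty
# and returns non-zero if at least one is missing.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name
    if [ -z "${!name:-}" ]; then
      echo "Missing value for $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Under these assumptions, a call such as `require_vars EMR_VIRTUAL_CLUSTER_ID EMR_EKS_TEAM S3_BUCKET EMR_EXECUTION_ROLE_ARN || exit 1` placed just before the `envsubst` line would stop the script before an incomplete JSON file is written.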

website/docs/blueprints/amazon-emr-on-eks/emr-eks-karpenter.md

Lines changed: 41 additions & 0 deletions
@@ -590,6 +590,47 @@ cd analytics/terraform/emr-eks-karpenter/examples/nvme-ssd/deltalake
</CollapsibleContent>

## Run Interactive Workload with Managed Endpoint

A managed endpoint is a gateway that provides connectivity from EMR Studio to EMR on EKS so that you can run interactive workloads. For more information, see [how EMR on EKS works](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/how-it-works.html).

### Creating a managed endpoint

In this example, we will create a managed endpoint under one of the data teams.

Navigate to the folder and execute the script:

```bash
cd analytics/terraform/emr-eks-karpenter/examples/managed-endpoints
./create-managed-endpoint.sh
```
```
Enter EMR Virtual Cluster Id: 4ucrncg6z4nd19vh1lidna2b3
Provide your EMR on EKS team (emr-data-team-a or emr-data-team-b): emr-data-team-a
Enter your AWS Region: us-west-2
Enter a name for your endpoint: emr-eks-team-a-endpoint
Provide an S3 bucket location for logging (i.e. s3://my-bucket/logging/): s3://<bucket-name>/logs
Provide CloudWatch Logs Group Name: <log-group-name>
Provide CloudWatch Logs Prefix: <log-stream-prefix>
Enter the EMR Execution Role ARN (i.e. arn:aws:00000000000000000:role/EMR-Execution-Role): arn:aws:iam::181460066119:role/emr-eks-karpenter-emr-data-team-a
Enter the release label (i.e. emr-6.9.0-latest): emr-6.9.0-latest
```

The script will provide the following:
- JSON configuration file for the Managed Endpoint
- Configuration settings:
  - Default 8G Spark driver memory
  - CloudWatch monitoring under the log group and stream prefix provided, with logs also stored in the S3 bucket provided
- Endpoint creation, with the Karpenter security group added to the endpoint's load balancer
- Outputs: Managed Endpoint ID and Load Balancer ARN.
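
The answers to the prompts are passed to the template as typed, so a malformed value only surfaces when the AWS call fails. As a rough sketch, two of the inputs could be checked for shape first. The helper names and the regular expressions below are assumptions for illustration (they are not part of the script), based on the general format of S3 bucket names and IAM role ARNs.

```bash
#!/bin/bash

# Hypothetical helper: accepts s3://<bucket>[/<prefix>], where the bucket name
# is 3 to 63 characters of lowercase letters, digits, dots, and hyphens.
is_valid_s3_uri() {
  local re='^s3://[a-z0-9][a-z0-9.-]{1,61}[a-z0-9](/.*)?$'
  [[ $1 =~ $re ]]
}

# Hypothetical helper: accepts arn:<partition>:iam::<12-digit account>:role/<name>.
is_valid_role_arn() {
  local re='^arn:aws[a-z-]*:iam::[0-9]{12}:role/.+$'
  [[ $1 =~ $re ]]
}
```

A check such as `is_valid_role_arn "$EMR_EXECUTION_ROLE_ARN" || { echo "Unexpected role ARN format" >&2; exit 1; }` could then run right after the prompts. These patterns are deliberately loose and do not confirm that the bucket or role exists.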

Once you have created a managed endpoint, follow the [EMR Studio configuration instructions](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-configure.html) to configure EMR Studio and associate the managed endpoint with a workspace.

### Cleanup of Endpoint resources

To delete the managed endpoint, run the following command:

```bash
aws emr-containers delete-managed-endpoint --id <Managed Endpoint ID> --virtual-cluster-id <Virtual Cluster ID>
```
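
If the endpoint ID is no longer at hand, `aws emr-containers list-managed-endpoints --virtual-cluster-id <Virtual Cluster ID>` lists the endpoints of a virtual cluster. To double-check the IDs before deleting anything, the command can be wrapped as in the sketch below. The wrapper and its `DRY_RUN` variable are hypothetical conveniences, not part of this commit.

```bash
#!/bin/bash

# delete_managed_endpoint ENDPOINT_ID VIRTUAL_CLUSTER_ID
# Hypothetical wrapper: with DRY_RUN=1 it only prints the command it would run;
# otherwise it runs the delete command shown above.
delete_managed_endpoint() {
  if [ "$#" -ne 2 ]; then
    echo "usage: delete_managed_endpoint ENDPOINT_ID VIRTUAL_CLUSTER_ID" >&2
    return 2
  fi
  local cmd=(aws emr-containers delete-managed-endpoint --id "$1" --virtual-cluster-id "$2")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Running `DRY_RUN=1 delete_managed_endpoint <Managed Endpoint ID> <Virtual Cluster ID>` first shows the exact command; dropping `DRY_RUN=1` performs the deletion.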
## Cleanup
<CollapsibleContent header={<h2><span>Cleanup</span></h2>}>
This script will clean up the environment using the `-target` option to ensure all the resources are deleted in the correct order.
