Commit a428dbe

feat: Add binpacking examples (#615)

Authored by hitsub2 and dongdgy
Co-authored-by: dongdgy <dongdgy@amazon.com>
Parent: 7d2d34c
5 files changed: +104 additions, -0 deletions
---
sidebar_position: 4
sidebar_label: Bin packing for Amazon EKS
---

# Bin packing for Amazon EKS

## Introduction
In this post, we show you how to enable a custom scheduler on Amazon EKS when running DoEKS, especially for Spark on EKS, including OSS Spark and EMR on EKS. The custom scheduler is a custom Kubernetes scheduler with the ```MostAllocated``` scoring strategy, running in the data plane.
### Why bin packing

By default, the [scheduling-plugin](https://kubernetes.io/docs/reference/scheduling/config/#scheduling-plugins) NodeResourcesFit uses the ```LeastAllocated``` scoring strategy. For long-running workloads this is a good default because it favors availability, but for batch jobs such as Spark workloads it can drive up cost. Changing the strategy from ```LeastAllocated``` to ```MostAllocated``` avoids spreading pods across all running nodes, leading to higher resource utilization and better cost efficiency.
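Conceptually, the custom scheduler ships a profile like the following sketch. The ```schedulerName``` matches the pod template shown later in this post; the exact API version and resource weights come from the repo's manifests for your EKS version (on 1.24, for example, the ```v1beta3``` config API would be used instead of ```v1```):

```yaml
# Sketch of a scheduler profile using the MostAllocated scoring strategy.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: custom-k8s-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
```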
Batch jobs like Spark run on demand for a limited or predictable time. With the ```MostAllocated``` strategy, Spark executors are always bin packed onto one node until that node cannot host any more pods. The following pictures show the difference in EMR on EKS.

```MostAllocated``` in EMR on EKS:

![img.png](img/binpack_singlejob.gif)

```LeastAllocated``` in EMR on EKS:

![img.png](img/no_binpacking.gif)
### Pros

1) Improves node utilization
2) Saves cost
### Considerations

Although we have provided upgrade guidance, a support matrix, and a high-availability design, maintaining a custom scheduler in the data plane still takes effort, including:

1) Upgrade operations. Plan upgrades along with your batch jobs, and make sure the scheduler is running as desired afterwards.
2) Monitoring the scheduler. Monitoring and alerting are required for production use (a basic health check is sketched after this list).
3) Adjusting the scheduler pod resources and other customizations according to your requirements.
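As a starting point for item 2, a liveness check could be as simple as the commands below. The namespace, label, and Deployment name here are assumptions, so match them to the manifest you actually applied:

```shell
# Assumed names -- verify the namespace, label, and Deployment name
# against the applied custom-scheduler manifest.
kubectl -n kube-system get pods -l app=custom-k8s-scheduler
kubectl -n kube-system logs deployment/custom-k8s-scheduler --tail=50
```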
## Deploying the Solution

### Clone the repo

```shell
git clone https://github.com/aws-samples/custom-scheduler-eks
cd custom-scheduler-eks
```
### Manifests

**Amazon EKS 1.24**

```shell
kubectl apply -f deploy/manifests/custom-scheduler/amazon-eks-1.24-custom-scheduler.yaml
```

**Amazon EKS 1.29**

```shell
kubectl apply -f deploy/manifests/custom-scheduler/amazon-eks-1.29-custom-scheduler.yaml
```

**Other Amazon EKS versions**

* Replace the scheduler image URL in the manifest with the [kube-scheduler image](https://gallery.ecr.aws/eks-distro/kubernetes/kube-scheduler) that matches your cluster version, as in the illustrative snippet below.
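For example, the kube-scheduler container image line in the scheduler Deployment would be pointed at the build matching your control plane version; the tag below is illustrative only:

```yaml
# Illustrative tag -- pick the one matching your EKS version
# from the EKS Distro gallery linked above.
image: public.ecr.aws/eks-distro/kubernetes/kube-scheduler:v1.28.0-eks-1-28-latest
```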
Please refer to [custom-scheduler](https://github.com/aws-samples/custom-scheduler-eks) for more info.
### Set up a pod template to use the custom scheduler for Spark

Add the custom scheduler name to the pod template as follows:

```yaml
kind: Pod
spec:
  schedulerName: custom-k8s-scheduler
  volumes:
    - name: spark-local-dir-1
      hostPath:
        path: /local1
  initContainers:
    - name: volume-permission
      image: public.ecr.aws/docker/library/busybox
      # grant volume access to the hadoop user
      command: ['sh', '-c', 'if [ ! -d /data1 ]; then mkdir /data1; fi; chown -R 999:1000 /data1']
      volumeMounts:
        - name: spark-local-dir-1
          mountPath: /data1
  containers:
    - name: spark-kubernetes-executor
      volumeMounts:
        - name: spark-local-dir-1
          mountPath: /data1
```
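For OSS Spark, you would then point ```spark-submit``` at this template through Spark's pod template properties. The sketch below uses placeholder values for the API endpoint, template paths, and application JAR:

```shell
# All values below are placeholders -- substitute your cluster endpoint,
# pod template paths, and application JAR.
spark-submit \
  --master k8s://https://<EKS_API_SERVER_ENDPOINT> \
  --deploy-mode cluster \
  --conf spark.kubernetes.driver.podTemplateFile=driver-pod-template.yaml \
  --conf spark.kubernetes.executor.podTemplateFile=executor-pod-template.yaml \
  local:///opt/spark/examples/jars/spark-examples.jar
```

For EMR on EKS, the same ```spark.kubernetes.driver.podTemplateFile``` and ```spark.kubernetes.executor.podTemplateFile``` properties can be supplied in the job's Spark configuration.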
## Verification and Monitoring via [eks-node-viewer](https://github.com/awslabs/eks-node-viewer)

Before applying the change in the pod template:

![img.png](img/before-binpacking.png)

After the change, CPU usage at pod scheduling time is higher:

![img.png](img/after-binpacking.png)
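To reproduce these views, run eks-node-viewer against the cluster while the job executes; its ```--resources``` flag selects which allocatable resources to display:

```shell
# Watch per-node CPU and memory allocation while Spark executors are scheduled.
eks-node-viewer --resources cpu,memory
```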
## Conclusion

By using the custom scheduler, we can substantially improve node utilization for Spark workloads, which saves cost by triggering node scale-in.

For users running Spark on EKS, we recommend adopting this custom scheduler until Amazon EKS officially supports [kube-scheduler customization](https://github.com/aws/containers-roadmap/issues/1468).