Commit c70cf1b

Add model server (#89)

* add model-server
* add ci
* split model server assertion into separate step
* use kubectl rollout status instead of kubectl wait
* remove stale todos
* move model-server related configs into sub-folder
* use test -S to test for socket
* fix typo sidcar -> sidecar
* simplify doc
* add readinessProbe to model-server
* add startupProbe to model-server
* bump chart version

---------

Signed-off-by: Maximilian Bischoff <maximilian.bischoff@inovex.de>
Co-authored-by: yellowhat <1692490+yellowhat@users.noreply.github.com>

1 parent 2ba29a9 commit c70cf1b

9 files changed: +266 -6 lines changed

.github/workflows/integration_test.yml

Lines changed: 17 additions & 4 deletions
@@ -7,6 +7,12 @@ on:
 
 jobs:
   integration_test:
+    strategy:
+      matrix:
+        testConfig:
+          - expectModelServer: false
+          - expectModelServer: true
+            extraHelmFlags: '-f ci/model-server-enabled-values.yaml'
     runs-on: ubuntu-latest
     defaults:
       run:
@@ -30,11 +36,18 @@ jobs:
       - name: deploy kepler using helm chart
         run: |
           tree -a
-          helm install kepler . --values values.yaml --create-namespace --namespace kepler --dry-run --debug
-          helm install kepler . --values values.yaml --create-namespace --namespace kepler --debug
+          helm install kepler . --values values.yaml --create-namespace --namespace kepler --dry-run --debug ${{ matrix.testConfig.extraHelmFlags }}
+          helm install kepler . --values values.yaml --create-namespace --namespace kepler --debug ${{ matrix.testConfig.extraHelmFlags }}
 
       - name: test if kepler is alive
         run: |
-          sleep 60
-          kubectl logs $(kubectl -n kepler get pods -oname) -n kepler
+          echo "Waiting for kepler pods to become ready"
+          kubectl rollout status daemonset,deployment --namespace kepler --timeout 120s
+          kubectl logs $(kubectl -n kepler get pods -l app.kubernetes.io/component=exporter -oname) -n kepler
           kubectl get all -n kepler
+
+      - name: test model server
+        if: matrix.testConfig.expectModelServer
+        run: |
+          # if the model-server configuration is correct the kepler pods should use the model served through the Estimator Sidecar
+          kubectl logs $(kubectl -n kepler get pods -l app.kubernetes.io/component=exporter -oname) -n kepler | grep 'Using the EstimatorSidecar/AbsPower Power Model to estimate Node Component Power'
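
Each entry under `testConfig` produces its own job run, and an entry that omits `extraHelmFlags` expands that expression to an empty string. As a rough sketch of how a further configuration could be slotted in, the fragment below adds a third entry; that entry and its `--set modelServer.replicas=2` override are assumptions for illustration and are not part of this commit.

```yaml
strategy:
  matrix:
    testConfig:
      # entries from this commit
      - expectModelServer: false
      - expectModelServer: true
        extraHelmFlags: '-f ci/model-server-enabled-values.yaml'
      # assumed extra entry: same values file plus an inline override
      - expectModelServer: true
        extraHelmFlags: '-f ci/model-server-enabled-values.yaml --set modelServer.replicas=2'
```

Because the `test model server` step is guarded by `if: matrix.testConfig.expectModelServer`, it would run for the second and third entries only.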

chart/kepler/Chart.yaml

Lines changed: 1 addition & 1 deletion
@@ -22,5 +22,5 @@ annotations:
     url: https://keybase.io/bradmccoydev/pgp_keys.asc
 
 type: application
-version: 0.5.19
+version: 0.6.0
 appVersion: release-0.8.0

chart/kepler/README.md

Lines changed: 23 additions & 1 deletion
@@ -75,4 +75,26 @@ Kepler (Kubernetes-based Efficient Power Level Exporter) uses eBPF to probe ener
 | `redfish.enabled` | whether the redfish secret is enabled | `false` |
 | `redfish.annotations` | annotations for redfish secret | `{}` |
 | `redfish.fileContent` | redfish credentials | `` |
-| `redfish.labels` | labels for redfish secret | `{}` |
+| `redfish.labels` | labels for redfish secret | `{}` |
+
+## Model Server & Estimator Sidecar
+
+| Name | Description | Value |
+| --------------------------------- | ----------------------------------------------------------------------------------------- | --------- |
+| `modelServer.enabled` | whether model-server and estimator sidecar should be deployed | `false` |
+| `modelServer.modelConfig` | [modelConfig](https://sustainable-computing.io/kepler_model_server/get_started/) contents | `NODE_COMPONENTS_ESTIMATOR=true` |
+| `modelServer.nameOverride` | overrides the name-suffix of the model-server deployment and service | `""` |
+| `modelServer.fullnameOverride` | replaces the name of the model-server deployment and service | `""` |
+| `modelServer.replicas` | replicas of the model-server deployment | `1` |
+| `modelServer.image.repository` | repository to pull the model-server image from | `"quay.io/sustainable_computing_io/kepler_model_server"` |
+| `modelServer.image.tag` | image tag for the model-server | `"v0.7.12"` |
+| `modelServer.image.pullPolicy` | image pull policy for the model-server image | `Always` |
+| `modelServer.imagePullSecrets` | Secret name for pulling model-server images from private repository | `[]` |
+| `modelServer.podAnnotations` | Additional pod annotations for the model-server pods | `{}` |
+| `modelServer.securityContext` | privileges and access control settings for the model-server container | `{}` |
+| `modelServer.podSecurityContext` | privileges and access control settings for model-server pods | `{}` |
+| `modelServer.resources` | resource limits and requests for the model-server | `{}` |
+| `modelServer.sidecarResources` | resource limits and requests for the estimator sidecar | `{}` |
+| `modelServer.service.annotations` | annotations for the model-server service | `{}` |
+| `modelServer.service.type` | the model-server service type | `ClusterIP` |
+| `modelServer.service.port` | the model-server service port | `8100` |
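
The options in the table above can be combined in a custom values file. The following is a minimal sketch: the resource figures are illustrative assumptions rather than recommended settings, and any key left out keeps its chart default.

```yaml
# Hypothetical override file, passed to helm with `-f <file>`.
modelServer:
  # deploys the model-server Deployment/Service and the estimator sidecar
  enabled: true
  replicas: 1
  # resources for the model-server container (assumed example figures)
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      memory: 512Mi
  # resources for the estimator sidecar in the kepler DaemonSet (assumed figures)
  sidecarResources:
    requests:
      cpu: 50m
      memory: 128Mi
  service:
    type: ClusterIP
    port: 8100
```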
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+modelServer:
+  enabled: true

chart/kepler/templates/daemonset.yaml

Lines changed: 44 additions & 0 deletions
@@ -32,6 +32,31 @@ spec:
       imagePullSecrets:
         {{- toYaml . | nindent 8 }}
       {{- end }}
+      {{- if .Values.modelServer.enabled }}
+      initContainers:
+        - name: estimator
+          command:
+            - python3
+          args:
+            - -u
+            - src/kepler_model/estimate/estimator.py
+          image: "{{ .Values.modelServer.image.repository }}:{{ .Values.modelServer.image.tag }}"
+          imagePullPolicy: {{ .Values.modelServer.image.pullPolicy }}
+          {{- with .Values.modelServer.sidecarResources }}
+          resources:
+            {{- toYaml . | nindent 12 }}
+          {{- end }}
+          restartPolicy: Always
+          startupProbe:
+            exec:
+              command:
+                - test
+                - -S
+                - /tmp/estimator.sock
+          volumeMounts:
+            - mountPath: /tmp
+              name: estimator-sock
+      {{- end }}
       containers:
       - name: kepler-exporter
         image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
@@ -56,6 +81,17 @@ spec:
             value: "/metrics"
           - name: BIND_ADDRESS
             value: "0.0.0.0:{{ .Values.service.port }}"
+          {{- if .Values.modelServer.enabled }}
+          - name: MODEL_SERVER_ENABLE
+            value: "true"
+          - name: MODEL_SERVER_ENDPOINT
+            value: {{ printf "http://%s:%d/model" (include "modelServer.fullname" .) .Values.modelServer.service.port }}
+          {{- with .Values.modelServer.modelConfig }}
+          - name: MODEL_CONFIG
+            value: |
+              {{- . | nindent 14 }}
+          {{- end }}
+          {{- end }}
           {{- range $key, $value := .Values.extraEnvVars }}
           - name: {{ $key | quote }}
             value: {{ $value | quote }}
@@ -104,6 +140,10 @@ spec:
             mountPath: /etc/redfish
             readOnly: true
           {{- end }}
+          {{- if .Values.modelServer.enabled }}
+          - name: estimator-sock
+            mountPath: /tmp
+          {{- end }}
         {{- with .Values.resources }}
         resources:
           {{- toYaml . | nindent 12 }}
@@ -135,6 +175,10 @@ spec:
           secret:
             secretName: {{ .Values.redfish.name }}
         {{- end }}
+        {{- if .Values.modelServer.enabled }}
+        - name: estimator-sock
+          emptyDir: {}
+        {{- end }}
       {{- with .Values.podSecurityContext }}
       securityContext:
         {{- toYaml . | nindent 8 }}
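
The estimator in the hunks above is declared as an init container with `restartPolicy: Always`, which is how Kubernetes expresses a native sidecar (available as an alpha feature in Kubernetes 1.28 and enabled by default from 1.29). Stripped of Helm templating, the pattern looks roughly like the Pod below. This is a minimal sketch and not output of the chart: the Pod name, container names, and image references are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sidecar-socket-demo        # placeholder name
spec:
  initContainers:
    - name: estimator
      image: example.invalid/estimator:latest   # placeholder image
      # restartPolicy: Always turns this init container into a sidecar
      # that keeps running alongside the main containers.
      restartPolicy: Always
      # The main containers are started only after this probe succeeds,
      # i.e. once the Unix socket exists.
      startupProbe:
        exec:
          command: ["test", "-S", "/tmp/estimator.sock"]
      volumeMounts:
        - name: estimator-sock
          mountPath: /tmp
  containers:
    - name: exporter
      image: example.invalid/exporter:latest    # placeholder image
      # Same volume, same path: the exporter sees the socket the sidecar created.
      volumeMounts:
        - name: estimator-sock
          mountPath: /tmp
  volumes:
    - name: estimator-sock
      emptyDir: {}
```

Sharing an `emptyDir` mounted at `/tmp` in both containers is what lets the two processes talk over a Unix socket without any network configuration.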
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+{{- define "modelServer.name" -}}
+{{- default "model-server" .Values.modelServer.nameOverride | trunc 63 | trimSuffix "-" }}
+{{- end }}
+
+{{/*
+Create a default fully qualified app name.
+We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
+If release name contains chart name it will be used as a full name.
+*/}}
+{{- define "modelServer.fullname" -}}
+{{- if .Values.modelServer.fullnameOverride }}
+{{- .Values.modelServer.fullnameOverride | trunc 63 | trimSuffix "-" }}
+{{- else }}
+{{- $name := default "model-server" .Values.modelServer.nameOverride }}
+{{- if contains $name .Release.Name }}
+{{- .Release.Name | trunc 63 | trimSuffix "-" }}
+{{- else }}
+{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
+{{- end }}
+{{- end }}
+{{- end }}
+
+{{/*
+Common labels
+*/}}
+{{- define "modelServer.labels" -}}
+helm.sh/chart: {{ include "kepler.chart" . }}
+{{ include "modelServer.selectorLabels" . }}
+{{- if .Chart.AppVersion }}
+app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
+{{- end }}
+app.kubernetes.io/managed-by: {{ .Release.Service }}
+{{- end }}
+
+{{/*
+Selector labels
+*/}}
+{{- define "modelServer.selectorLabels" -}}
+app.kubernetes.io/name: {{ include "kepler.name" . }}
+app.kubernetes.io/component: model-server
+{{- end }}
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+{{- if .Values.modelServer.enabled }}
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: {{ include "modelServer.fullname" . }}
+  labels:
+    {{- include "modelServer.labels" . | nindent 4 }}
+  {{- with .Values.modelServer.annotations }}
+  annotations:
+    {{- toYaml . | nindent 4 }}
+  {{- end }}
+spec:
+  replicas: {{ .Values.modelServer.replicas }}
+  selector:
+    matchLabels:
+      {{- include "modelServer.selectorLabels" . | nindent 6 }}
+  template:
+    metadata:
+      {{- with .Values.modelServer.podAnnotations }}
+      annotations:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
+      labels:
+        {{- include "modelServer.selectorLabels" . | nindent 8 }}
+        {{- with .Values.modelServer.podLabels }}
+        {{- . | toYaml | nindent 8 }}
+        {{- end }}
+    spec:
+      {{- with .Values.imagePullSecrets }}
+      imagePullSecrets:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
+      containers:
+        - name: server-api
+          args:
+            - model-server
+          image: "{{ .Values.modelServer.image.repository }}:{{ .Values.modelServer.image.tag }}"
+          imagePullPolicy: {{ .Values.modelServer.image.pullPolicy }}
+          ports:
+            - containerPort: 8100
+              name: http
+              protocol: TCP
+          volumeMounts:
+            - mountPath: /mnt
+              name: mnt
+          {{- with .Values.modelServer.resources }}
+          resources:
+            {{- toYaml . | nindent 12 }}
+          {{- end }}
+          {{- with .Values.modelServer.securityContext }}
+          securityContext:
+            {{- toYaml . | nindent 12 }}
+          {{- end }}
+          startupProbe:
+            httpGet:
+              path: /best-models
+              port: http
+            initialDelaySeconds: 1
+          readinessProbe:
+            httpGet:
+              path: /best-models
+              port: http
+      volumes:
+        - name: mnt
+          emptyDir: {}
+      {{- with .Values.modelServer.podSecurityContext }}
+      securityContext:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
+      {{- with .Values.modelServer.nodeSelector }}
+      nodeSelector:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
+      {{- with .Values.modelServer.affinity }}
+      affinity:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
+      {{- with .Values.modelServer.tolerations }}
+      tolerations:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
+{{- end }}
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+{{- if .Values.modelServer.enabled }}
+apiVersion: v1
+kind: Service
+metadata:
+  name: {{ include "modelServer.fullname" . }}
+  labels:
+    {{- include "modelServer.labels" . | nindent 4 }}
+  {{- with .Values.modelServer.service.annotations }}
+  annotations:
+    {{- toYaml . | nindent 4 }}
+  {{- end }}
+spec:
+  type: {{ .Values.modelServer.service.type }}
+  ports:
+    - name: http
+      port: {{ .Values.modelServer.service.port }}
+      targetPort: http
+      protocol: TCP
+  selector:
+    {{- include "modelServer.selectorLabels" . | nindent 4 }}
+{{- end }}

chart/kepler/values.yaml

Lines changed: 35 additions & 0 deletions
@@ -138,3 +138,38 @@ networkPolicy:
     - ports:
         - protocol: TCP
           port: 9102
+
+# Configure kepler [model-server](https://sustainable-computing.io/kepler_model_server/get_started/)
+modelServer:
+  # whether model-server and estimator sidecar should be deployed
+  enabled: false
+  modelConfig: |
+    NODE_COMPONENTS_ESTIMATOR=true
+  nameOverride: ""
+  fullnameOverride: ""
+  image:
+    repository: "quay.io/sustainable_computing_io/kepler_model_server"
+    tag: "v0.7.12"
+    pullPolicy: Always
+  # replicas of the model-server Deployment
+  replicas: 1
+  # additional annotations for the model server Deployment
+  annotations: {}
+  # additional annotations for the model server Pods
+  podAnnotations: {}
+  # additional labels for the model server Pods
+  podLabels: {}
+  podSecurityContext: {}
+  # security context for the model-server container in the model-server Deployment
+  securityContext: {}
+  nodeSelector:
+    kubernetes.io/os: linux
+  affinity: {}
+  # resources for the model-server containers in the model-server Deployment
+  resources: {}
+  service:
+    annotations: {}
+    type: ClusterIP
+    port: 8100
+  # resources for the estimator sidecar deployed in the kepler DaemonSet
+  sidecarResources: {}
