This repository was archived by the owner on Jan 29, 2025. It is now read-only.

Commit 1401741

togashidm, killianmuldoon (authored and committed)
Update files after combining tas-extender and tas-controller
The following files are modified:
- cmd/scheduler-extender/main.go modified to allow the tasExtender and tasController functions to access the same cache.
- deploy/tas-deployment.yaml modified to use only one container.
- pkg/scheduler/scheduler.go fixed for multiple registrations with ServeMux.
- Makefile updated to accommodate the merge of the two previous components into one.
- README.md updated to reflect the changes.

The following files are removed:
- cmd/tas-policy-controller/main.go
- deploy/images/Dockerfile_controller
- pkg/cache/remote.go
- pkg/cache/server.go
1 parent b5f8a71 commit 1401741


11 files changed (+48 / -239 lines)


.github/workflows/static-analysis.yaml

Lines changed: 1 addition & 4 deletions
@@ -22,10 +22,7 @@ jobs:
 # Each dockerfile needs to be pointed to individually with this setup
 - uses: brpaz/hadolint-action@v1.2.1
 with:
-dockerfile: deploy/images/Dockerfile_controller
-- uses: brpaz/hadolint-action@v1.2.1
-with:
-dockerfile: deploy/images/Dockerfile_extender
+dockerfile: deploy/images/Dockerfile
 golangci:
 strategy:
 matrix:

Makefile

Lines changed: 4 additions & 10 deletions
@@ -1,5 +1,4 @@
-BINARY_NAME_1=controller
-BINARY_NAME_2=extender
+BINARY_NAME=extender
 
 .PHONY: test
 
@@ -11,18 +10,13 @@ test:
 all: format build
 
 build:
-CGO_ENABLED=0 GO111MODULE=on go build -ldflags="-s -w" -o ./bin/$(BINARY_NAME_1) ./cmd/tas-policy-controller
-CGO_ENABLED=0 GO111MODULE=on go build -ldflags="-s -w" -o ./bin/$(BINARY_NAME_2) ./cmd/tas-scheduler-extender
-
+CGO_ENABLED=0 GO111MODULE=on go build -ldflags="-s -w" -o ./bin/$(BINARY_NAME) ./cmd
 image:
-docker build -f deploy/images/Dockerfile_extender bin/ -t tas-extender
-docker build -f deploy/images/Dockerfile_controller bin/ -t tas-controller
-
+docker build -f deploy/images/Dockerfile bin/ -t tasextender
 format:
 gofmt -w -s .
 
 clean:
-rm -f ./bin/$(BINARY_NAME_1)
-rm -f ./bin/$(BINARY_NAME_2)
+rm -f ./bin/$(BINARY_NAME)
 
 
README.md

Lines changed: 11 additions & 17 deletions
@@ -7,20 +7,17 @@ For example - a pod that requires certain cache characteristics can be schedule
 **This software is a pre-production alpha version and should not be deployed to production servers.**
 
 
-## Components
-Telemetry Aware Scheduling is made up of two components deployed in a single pod on a Kubernetes Cluster.
+## Introduction
 
-### Telemetry Aware Scheduler Extender
 Telemetry Aware Scheduler Extender is contacted by the generic Kubernetes Scheduler every time it needs to make a scheduling decision.
 The extender checks if there is a telemetry policy associated with the workload.
 If so, it inspects the strategies associated with the policy and returns opinions on pod placement to the generic scheduler.
 The scheduler extender has two strategies it acts on - scheduleonmetric and dontschedule.
 This is implemented and configured as a [Kubernetes Scheduler Extender.](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#cluster-level-extended-resources)
 
-### Telemetry Policy Controller
-The Telemetry Policy Controller consumes TAS Policies - a Custom Resource. The controller parses this policy for deschedule, scheduleonmetric and dontschedule strategies and places them in a cache to make them locally available to all TAS components.
+The Scheduler consumes TAS Policies - a Custom Resource. The extender parses this policy for deschedule, scheduleonmetric and dontschedule strategies and places them in a cache to make them locally available to all TAS components.
 It consumes new Telemetry Policies as they are created, removes them when deleted, and updates them as they are changed.
-The policy controller also monitors the current state of policies to see if they are violated. For example if it notes that a deschedule policy is violated it labels the node as a violator allowing pods relating to that policy to be descheduled.
+The extender also monitors the current state of policies to see if they are violated. For example if it notes that a deschedule policy is violated it labels the node as a violator allowing pods relating to that policy to be descheduled.
 
 ## Usage
 A worked example for TAS is available [here](docs/health-metric-example.md)
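The README hunk above says policies are consumed as they are created, removed when deleted, and updated as they change, with their strategies held in a cache. The Go sketch below shows one way such a cache could react to add, update and delete events. `Policy`, `PolicyCache` and the `OnAdd`/`OnUpdate`/`OnDelete` names are hypothetical stand-ins, not the TAS types. Keying by "namespace/name" reflects the README's note that policies are namespaced.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Policy is a hypothetical, trimmed-down stand-in for the TAS policy custom resource.
type Policy struct {
	Namespace  string
	Name       string
	Strategies []string // e.g. "scheduleonmetric", "dontschedule", "deschedule"
}

// PolicyCache holds the strategies of every known policy, keyed by "namespace/name".
type PolicyCache struct {
	mu       sync.RWMutex
	policies map[string][]string
}

func NewPolicyCache() *PolicyCache {
	return &PolicyCache{policies: map[string][]string{}}
}

func key(p Policy) string { return p.Namespace + "/" + p.Name }

// OnAdd stores a copy of the policy's strategies.
func (c *PolicyCache) OnAdd(p Policy) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.policies[key(p)] = append([]string(nil), p.Strategies...)
}

// OnUpdate overwrites the old entry with the new version of the policy.
func (c *PolicyCache) OnUpdate(_, newP Policy) { c.OnAdd(newP) }

// OnDelete drops the policy so none of its strategies apply any more.
func (c *PolicyCache) OnDelete(p Policy) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.policies, key(p))
}

// Strategies returns a sorted copy of one policy's strategies and whether it exists.
func (c *PolicyCache) Strategies(namespace, name string) ([]string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	s, ok := c.policies[namespace+"/"+name]
	out := append([]string(nil), s...)
	sort.Strings(out)
	return out, ok
}

func main() {
	c := NewPolicyCache()
	p := Policy{Namespace: "default", Name: "demo-policy", Strategies: []string{"dontschedule"}}
	c.OnAdd(p)
	c.OnUpdate(p, Policy{Namespace: "default", Name: "demo-policy", Strategies: []string{"scheduleonmetric", "dontschedule"}})
	fmt.Println(c.Strategies("default", "demo-policy")) // [dontschedule scheduleonmetric] true
	c.OnDelete(p)
	fmt.Println(c.Strategies("default", "demo-policy")) // [] false
}
```

The read/write mutex lets many scheduling requests read the cache concurrently while policy events take exclusive access only briefly.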
@@ -33,7 +30,7 @@ There are three strategies that TAS acts on.
 **2 dontschedule** strategy has multiple rules, each with a metric name and operator and a target. A pod with this policy will never be scheduled on a node breaking any one of these rules.
 - example: **dontschedule** if **gpu_usage** is **GreaterThan 10**
 
-**3 deschedule** is consumed by the Telemetry Policy Controller. If a pod with this policy is running on a node that violates it can be descheduled with the kubernetes descheduler.
+**3 deschedule** is consumed by the extender. If a pod with this policy is running on a node that violates it can be descheduled with the kubernetes descheduler.
 - example: **deschedule** if **network_bandwidth_percent_free** is **LessThan 10**
 
 The policy definition section below describes how to actually create these strategies in a kubernetes cluster.
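The README describes dontschedule and deschedule rules as a metric name, an operator and a target, and states elsewhere that multiple rules combine with OR: one broken rule violates the whole strategy. A minimal Go sketch of that evaluation follows, using the two operators named in the examples above. The `Rule` type, the function names and the choice to skip metrics a node does not report are assumptions for illustration, not the TAS implementation.

```go
package main

import "fmt"

// Rule mirrors the README's description: a metric name, an operator and a target.
// Field names are illustrative, not the TAS API.
type Rule struct {
	Metric   string
	Operator string // only "GreaterThan" and "LessThan" in this sketch
	Target   float64
}

// ruleBroken reports whether a node's metric value breaks a single rule.
func ruleBroken(r Rule, value float64) (bool, error) {
	switch r.Operator {
	case "GreaterThan":
		return value > r.Target, nil
	case "LessThan":
		return value < r.Target, nil
	default:
		return false, fmt.Errorf("unsupported operator %q", r.Operator)
	}
}

// violated applies OR semantics: the first broken rule violates the strategy.
// Metrics missing for the node are skipped, which is an assumption of this sketch.
func violated(rules []Rule, nodeMetrics map[string]float64) (bool, error) {
	for _, r := range rules {
		v, ok := nodeMetrics[r.Metric]
		if !ok {
			continue
		}
		broken, err := ruleBroken(r, v)
		if err != nil {
			return false, err
		}
		if broken {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	rules := []Rule{
		{Metric: "gpu_usage", Operator: "GreaterThan", Target: 10},
		{Metric: "network_bandwidth_percent_free", Operator: "LessThan", Target: 10},
	}
	node := map[string]float64{"gpu_usage": 5, "network_bandwidth_percent_free": 3}
	v, err := violated(rules, node)
	// Only the second rule is broken, and under OR that is enough.
	fmt.Println(v, err) // true <nil>
}
```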
@@ -163,28 +160,25 @@ spec:
 There are three strategy types in a policy file and rules associated with each.
 - **scheduleonmetric** has only one rule. It is consumed by the Telemetry Aware Scheduling Extender and prioritizes nodes based on the rule.
 - **dontschedule** strategy has multiple rules, each with a metric name and operator and a target. A pod with this policy will never be scheduled on a node breaking any one of these rules.
-- **deschedule** is consumed by the Telemetry Policy Controller. If a pod with this policy is running on a node that violates that pod can be descheduled with the kubernetes descheduler.
+- **deschedule** is consumed by the extender. If a pod with this policy is running on a node that violates that pod can be descheduled with the kubernetes descheduler.
 
 dontschedule and deschedule - which incorporate multiple rules - function with an OR operator. That is if any single rule is broken the strategy is considered violated.
 Telemetry policies are namespaced, meaning that under normal circumstances a workload can only be associated with a pod in the same namespaces.
 
 ### Configuration flags
-The below flags can be passed to the binaries at run time.
+The below flags can be passed to the binary at run time.
 
-#### TAS Policy Controller
+#### TAS Scheduler Extender
 name |type | description| usage | default|
 -----|------|-----|-------|-----|
 |kubeConfig| string |location of kubernetes configuration file | -kubeConfig /root/filename|~/.kube/config
 |syncPeriod|duration string| interval between refresh of telemetry data|-syncPeriod 1m| 1s
 |cachePort | string | port number at which the cache server will listen for requests | --cachePort 9999 | 8111
-
-#### TAS Scheduler Extender
-name |type | description| usage | default|
------|------|-----|-------|-----|
 |syncPeriod|duration string| interval between refresh of telemetry data|-syncPeriod 1m| 1s
 |port| int | port number on which the scheduler extender will listen| -port 32000 | 9001
 |cert| string | location of the cert file for the TLS endpoint | --cert=/root/cert.txt| /etc/kubernetes/pki/ca.crt
 |key| string | location of the key file for the TLS endpoint| --key=/root/key.txt | /etc/kubernetes/pki/ca.key
+|cacert| string | location of the ca certificate for the TLS endpoint| --key=/root/cacert.txt | /etc/kubernetes/pki/ca.crt
 |unsafe| bool | whether or not to listen on a TLS endpoint with the scheduler extender | --unsafe=true| false
 
 ## Linking a workload to a policy
@@ -235,10 +229,10 @@ There are three changes to the demo policy here:
 - Affinity rules which add a requiredDuringSchedulingIgnoredDuringExecution affinity to nodes which are labelled ``<POLICYNAME>=violating`` This is used by the descheduler to identify pods on nodes which break their TAS telemetry policies.
 
 ### Security
-TAS Policy Controller is set up to use in-Cluster config in order to access the Kubernetes API Server. When deployed inside the cluster this along with RBAC controls configured in the installation guide, will give it access to the required resources.
-If outside the cluster TAS Policy Controller will try to use a kubernetes config file in order to get permission to get resources from the API server. This can be passed with the --kubeconfig flag to the controller.
+TAS Scheduler Extender is set up to use in-Cluster config in order to access the Kubernetes API Server. When deployed inside the cluster this along with RBAC controls configured in the installation guide, will give it access to the required resources.
+If outside the cluster TAS will try to use a kubernetes config file in order to get permission to get resources from the API server. This can be passed with the --kubeconfig flag to the binary.
 
-TAS Scheduler Extender contacts api server in the same way as policy controller. An identical flag --kubeConfig can be passed if it's operating outside the cluster.
+When TAS Scheduler Extender contacts api server an identical flag --kubeConfig can be passed if it's operating outside the cluster.
 Additionally TAS Scheduler Extender listens on a TLS endpoint which requires a cert and a key to be supplied.
 These are passed to the executable using command line flags. In the provided deployment these certs are added in a Kubernetes secret which is mounted in the pod and passed as flags to the executable from there.
 

cmd/tas-policy-controller/main.go renamed to cmd/main.go

Lines changed: 24 additions & 14 deletions
@@ -1,17 +1,16 @@
 package main
 
 import (
-tascache "github.com/intel/telemetry-aware-scheduling/pkg/cache"
+"flag"
 "github.com/intel/telemetry-aware-scheduling/pkg/controller"
 "github.com/intel/telemetry-aware-scheduling/pkg/metrics"
+"github.com/intel/telemetry-aware-scheduling/pkg/scheduler"
+"github.com/intel/telemetry-aware-scheduling/pkg/telemetryscheduler"
 strategy "github.com/intel/telemetry-aware-scheduling/pkg/strategies/core"
 "github.com/intel/telemetry-aware-scheduling/pkg/strategies/deschedule"
 "github.com/intel/telemetry-aware-scheduling/pkg/strategies/dontschedule"
 "github.com/intel/telemetry-aware-scheduling/pkg/strategies/scheduleonmetric"
 telemetrypolicyclient "github.com/intel/telemetry-aware-scheduling/pkg/telemetrypolicy/client/v1alpha1"
-
-"context"
-"flag"
 "k8s.io/client-go/kubernetes"
 "k8s.io/client-go/rest"
 "k8s.io/client-go/tools/clientcmd"
@@ -20,18 +19,32 @@ import (
 "os/signal"
 "syscall"
 "time"
+
+"context"
+tascache "github.com/intel/telemetry-aware-scheduling/pkg/cache"
 )
 
-func parseCLIFlags(kubeConfig *string, syncPeriod *string, cachePort *string) {
-flag.StringVar(kubeConfig, "kubeConfig", "/root/.kube/config", "location of kubernetes config file")
-flag.StringVar(syncPeriod, "syncPeriod", "5s", "length of time in seconds between metrics updates")
-flag.StringVar(cachePort, "cachePort", "8111", "enpoint at which cache server should be as accessible")
+func main() {
+var kubeConfig, port, certFile, keyFile, caFile, syncPeriod string
+var unsafe bool
+flag.StringVar(&kubeConfig, "kubeConfig", "/root/.kube/config", "location of kubernetes config file")
+flag.StringVar(&port, "port", "9001", "port on which the scheduler extender will listen")
+flag.StringVar(&certFile, "cert", "/etc/kubernetes/pki/ca.crt", "cert file extender will use for authentication")
+flag.StringVar(&keyFile, "key", "/etc/kubernetes/pki/ca.key", "key file extender will use for authentication")
+flag.StringVar(&caFile, "cacert", "/etc/kubernetes/pki/ca.crt", "ca file extender will use for authentication")
+flag.BoolVar(&unsafe, "unsafe", false, "unsafe instances of telemetry aware scheduler will be served over simple http.")
+flag.StringVar(&syncPeriod, "syncPeriod", "5s", "length of time in seconds between metrics updates")
 flag.Parse()
+cache := tascache.NewAutoUpdatingCache()
+tscheduler := telemetryscheduler.NewMetricsExtender(cache)
+sch := scheduler.Server{ExtenderScheduler: tscheduler}
+go sch.StartServer(port, certFile, keyFile, caFile, unsafe)
+tasController(kubeConfig, syncPeriod, cache)
 }
 
-func main() {
-var kubeConfig, syncPeriod, cachePort string
-parseCLIFlags(&kubeConfig, &syncPeriod, &cachePort)
+//tasController The controller load the TAS policy/strategies and places them into a local cache that is available
+//to all TAS components. It also monitors the current state of policies.
+func tasController(kubeConfig string, syncPeriod string, cache *tascache.AutoUpdatingCache) {
 kubeClient, clientConfig, err := getkubeClient(kubeConfig)
 if err != nil {
 panic(err)
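The body of `getkubeClient(kubeConfig)` is not part of this diff. The README's Security section describes the intended order: use in-cluster config when deployed inside the cluster, otherwise fall back to the kubeconfig file passed on the command line. With client-go this is usually done by trying `rest.InClusterConfig()` and then `clientcmd.BuildConfigFromFlags("", kubeConfig)`. To stay dependency-free, the Go sketch below models only the decision, based on the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` variables that `rest.InClusterConfig` checks. The `configSource` function and its injected helpers are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// configSource picks where API-server credentials come from, in the order the
// README describes: in-cluster first, then the file given by -kubeConfig.
// getenv and fileExists are injected so the decision can be tested.
func configSource(kubeConfig string, getenv func(string) string, fileExists func(string) bool) (string, error) {
	// rest.InClusterConfig treats the process as in-cluster only when both
	// variables are set, as they normally are inside a pod.
	if getenv("KUBERNETES_SERVICE_HOST") != "" && getenv("KUBERNETES_SERVICE_PORT") != "" {
		return "in-cluster", nil
	}
	if kubeConfig != "" && fileExists(kubeConfig) {
		return "kubeconfig:" + kubeConfig, nil
	}
	return "", errors.New("not running in a cluster and no usable -kubeConfig file")
}

// osFileExists is the production implementation of the fileExists hook.
func osFileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	src, err := configSource("/root/.kube/config", os.Getenv, osFileExists)
	fmt.Println(src, err)
}
```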
@@ -46,11 +59,8 @@ func main() {
 panic(err)
 }
 metricTicker := time.NewTicker(syncDuration)
-cache := tascache.NewAutoUpdatingCache()
 initialData := map[string]interface{}{}
 go cache.PeriodicUpdate(*metricTicker, metricsClient, initialData)
-go cache.Serve(cachePort)
-
 enforcerTicker := time.NewTicker(syncDuration)
 ctx, cancelFunc := context.WithCancel(context.Background())
 defer cancelFunc()

cmd/tas-scheduler-extender/main.go

Lines changed: 0 additions & 26 deletions
This file was deleted.
File renamed without changes.

deploy/images/Dockerfile_controller

Lines changed: 0 additions & 8 deletions
This file was deleted.

deploy/tas-deployment.yaml

Lines changed: 2 additions & 7 deletions
@@ -17,19 +17,14 @@ spec:
 spec:
 serviceAccountName: telemetry-aware-scheduling-service-account
 containers:
-- name: tascont
-command:
-- /controller
-- --syncPeriod=2s
-image: tas-controller
-imagePullPolicy: IfNotPresent
 - name: tasext
 command:
 - /extender
+- --syncPeriod=2s
 - --cert=/tas/cert/tls.crt
 - --key=/tas/cert/tls.key
 - --cacert=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
-image: tas-extender
+image: tasextender
 imagePullPolicy: IfNotPresent
 securityContext:
 readOnlyRootFilesystem: true

pkg/cache/remote.go

Lines changed: 0 additions & 49 deletions
This file was deleted.
