Commit c304570

Adds EP-10411: Gateway API Inference Extension Support (#10420)
Signed-off-by: Daneyon Hansen <daneyon.hansen@solo.io>
Co-authored-by: Tim Flannagan <timflannagan@gmail.com>
1 parent 23d8430 commit c304570

1 file changed: +86 -0 lines changed

design/10411.md

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
1+
# EP-10411: Gateway API Inference Extension Support
2+
3+
* Issue: [#10411](https://github.com/kgateway-dev/kgateway/issues/10411)
4+
5+
## Background
6+
7+
This EP proposes adding [Gateway API Inference Extension](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main) (GIE) support. GIE is an open source project that originated from [wg-serving](https://github.com/kubernetes/community/tree/master/wg-serving) and is sponsored by [SIG Network](https://github.com/kubernetes/community/blob/master/sig-network/README.md#gateway-api-inference-extension). It provides APIs, a scheduling algorithm, a reference extension implementation, and controllers to support advanced routing of GenAI/ML network traffic.
8+
9+
## Motivation
10+
11+
Provide a standards-based approach to load balancing inference workloads.
12+
13+
## Goals
14+
15+
The following list defines goals for this EP.
16+
17+
* Provide initial GIE support that allows easy experimentation with advanced LLM traffic routing via the [Endpoint Selection Extension](https://gateway-api-inference-extension.sigs.k8s.io/#endpoint-selection-extension) (ESE), one of GIE's reference extension implementations.
18+
* Allow users to enable/disable this feature.
19+
* Implement GIE as a kgateway plugin.
20+
* Add [InferencePool](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/api/v1alpha1/inferencepool_types.go) as a supported HTTPRoute backend reference.
21+
* Provide the ability to manage the GIE deployment.
22+
* Provide e2e testing of this feature.
23+
* Provide initial user documentation, e.g. quick start guide.
24+
25+
## Non-Goals
26+
27+
The following list defines non-goals for this EP.
28+
29+
* Run production traffic using this feature.
30+
* Provide kgateway-specific GIE extensions.
31+
* Support non-GIE traffic routing functionality that may be achieved through integration with kgateway-specific APIs.
32+
* Provide stats for the initial GIE implementation since it lacks a metrics endpoint.
33+
* Secure the gRPC connection between Gateway and GIE implementations.
34+
* Support kgateway upgrades when this feature is enabled.
35+
36+
## Implementation Details
37+
38+
The following sections describe implementation details for this EP.
39+
40+
### Configuration
41+
42+
Update the [configuration](https://github.com/kgateway-dev/kgateway/blob/main/install/helm/kgateway/values.yaml) API to enable/disable this feature.
43+
The feature will be disabled by default:
44+
45+
```yaml
inferenceExtension:
  enabled: false
```
49+
50+
For the initial implementation, no other configuration parameters will be exposed to users. Since Inference Extension is decoupled from Gateway API,
51+
GatewayParameters will not be used for runtime configuration.
52+
53+
__Note:__ A user must install the Inference Extension CRDs before installing kgateway with this feature enabled.
54+
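As a minimal sketch, opting in would amount to overriding the proposed default in a user-supplied values file. The file name here is hypothetical; `inferenceExtension.enabled` is the only key this EP proposes.

```yaml
# values-inference.yaml (hypothetical file name)
# Overrides the proposed default of `enabled: false`.
# The Inference Extension CRDs are assumed to be installed already.
inferenceExtension:
  enabled: true
```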
55+
### Plugin
56+
57+
Add GIE ESE as a supported [plugin](https://github.com/kgateway-dev/kgateway/tree/main/internal/kgateway/extensions2/plugins). The plugin will:
58+
59+
* Manage [Endpoints](https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/endpoint/endpoint.html) based on the [InferencePool](https://gateway-api-inference-extension.sigs.k8s.io/api-types/inferencepool/) resource specification. The Gateway implementation, e.g. Envoy proxy, will forward matching requests using the [External Processing Filter](https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/http/ext_proc/v3/ext_proc.proto#external-processing-filter-proto) (ext proc) to the ESE deployment. The ESE is responsible for processing the request, selecting an Endpoint, and returning the selected Endpoint to Envoy for routing.
60+
* Manage route filters to redirect requests to the ESE ext proc server.
61+
* Manage the ext proc server cluster.
62+
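The filter and cluster the plugin manages can be sketched in Envoy's static-config YAML form. This is illustrative only: kgateway would generate the equivalent xDS resources programmatically, and the cluster name, Service hostname, port, and processing mode below are assumptions, not values taken from this EP.

```yaml
# HTTP filter that hands matching requests to the ESE ext proc server.
http_filters:
- name: envoy.filters.http.ext_proc
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
    grpc_service:
      envoy_grpc:
        cluster_name: example_ese_ext_proc   # hypothetical cluster name
    processing_mode:
      request_header_mode: SEND
      request_body_mode: BUFFERED            # assumption: the ESE inspects the request body
      response_header_mode: SKIP
---
# Cluster for the ext proc server. ext_proc uses gRPC, so HTTP/2 is enabled.
clusters:
- name: example_ese_ext_proc
  type: STRICT_DNS
  connect_timeout: 1s
  typed_extension_protocol_options:
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      explicit_http_config:
        http2_protocol_options: {}
  load_assignment:
    cluster_name: example_ese_ext_proc
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              # Hypothetical Service hostname and port for the ESE deployment.
              address: example-pool-endpoint-picker.default.svc.cluster.local
              port_value: 9002
```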
63+
### Controllers
64+
65+
* Add a controller to reconcile InferencePool custom resources, manage status, trigger the ESE deployment, etc.
66+
* Controllers should run only if the feature is enabled and GIE CRDs exist.
67+
* Update RBAC rules to allow controllers to access GIE custom resources.
68+
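The RBAC update might look like the following sketch. The ClusterRole name is hypothetical, the verbs are an assumption about what a reconciler that also writes status would need, and the API group is the one used by the GIE v1alpha1 types as I understand them; verify it against the installed CRDs.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kgateway-inference-extension   # hypothetical name
rules:
# Read access so the controller can watch and reconcile InferencePools.
- apiGroups: ["inference.networking.x-k8s.io"]
  resources: ["inferencepools"]
  verbs: ["get", "list", "watch"]
# Write access to the status subresource for status management.
- apiGroups: ["inference.networking.x-k8s.io"]
  resources: ["inferencepools/status"]
  verbs: ["update", "patch"]
```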
69+
### Deployer
70+
71+
Since GIE extensions are decoupled from the Gateway implementation, a separate [deployer](https://github.com/kgateway-dev/kgateway/tree/main/internal/kgateway/deployer)
72+
will be created to manage the required ESE resources, e.g. Deployment, Service, etc.
73+
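For illustration, one of the resources such a deployer might render is a Service in front of the ESE pods. Every name, label, and port below is hypothetical; this EP does not specify them.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-pool-endpoint-picker   # hypothetical name, e.g. derived from the InferencePool
spec:
  selector:
    app: example-pool-endpoint-picker  # hypothetical label on the ESE Deployment's pods
  ports:
  - name: grpc-ext-proc
    protocol: TCP
    port: 9002                          # assumed gRPC port of the ext proc server
    targetPort: 9002
```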
74+
### Translator and Proxy Syncer
75+
76+
* Add InferencePool as a supported HTTPRoute backend reference.
77+
* Update the [translator](https://github.com/kgateway-dev/kgateway/tree/main/internal/kgateway/translator) package to handle InferencePool references from the HTTPRoute type.
78+
* Enhance the [proxy_syncer](https://github.com/kgateway-dev/kgateway/tree/main/projects/gateway2/proxy_syncer) to translate the InferencePool custom resource into the IR and sync with the proxy client. When an HTTPRoute references an InferencePool, ensure the Envoy ext_proc filter is attached or the cluster references the ESE cluster.
79+
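A sketch of the input the translator would need to handle: an HTTPRoute whose backend reference points at an InferencePool rather than a Service. Resource names and the port are hypothetical, and the InferencePool fields follow the linked v1alpha1 types as I understand them; check them against the installed CRD version.

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: InferencePool
metadata:
  name: example-pool
spec:
  # Pods matching this selector are the candidate model servers.
  selector:
    app: example-model-server
  # Port on which the model server pods accept requests.
  targetPortNumber: 8000
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example-llm-route
spec:
  parentRefs:
  - name: example-gateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    # group and kind mark this backend as an InferencePool instead of
    # the default core Service.
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: example-pool
```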
80+
### Reporting
81+
82+
* Update the [reporter](https://github.com/kgateway-dev/kgateway/tree/main/projects/gateway2/reports) package to support status reporting, e.g. `ResolvedRefs=true` when an HTTPRoute references an InferencePool.
83+
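The resulting HTTPRoute status could look roughly like this fragment. The condition types and reasons are the standard Gateway API ones; the parent name and `controllerName` value are assumptions, and fields such as `lastTransitionTime`, `message`, and `observedGeneration` are omitted for brevity.

```yaml
status:
  parents:
  - parentRef:
      name: example-gateway               # hypothetical Gateway name
    controllerName: kgateway.dev/kgateway # assumed controller name
    conditions:
    - type: Accepted
      status: "True"
      reason: Accepted
    # Set once the InferencePool backend reference has been resolved.
    - type: ResolvedRefs
      status: "True"
      reason: ResolvedRefs
```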
84+
## Open Questions
85+
86+
1. ~~Is a new plugin type required or can an existing type be utilized, e.g. UpstreamPlugin?~~ A new plugin type will be created specific to GIE.
