| 1 | +# EP-10411: Gateway API Inference Extension Support |
| 2 | + |
| 3 | +* Issue: [#10411](https://github.com/kgateway-dev/kgateway/issues/10411) |
| 4 | + |
| 5 | +## Background |
| 6 | + |
| 7 | +This EP proposes adding [Gateway API Inference Extension](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main) (GIE) support. GIE is an open source project that originated from [wg-serving](https://github.com/kubernetes/community/tree/master/wg-serving) and is sponsored by [SIG Network](https://github.com/kubernetes/community/blob/master/sig-network/README.md#gateway-api-inference-extension). It provides APIs, a scheduling algorithm, a reference extension implementation, and controllers to support advanced routing of GenAI/ML network traffic. |
| 8 | + |
| 9 | +## Motivation |
| 10 | + |
| 11 | +Provide a standards-based approach to load balancing inference workloads. |
| 12 | + |
| 13 | +## Goals |
| 14 | + |
| 15 | +The following list defines goals for this EP. |
| 16 | + |
| 17 | +* Provide initial GIE support that allows easy experimentation with advanced LLM traffic routing via the [Endpoint Selection Extension](https://gateway-api-inference-extension.sigs.k8s.io/#endpoint-selection-extension) (ESE), one of GIE's reference extension implementations. |
| 18 | +* Allow users to enable/disable this feature. |
| 19 | +* Implement GIE as a kgateway plugin. |
| 20 | +* Add [InferencePool](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/api/v1alpha1/inferencepool_types.go) as a supported HTTPRoute backend reference. |
| 21 | +* Provide the ability to manage the GIE deployment. |
| 22 | +* Provide e2e testing of this feature. |
| 23 | +* Provide initial user documentation, e.g. a quick start guide. |
| 24 | + |
| 25 | +## Non-Goals |
| 26 | + |
| 27 | +The following list defines non-goals for this EP. |
| 28 | + |
| 29 | +* Run production traffic using this feature. |
| 30 | +* Provide kgateway-specific GIE extensions. |
| 31 | +* Support non-GIE traffic routing functionality that may be achieved through integration with kgateway-specific APIs. |
| 32 | +* Provide stats for the initial GIE implementation since it lacks a metrics endpoint. |
| 33 | +* Secure the gRPC connection between Gateway and GIE implementations. |
| 34 | +* Support kgateway upgrades when this feature is enabled. |
| 35 | + |
| 36 | +## Implementation Details |
| 37 | + |
| 38 | +The following sections describe implementation details for this EP. |
| 39 | + |
| 40 | +### Configuration |
| 41 | + |
| 42 | +Update the [configuration](https://github.com/kgateway-dev/kgateway/blob/main/install/helm/kgateway/values.yaml) API to enable/disable this feature. |
| 43 | +The feature will be disabled by default: |
| 44 | + |
| 45 | +```yaml |
| 46 | +inferenceExtension: |
| 47 | + enabled: false |
| 48 | +``` |
| 49 | +
|
| 50 | +For the initial implementation, no other configuration parameters will be exposed to users. Since Inference Extension is decoupled from Gateway API, |
| 51 | +GatewayParameters will not be used for runtime configuration. |
| 52 | +
|
| 53 | +__Note:__ A user must install the Inference Extension CRDs before installing kgateway with this feature enabled. |
| 54 | +
|
| 55 | +### Plugin |
| 56 | +
|
| 57 | +Add GIE ESE as a supported [plugin](https://github.com/kgateway-dev/kgateway/tree/main/internal/kgateway/extensions2/plugins). The plugin will: |
| 58 | +
|
| 59 | +* Manage [Endpoints](https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/endpoint/endpoint.html) based on the [InferencePool](https://gateway-api-inference-extension.sigs.k8s.io/api-types/inferencepool/) resource specification. The Gateway implementation, e.g. Envoy proxy, will forward matching requests using the [External Processing Filter](https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/http/ext_proc/v3/ext_proc.proto#external-processing-filter-proto) (ext proc) to the ESE deployment. The ESE is responsible for processing the request, selecting an Endpoint, and returning the selected Endpoint to Envoy for routing. |
| 60 | +* Manage route filters to redirect requests to the ESE ext proc server. |
| 61 | +* Manage the ext proc server cluster. |
| 62 | +
|
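As an illustration of the resource the plugin would read, here is a minimal InferencePool sketch. It assumes the `v1alpha1` field names (`selector`, `targetPortNumber`, `extensionRef`) as published by the GIE project at the time of writing; all object names and the port are hypothetical, and the fields should be checked against the installed CRD version.

```yaml
# Sketch only: names and port are hypothetical placeholders.
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: InferencePool
metadata:
  name: example-pool
spec:
  # Port on the model server Pods that receives inference requests.
  targetPortNumber: 8000
  # Pods matching these labels are the candidate Endpoints.
  selector:
    app: example-model-server
  # Assumed reference to the Service fronting the ESE ext proc server.
  extensionRef:
    name: example-pool-endpoint-picker
```

Under this sketch, the selector and target port would determine the Endpoints the plugin manages, while the extension reference would identify the ext proc server cluster.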
| 63 | +### Controllers |
| 64 | +
|
| 65 | +* Add a controller to reconcile InferencePool custom resources, manage status, trigger the ESE deployment, etc. |
| 66 | +* Controllers should run only if the feature is enabled and GIE CRDs exist. |
| 67 | +* Update RBAC rules to allow controllers to access GIE custom resources. |
| 68 | +
|
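The RBAC update could look roughly like the following ClusterRole fragment. This is a sketch under assumptions: the role name is hypothetical, the verbs are a guess at the minimum needed for reconciliation and status management, and the API group is the one used by the GIE `v1alpha1` CRDs.

```yaml
# Sketch only: role name and verb set are assumptions, not the final manifest.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kgateway-inference-extension
rules:
# Read access so the controller can reconcile InferencePool resources.
- apiGroups: ["inference.networking.x-k8s.io"]
  resources: ["inferencepools"]
  verbs: ["get", "list", "watch"]
# Write access to the status subresource for status management.
- apiGroups: ["inference.networking.x-k8s.io"]
  resources: ["inferencepools/status"]
  verbs: ["update", "patch"]
```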
| 69 | +### Deployer |
| 70 | +
|
| 71 | +Since GIE extensions are decoupled from the Gateway implementation, a separate [deployer](https://github.com/kgateway-dev/kgateway/tree/main/internal/kgateway/deployer) |
| 72 | +will be created to manage the required ESE resources, e.g. Deployment, Service, etc. |
| 73 | +
|
| 74 | +### Translator and Proxy Syncer |
| 75 | +
|
| 76 | +* Add InferencePool as a supported HTTPRoute backend reference. |
| 77 | +* Update the [translator](https://github.com/kgateway-dev/kgateway/tree/main/internal/kgateway/translator) package to handle InferencePool references from the HTTPRoute type. |
| 78 | +* Enhance the [proxy_syncer](https://github.com/kgateway-dev/kgateway/tree/main/projects/gateway2/proxy_syncer) to translate the InferencePool custom resource into the IR and sync with the proxy client. When an HTTPRoute references an InferencePool, ensure the Envoy ext_proc filter is attached or the cluster references the ESE cluster. |
| 79 | +
|
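For reference, an HTTPRoute that uses an InferencePool as a backend might look like the sketch below. The `group` and `kind` values follow the GIE `v1alpha1` API; the Gateway, route, and pool names are hypothetical.

```yaml
# Sketch only: Gateway, route, and pool names are hypothetical.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example-llm-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: example-inference-gateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    # A non-core backend: the translator must recognize this group/kind pair.
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: example-pool
```

Because the backend reference is not a core Service, the translator needs explicit handling for this `group`/`kind` combination; otherwise the reference would be reported as unresolved.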
| 80 | +### Reporting |
| 81 | +
|
| 82 | +* Update the [reporter](https://github.com/kgateway-dev/kgateway/tree/main/projects/gateway2/reports) package to support status reporting, e.g. `ResolvedRefs=true` when an HTTPRoute references an InferencePool. |
| 83 | + |
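The reported status might resemble the following HTTPRoute status fragment. This is a sketch: the condition type and reason follow the Gateway API conventions, while the parent name and the `controllerName` value are assumptions.

```yaml
# Sketch only: parentRef name and controllerName are assumed values.
status:
  parents:
  - parentRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: example-inference-gateway
    controllerName: kgateway.dev/kgateway
    conditions:
    # Set when every backendRef, including the InferencePool, resolves.
    - type: ResolvedRefs
      status: "True"
      reason: ResolvedRefs
```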
| 84 | +## Open Questions |
| 85 | + |
| 86 | +1. ~~Is a new plugin type required or can an existing type be utilized, e.g. UpstreamPlugin?~~ A new plugin type will be created specific to GIE. |