diff --git a/README.md b/README.md
index 5f4a18e..7d74205 100644
--- a/README.md
+++ b/README.md
@@ -29,11 +29,11 @@ The simulator supports two modes of operation:
 - `echo` mode: the response contains the same text that was received in the request. For `/v1/chat/completions` the last message for the role=`user` is used.
 - `random` mode: the response is randomly chosen from a set of pre-defined sentences.
 
-Timing of the response is defined by two parameters: `time-to-first-token` and `inter-token-latency`.
+Timing of the response is defined by the `time-to-first-token` and `inter-token-latency` parameters. If P/D is enabled for a request, `kv-cache-transfer-latency` is used instead of `time-to-first-token`.
 
-For a request with `stream=true`: `time-to-first-token` defines the delay before the first token is returned, `inter-token-latency` defines the delay between subsequent tokens in the stream.
+For a request with `stream=true`: `time-to-first-token` (or `kv-cache-transfer-latency` when P/D is enabled) defines the delay before the first token is returned, and `inter-token-latency` defines the delay between subsequent tokens in the stream.
 
-For a requst with `stream=false`: the response is returned after delay of ` + ( * ( - 1))`
+For a request with `stream=false`: the response is returned after a delay of ` + ( * ( - 1))`, or ` + ( * ( - 1))` in the P/D case.
 
 It can be run standalone or in a Pod for testing under packages such as Kind.
 
@@ -99,6 +99,7 @@ For more details see the