Replies: 4 comments
-
I have the same problem.
-
Hi @rivamarco, what is your test script for repro?
-
I have received your email and will handle it as soon as possible. Thank you!
-
Hi @krrishdholakia, thanks for the answer. So if I set up this config:

```yaml
model_list:
  - model_name: openai/gpt-4o-mini
    litellm_params:
      api_key: os.environ/OPENAI_API_KEY
      model: openai/gpt-4o-mini
      rpm: 5
      tpm: 100

router_settings:
  routing_strategy: usage-based-routing-v2 # 👈 KEY CHANGE
  redis_host: redis
  redis_password: myredissecret
  redis_port: 6379
  enable_pre_call_check: true

general_settings:
  master_key: sk-1234
```

with a docker compose like this:
```yaml
services:
  litellm:
    image: litellm/litellm:latest
    ports:
      - "4000:4000"
    environment:
      - OPENAI_API_KEY=mykey
    volumes:
      - ./litellm.yaml:/app/config.yaml
    command: [ "--config", "/app/config.yaml", "--port", "4000" ]

  redis:
    image: redis:7
    restart: always
    command: >
      --requirepass ${REDIS_AUTH:-myredissecret}
    ports:
      - 6379:6379
    healthcheck:
      test: [ "CMD", "redis-cli", "ping" ]
      interval: 3s
      timeout: 10s
      retries: 10

  redis-insight:
    image: redis/redisinsight:latest
    restart: always
    ports:
      - "5540:5540"
```

I assume that if I perform more than 5 requests per minute and/or use more than 100 tokens per minute, I get an error. Is this assumption correct? For example, with the test below (I've tried both parallel and sequential requests) I am not able to obtain an error; I always get answers (also if I modify the prompt by putting …):

```python
from openai import OpenAI
import random
import concurrent.futures
import os
client = OpenAI(
    api_key="fake",
    base_url="http://localhost:4000")
# Sample 50 easy questions
questions = [
"What is 2 + 2?",
"What color is the sky?",
"How many legs does a dog have?",
"What is the capital of France?",
"Is the Earth round?",
"What sound does a cat make?",
"What is the opposite of hot?",
"What is the first letter of the alphabet?",
"What do you use to write on paper?",
"Is fire hot or cold?",
"What is 5 minus 3?",
"How many days are in a week?",
"What do bees make?",
"What shape has 4 equal sides?",
"What is water made of?",
"What is the color of grass?",
"What is the main language spoken in the USA?",
"How many hours in a day?",
"What do cows drink?",
"What is the freezing point of water in Celsius?",
"What fruit is yellow and curved?",
"What is the capital of Italy?",
"What comes after Monday?",
"How many wheels does a car have?",
"Which planet do we live on?",
"What do you do with your eyes?",
"How many fingers on one hand?",
"Is snow hot or cold?",
"What do you wear on your feet?",
"What color are bananas?",
"Which animal barks?",
"How many letters in the word 'cat'?",
"What do you use to eat soup?",
"What is 10 divided by 2?",
"What is the opposite of fast?",
"How many months are in a year?",
"Where does the sun rise?",
"What is H2O?",
"What do you wear on your head in winter?",
"What color are strawberries?",
"What time is it after 11 AM?",
"What rhymes with 'hat'?",
"Which animal has a trunk?",
"What is 7 + 1?",
"How do you spell 'dog'?",
"What is the main ingredient in a salad?",
"What do you drink in the morning?",
"What day comes before Friday?",
"What do you sleep on?",
"How do you greet someone?"
]
def ask_question(question):
    try:
        response = client.chat.completions.create(
            model="openai/gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Answer in a short and simple way."},
                {"role": "user", "content": question}
            ]
        )
        answer = response.choices[0].message.content
        return (question, answer)
    except Exception as e:
        return (question, f"Error: {str(e)}")
if __name__ == "__main__":
    random.shuffle(questions)
    selected_questions = questions[:50]
    results = []
    with concurrent.futures.ProcessPoolExecutor(max_workers=10) as executor:
        futures = [executor.submit(ask_question, q) for q in selected_questions]
        for future in concurrent.futures.as_completed(futures):
            for a in future.result():
                print(f"{a}")
```

I believe there is something I do not understand about the way RPM and TPM work. Thank you very much for the help!
-
Hi, I'm playing around with LiteLLM Proxy, both for cloud models (OpenAI) and self-served models (vLLM), and it's great.
I was trying to implement the rate and token limits, but I don't understand whether what I want to do is achievable, because it doesn't seem to work; probably I'm doing something wrong.
What I would like to do is limit the invocations of the model. For example, if I have a proxy with one OpenAI model, I would like to set a maximum number of tokens (e.g. 1000) so that once that limit is reached, the proxy returns an error. The same for RPM.
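To make that concrete, the behaviour I'm after looks roughly like this. It's just a sketch, assuming the proxy answers with HTTP 429 once a limit is exceeded (which the OpenAI client raises as `openai.RateLimitError`); the key and model name are only placeholders for my setup.

```python
import openai
from openai import OpenAI

client = OpenAI(api_key="sk-1234", base_url="http://localhost:4000")

for i in range(20):
    try:
        client.chat.completions.create(
            model="openai/gpt-4o-mini",
            messages=[{"role": "user", "content": "ping"}],
        )
        print(f"request {i + 1}: accepted")
    except openai.RateLimitError as exc:
        # this is what I would expect to see once the RPM/TPM budget is used up
        print(f"request {i + 1}: blocked by the proxy -> {exc}")
        break
```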
The problem is that, with my configuration, it doesn't work as I expect: requests are never blocked, even with a higher number of tokens or a higher number of requests.
Did I miss something?
Thanks