Make mock_delay non-blocking for async acompletion calls #9852

gustavz · 2025-04-09T13:04:24Z

The async acompletion runs the sync completion in the event loop's executor, wrapped in a partial that carries all call kwargs.
If we set mock_response to True and specify a mock_delay, then completion does a synchronous time.sleep for the duration of the delay. This blocks the event loop when making multiple acompletion calls with mock_delay concurrently.
We should instead use asyncio.sleep when mocking an async response, so that the event loop is not blocked and concurrent calls can be mocked.
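
To illustrate the proposal, here is a minimal sketch of a mock delay with separate sync and async paths. The names mock_completion_sync, mock_completion_async and run_concurrently are hypothetical and are not litellm functions; the sketch only shows the general pattern, not how litellm implements mocking.

```python
import asyncio
import time


def mock_completion_sync(mock_delay: float) -> str:
    """Sync path: a blocking sleep is fine, the caller expects to block."""
    time.sleep(mock_delay)
    return "mock response"


async def mock_completion_async(mock_delay: float) -> str:
    """Async path: asyncio.sleep suspends only this coroutine."""
    await asyncio.sleep(mock_delay)
    return "mock response"


async def run_concurrently(n: int, mock_delay: float) -> float:
    """Run n mocked async calls at once and return the elapsed wall time."""
    start = time.perf_counter()
    await asyncio.gather(*(mock_completion_async(mock_delay) for _ in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = asyncio.run(run_concurrently(100, 0.5))
    print(f"100 concurrent mocked calls took {elapsed:.2f}s")
```

With the async path, the total time should stay close to a single mock_delay regardless of how many calls run concurrently.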

gustavz · 2025-04-17T06:33:29Z

OK, so it seems mock_delay does indeed block async workers. We ran tests with up to 100 users. When using mock_response together with mock_delay, total requests per second stay roughly flat while response times increase with the number of users,

whereas with real, non-mocked calls, response time stays roughly flat and total requests per second increase with the number of users.

gustavz · 2025-04-17T06:34:28Z

@krrishdholakia for your awareness: is it expected that mock_delay blocks the async event loop and therefore cannot be used for testing concurrent calls?

gustavz · 2025-04-17T07:42:16Z

A simple local test setup is:

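import asyncio
import time
import litellm

concurrent_requests = 100

async def acompletion_with_async_sleep(mock_delay):
    # Non-blocking reference: yields to the event loop while waiting.
    await asyncio.sleep(mock_delay)
    return "pong"

async def acompletion(i):
    # Mocked call with a 4 second mock_delay; logs how long the task took.
    start_time = time.time()
    print(f"Task {i}: Starting")
    response = await litellm.acompletion(
        model="azure/gpt-4",
        messages=[{"role": "user", "content": "ping"}],
        mock_response=True,
        mock_delay=4,
    )
    print(f"Task {i}: Finished in {round(time.time() - start_time, 2)} seconds")
    return response

async def main():
    # Start all requests at once and wait for every one to finish.
    tasks = [acompletion(i) for i in range(concurrent_requests)]
    return await asyncio.gather(*tasks)

# In a notebook, use `results = await main()` instead.
results = asyncio.run(main())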

with output:

Task 0: Starting
Task 1: Starting
Task 2: Starting
...
Task 97: Starting
Task 98: Starting
Task 99: Starting
Task 0: Finished in 4.01 seconds
...
Task 16: Finished in 8.02 seconds
...
Task 32: Finished in 12.03 seconds
...
Task 49: Finished in 16.04 seconds
...
Task 65: Finished in 20.05 seconds
...
Task 81: Finished in 24.05 seconds
...
Task 98: Finished in 28.05 seconds

But when using acompletion_with_async_sleep instead of litellm.acompletion, all tasks finish after 4 seconds.
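
The stepwise finish times (roughly 16 tasks every 4 seconds) look like what a bounded thread pool would produce when each worker thread sits in time.sleep. In recent CPython versions the default executor size is min(32, cpu_count + 4), so batches of 16 would match a 12-core machine, but that is an assumption about the test machine, not something verified here. The sketch below reproduces the pattern without litellm; blocking_in_executor and non_blocking are hypothetical helper names, and the pool size is set explicitly to keep the result predictable.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


async def blocking_in_executor(n: int, delay: float, workers: int) -> float:
    """Run n time.sleep calls in a pool of `workers` threads; return elapsed time."""
    loop = asyncio.get_running_loop()
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each call occupies a worker thread for the full delay,
        # so only `workers` calls can make progress at a time.
        await asyncio.gather(
            *(loop.run_in_executor(pool, time.sleep, delay) for _ in range(n))
        )
    return time.perf_counter() - start


async def non_blocking(n: int, delay: float) -> float:
    """Run n asyncio.sleep calls concurrently; return elapsed time."""
    start = time.perf_counter()
    await asyncio.gather(*(asyncio.sleep(delay) for _ in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    # 8 calls on 4 threads need two batches; asyncio.sleep needs only one delay.
    print(f"executor + time.sleep: {asyncio.run(blocking_in_executor(8, 0.5, 4)):.2f}s")
    print(f"asyncio.sleep:         {asyncio.run(non_blocking(8, 0.5)):.2f}s")
```

If this reading is right, the executor variant takes about ceil(n / workers) times the delay, which for 100 requests and 16 workers gives 7 batches, in line with the 28 seconds seen for the last tasks above.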
