Docker inference: Significant Performance Degradation on First Run #3017


Open

jugal-sheth opened this issue Apr 8, 2025 · 0 comments

Hello everyone,

I hope this message finds you well. I'm encountering a significant performance degradation on the first run of the whisper-cli application compared to subsequent runs.

Issue Description
When running the application for the first time after starting or rebuilding the container, there is a noticeable slowdown. Subsequent runs are significantly faster, which suggests that caching mechanisms or background processes are affecting the first run. The question is: how do I ensure consistent performance across runs?

Note: AI suggested committing the container itself, but I am not a fan of that solution; I would like to handle everything at build time.

Platform: aarch64 Linux

Steps to Reproduce

  1. Build the Docker container.
  2. Build the library with CMake 3.30.2 (two separate commands; see also the sketch after this list):
     cmake -B build -DGGML_CUDA=1
     cmake --build build -j --config Release
  3. Download the Whisper models.
  4. Execute the custom script twice: ./run.sh && ./run.sh

Expected Behavior
Consistent execution time for both runs.
The first run should not take significantly longer than subsequent runs.
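
One guess I have is that the first run is spent JIT-compiling CUDA kernels for a GPU architecture the binary was not built for, and that the resulting kernel cache lives inside the container and is lost whenever the container is recreated. If that is the cause, configuring step 2 with an explicit architecture might avoid it. This is only a sketch, and the value 72 is an assumption for Xavier-class hardware that I have not verified for my board:

# Configure with an explicit CUDA architecture so kernels are compiled
# ahead of time instead of being JIT-compiled on the first run.
# 72 is an assumption (Xavier-class); check the device's compute capability first.
cmake -B build -DGGML_CUDA=1 -DCMAKE_CUDA_ARCHITECTURES=72
cmake --build build -j --config Release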

Actual Behavior
The first run takes much longer to execute.
Subsequent runs are much faster.
This repeats every time I restart the Docker container.
Below is my run.sh

#!/bin/bash
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/whisper.cpp/build/ggml/src:/whisper.cpp/build/ggml/src/ggml-cuda/:/whisper.cpp/build/src
# Define the command
COMMAND="./build/bin/whisper-cli -m models/ggml-medium.en.bin -f samples/jfk.wav -fa -pc -np"

# Record the start time
START_TIME=$(date +%s.%N)

# Execute the command
$COMMAND

# Record the end time
END_TIME=$(date +%s.%N)

# Calculate the execution time
EXECUTION_TIME=$(echo "$END_TIME - $START_TIME" | bc)

# Print the execution time
echo "Execution Time: ${EXECUTION_TIME}s"

And below are the results from inside the Docker container:


[00:00:00.000 --> 00:00:03.000]   And so, my fellow Americans,
[00:00:03.000 --> 00:00:08.000]   ask not what your country can do for you,
[00:00:08.000 --> 00:00:11.000]   ask what you can do for your country.

Execution Time: 478.130996608s

[00:00:00.000 --> 00:00:03.000]   And so, my fellow Americans,
[00:00:03.000 --> 00:00:08.000]   ask not what your country can do for you,
[00:00:08.000 --> 00:00:11.000]   ask what you can do for your country.

Execution Time: 5.049899231s
root@jugal:/whisper.cpp# ./run.sh && ./run.sh 

[00:00:00.000 --> 00:00:03.000]   And so, my fellow Americans,
[00:00:03.000 --> 00:00:08.000]   ask not what your country can do for you,
[00:00:08.000 --> 00:00:11.000]   ask what you can do for your country.

Execution Time: 4.822853356s

[00:00:00.000 --> 00:00:03.000]   And so, my fellow Americans,
[00:00:03.000 --> 00:00:08.000]   ask not what your country can do for you,
[00:00:08.000 --> 00:00:11.000]   ask what you can do for your country.

Execution Time: 4.710287755s

Request for Help
I would appreciate any insights, suggestions, or best practices from the community to further optimize the performance of my application during the first run. Any recommendations for profiling tools, caching strategies, or Docker configurations would be greatly appreciated!
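
Since I would like to handle everything at build time, one idea (not verified) is to run whisper-cli once inside the Dockerfile so that any first-run cost is paid while building the image rather than after every container restart. The sketch below reuses the paths from run.sh above and assumes the GPU is actually visible during docker build (e.g. when the NVIDIA runtime is the default runtime on the host); otherwise this step would not exercise the CUDA path at all:

# Hypothetical warm-up run baked into the image; "|| true" keeps the build going if it fails.
RUN cd /whisper.cpp && \
    LD_LIBRARY_PATH=/usr/local/cuda/lib64:/whisper.cpp/build/ggml/src:/whisper.cpp/build/ggml/src/ggml-cuda:/whisper.cpp/build/src \
    ./build/bin/whisper-cli -m models/ggml-medium.en.bin -f samples/jfk.wav -fa -np || true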

Humble Note
I used FROM nvcr.io/nvidia/l4t-pytorch:r35.1.0-pth1.13-py3 to build the image; if I should use a different base, please recommend one. I tried nvidia/cuda:11.4.2-devel-ubuntu20.04, but it seems to have an issue with nvcc. I am quite new to this and might be missing something obvious. I'm looking for guidance on how to improve performance and ensure that the initial run is not significantly slower than subsequent runs.

Thank you in advance for your help and support.

Best regards,
