Hello everyone,
I hope this message finds you well. I'm encountering a significant performance degradation on the first run of the whisper-cli application compared to subsequent runs.
Issue Description
When running the application for the first time after starting or rebuilding the container, there is a noticeable slowdown. Subsequent runs are significantly faster, which suggests that some caching mechanism or background process is affecting the first run. My question is: how do I make sure I get consistent performance?
Note: an AI assistant suggested committing the running container to an image (docker commit), but I'm not a fan of that solution; I'd like to handle everything at build time.
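To make that concrete, the kind of thing I had in mind is a warm-up step baked into the image build, roughly like the hypothetical Dockerfile fragment below. The command and paths are taken from my run.sh; I don't know whether the GPU (and whatever gets cached on the first run) is even usable during docker build, so please treat this purely as a sketch of the intent:

# Hypothetical Dockerfile fragment: run the model once at build time so any
# one-time initialization cost is paid while building the image, not on the
# first run inside the container. Assumes the GPU is usable at build time.
WORKDIR /whisper.cpp
RUN ./build/bin/whisper-cli -m models/ggml-medium.en.bin -f samples/jfk.wav -fa -pc -np \
    || echo "warm-up failed; GPU may not be available during the build"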
Platform: aarch64 Linux
Steps to Reproduce
Build the Docker Container
Build the library with CMake 3.30.2:
cmake -B build -DGGML_CUDA=1
cmake --build build -j --config Release
Download whisper models
Execute the custom script twice: ./run.sh && ./run.sh
Expected Behavior
Consistent execution time for both runs.
The first run should not take significantly longer than subsequent runs.
Actual Behavior
The first run takes much longer to execute.
Subsequent runs are much faster.
The slow first run recurs every time I restart the Docker container (roughly how I start it is shown below).
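For completeness, I start the container roughly like this; the image name is a placeholder and the exact flags are from memory, so treat it as illustrative:

# Start the container with the NVIDIA runtime so the GPU is visible inside.
# "whisper-jetson:latest" is a placeholder for my actual image tag.
docker run --runtime nvidia -it --rm whisper-jetson:latest bash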
Below is my run.sh
#!/bin/bash
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/whisper.cpp/build/ggml/src:/whisper.cpp/build/ggml/src/ggml-cuda:/whisper.cpp/build/src
# Define the command
COMMAND="./build/bin/whisper-cli -m models/ggml-medium.en.bin -f samples/jfk.wav -fa -pc -np"
# Record the start time
START_TIME=$(date +%s.%N)
# Execute the command
$COMMAND
# Record the end time
END_TIME=$(date +%s.%N)
# Calculate the execution time
EXECUTION_TIME=$(echo "$END_TIME - $START_TIME" | bc)
# Print the execution time
echo "Execution Time: ${EXECUTION_TIME}s"
And below are the results inside the Docker container:
[00:00:00.000 --> 00:00:03.000] And so, my fellow Americans,
[00:00:03.000 --> 00:00:08.000] ask not what your country can do for you,
[00:00:08.000 --> 00:00:11.000] ask what you can do for your country.
Execution Time: 478.130996608s
[00:00:00.000 --> 00:00:03.000] And so, my fellow Americans,
[00:00:03.000 --> 00:00:08.000] ask not what your country can do for you,
[00:00:08.000 --> 00:00:11.000] ask what you can do for your country.
Execution Time: 5.049899231s
root@jugal:/whisper.cpp# ./run.sh && ./run.sh
[00:00:00.000 --> 00:00:03.000] And so, my fellow Americans,
[00:00:03.000 --> 00:00:08.000] ask not what your country can do for you,
[00:00:08.000 --> 00:00:11.000] ask what you can do for your country.
Execution Time: 4.822853356s
[00:00:00.000 --> 00:00:03.000] And so, my fellow Americans,
[00:00:03.000 --> 00:00:08.000] ask not what your country can do for you,
[00:00:08.000 --> 00:00:11.000] ask what you can do for your country.
Execution Time: 4.710287755s
Request for Help
I would appreciate any insights, suggestions, or best practices from the community to further optimize the performance of my application during the first run. Any recommendations for profiling tools, caching strategies, or Docker configurations would be greatly appreciated!
Humble Note
I have used FROM nvcr.io/nvidia/l4t-pytorch:r35.1.0-pth1.13-py3 to build the image; if I should use a different base, please recommend one. I also tried nvidia/cuda:11.4.2-devel-ubuntu20.04, but it seems to have an issue with nvcc. I am quite new to this and might be missing something obvious. I'm looking for guidance on how to improve the performance and ensure that the initial run is not significantly slower than subsequent runs.
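For reference, my Dockerfile is roughly along these lines (a simplified sketch rather than the exact file; the model download script is the one shipped with whisper.cpp):

FROM nvcr.io/nvidia/l4t-pytorch:r35.1.0-pth1.13-py3
WORKDIR /whisper.cpp
COPY . .
# Build whisper.cpp with CUDA enabled (CMake 3.30.2 is installed beforehand).
RUN cmake -B build -DGGML_CUDA=1 && \
    cmake --build build -j --config Release
# Fetch the medium.en model used by run.sh.
RUN bash ./models/download-ggml-model.sh medium.en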
Thank you in advance for your help and support.
Best regards,