
Docker inference: Significant Performance Degradation on First Run #3017

Open
@jugal-sheth

Description

Hello everyone,

I hope this message finds you well. I'm encountering a significant performance degradation on the first run of the whisper-cli application compared to subsequent runs.

Issue Description
When running the application for the first time after starting or rebuilding the container, there is a noticeable slowdown. Subsequent runs are significantly faster, which suggests that caching mechanisms or background processes are affecting the first run. The question is: how do I make sure I get consistent performance?

Note: an AI assistant suggested committing the container itself, but I am not a fan of that solution; I would like to handle everything at build time.

Platform: aarch64 Linux

Steps to Reproduce

  1. Build the Docker container.
  2. Build the library with CMake 3.30.2:
     cmake -B build -DGGML_CUDA=1
     cmake --build build -j --config Release
  3. Download the Whisper models.
  4. Execute the custom script twice: ./run.sh && ./run.sh

Expected Behavior
Consistent execution time for both runs; the first run should not take significantly longer than subsequent runs.

Actual Behavior
The first run takes much longer to execute; subsequent runs are much faster. This repeats every time I restart the Docker container.
Below is my run.sh:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/whisper.cpp/build/ggml/src:/whisper.cpp/build/ggml/src/ggml-cuda/:/whisper.cpp/build/src
# Define the command
COMMAND="./build/bin/whisper-cli -m models/ggml-medium.en.bin -f samples/jfk.wav -fa -pc -np"

# Record the start time
START_TIME=$(date +%s.%N)

# Execute the command
$COMMAND

# Record the end time
END_TIME=$(date +%s.%N)

# Calculate the execution time
EXECUTION_TIME=$(echo "$END_TIME - $START_TIME" | bc)

# Print the execution time
echo "Execution Time: ${EXECUTION_TIME}s"

And below are the results from inside the container:


[00:00:00.000 --> 00:00:03.000]   And so, my fellow Americans,
[00:00:03.000 --> 00:00:08.000]   ask not what your country can do for you,
[00:00:08.000 --> 00:00:11.000]   ask what you can do for your country.

Execution Time: 478.130996608s

[00:00:00.000 --> 00:00:03.000]   And so, my fellow Americans,
[00:00:03.000 --> 00:00:08.000]   ask not what your country can do for you,
[00:00:08.000 --> 00:00:11.000]   ask what you can do for your country.

Execution Time: 5.049899231s
root@jugal:/whisper.cpp# ./run.sh && ./run.sh 

[00:00:00.000 --> 00:00:03.000]   And so, my fellow Americans,
[00:00:03.000 --> 00:00:08.000]   ask not what your country can do for you,
[00:00:08.000 --> 00:00:11.000]   ask what you can do for your country.

Execution Time: 4.822853356s

[00:00:00.000 --> 00:00:03.000]   And so, my fellow Americans,
[00:00:03.000 --> 00:00:08.000]   ask not what your country can do for you,
[00:00:08.000 --> 00:00:11.000]   ask what you can do for your country.

Execution Time: 4.710287755s

Request for Help
I would appreciate any insights, suggestions, or best practices from the community to further optimize the performance of my application during the first run. Any recommendations for profiling tools, caching strategies, or Docker configurations would be greatly appreciated!
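One caching angle that may be relevant here (a sketch only, not verified on my setup): CUDA JIT-compiles PTX kernels on first use and stores the result in a compute cache, which by default lives under ~/.nv/ComputeCache and is lost when the container is recreated. CUDA's documented environment variables can relocate and enlarge that cache; the path below is a placeholder and would need to be a volume mount to survive container restarts:

```shell
# Relocate the CUDA JIT compute cache to a path that can be volume-mounted.
# The directory below is a placeholder, not a path from my setup.
export CUDA_CACHE_PATH="$HOME/.cuda_cache"
# Raise the cache size limit (in bytes) so larger kernels are not evicted.
export CUDA_CACHE_MAXSIZE=2147483648
mkdir -p "$CUDA_CACHE_PATH"
```

If the first-run cost really is PTX JIT compilation, sourcing something like this before whisper-cli (with the cache directory persisted) should make only the very first run ever slow, rather than the first run after every restart.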

Humble Note
I used FROM nvcr.io/nvidia/l4t-pytorch:r35.1.0-pth1.13-py3 to build the image; if I should use another base, please recommend one. I tried nvidia/cuda:11.4.2-devel-ubuntu20.04, but it seems to have an issue with nvcc. I am quite new to this and might be missing something obvious. I'm looking for guidance on how to improve performance and ensure that the initial run is not significantly slower than subsequent runs.
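For reference, a minimal sketch of the kind of build-time-only setup I have in mind (the architecture value and the warm-up step are assumptions, not tested: 72 targets Jetson Xavier, 87 would target Orin; paths follow the steps in this post). Compiling for the exact GPU architecture should avoid runtime PTX JIT entirely:

```dockerfile
# Sketch only: base image taken from this post; warm-up RUN is an untested idea.
FROM nvcr.io/nvidia/l4t-pytorch:r35.1.0-pth1.13-py3

WORKDIR /whisper.cpp
COPY . .

# Build cubins for the exact GPU so no PTX JIT compilation happens at runtime.
# 72 = Xavier (assumption for this board); use 87 for Orin.
RUN cmake -B build -DGGML_CUDA=1 -DCMAKE_CUDA_ARCHITECTURES=72 && \
    cmake --build build -j --config Release

# Optional warm-up during the build (requires GPU access at build time,
# e.g. nvidia set as the default container runtime on Jetson):
# RUN ./build/bin/whisper-cli -m models/ggml-medium.en.bin -f samples/jfk.wav -fa
```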

Thank you in advance for your help and support.

Best regards,
