llama-server RAM consumption #8905

tunglambk · 2024-08-07T07:24:46Z

I need to measure how much RAM is necessary to run llama-server, so I ran this script to print the process's VSZ and RSS:

#!/bin/bash

# Check if PID is provided as an argument
if [ -z "$1" ]; then
    echo "Usage: $0 <PID>"
    exit 1
fi

PID=$1  # Assign the first argument as the PID

# Monitor the memory usage
while true; do
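    # Get the VSZ and RSS in kilobytes (both empty if the process is gone)
    read -r VSZ_KB RSS_KB <<< "$(ps -p "$PID" -o vsz=,rss=)"

    # Check if the process is still running
    if [ -n "$RSS_KB" ]; then
        # Convert kilobytes to megabytes with two decimal precision
        VSZ_MB=$(echo "scale=2; $VSZ_KB/1024" | bc)
        RSS_MB=$(echo "scale=2; $RSS_KB/1024" | bc)
        echo "Memory usage of PID $PID: VSZ = $VSZ_MB MB, RSS = $RSS_MB MB"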
    else
        echo "Process $PID not found or has exited"
        break
    fi

    # Sleep for 1 second before checking again
    sleep 1
done

I sent requests to the server continuously while monitoring these metrics, and I see them increase gradually and never decrease:

Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 623.47 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 623.47 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 623.55 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 623.55 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 623.55 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 623.55 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 623.55 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 623.55 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 623.55 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 623.55 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 623.55 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.07 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.07 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.17 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.17 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.17 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.17 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.17 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.17 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.17 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.17 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.17 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.48 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.78 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.78 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.78 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.78 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.78 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.78 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.78 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.78 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.78 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.78 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.78 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.78 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.78 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 624.78 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 625.40 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 625.40 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 625.40 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 625.40 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 625.40 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 625.40 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 625.40 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 625.40 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 625.40 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 625.40 MB
Memory usage of PID 1630962: VSZ = 93308.29 MB, RSS = 625.40 MB
Memory usage of PID 1630962: VSZ = 93308.46 MB, RSS = 625.88 MB
Memory usage of PID 1630962: VSZ = 93308.46 MB, RSS = 626.26 MB
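A once-per-second ps sample can miss short allocation spikes, and for sizing RAM the peak is what matters. On Linux, the kernel also records the peak resident set size (VmHWM, the RSS "high water mark") in /proc/<pid>/status. The sketch below is Linux-only and assumes the current user can read that file; peak_rss_mib is a hypothetical helper name, not part of llama.cpp.

```shell
#!/bin/bash
# Sketch (Linux only): report current and peak resident memory of a process.
# /proc/<pid>/status reports VmRSS (current RSS) and VmHWM (peak RSS) in kB.

# Usage: peak_rss_mib <PID>
peak_rss_mib() {
    local pid=$1
    local status_file="/proc/$pid/status"

    # Fail if the process does not exist (or its status is not readable)
    [ -r "$status_file" ] || { echo "Process $pid not found" >&2; return 1; }

    awk '
        /^VmRSS:/ { rss = $2 }   # current resident set size, kB
        /^VmHWM:/ { hwm = $2 }   # peak resident set size, kB
        END { printf "RSS = %.2f MiB, peak RSS = %.2f MiB\n", rss / 1024, hwm / 1024 }
    ' "$status_file"
}
```

For example, running peak_rss_mib 1630962 after a load test would show the highest RSS the server reached, rather than whatever value happened to be sampled last.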

Is this normal? Is there any way to determine how much RAM llama-server needs, or to allocate exactly that amount of RAM for it?

Thank you so much

okuvshynov · 2024-08-09T13:11:09Z

When you start the server, you should see lines like these:

...
llm_load_tensors:      Metal buffer size = 71494.30 MiB
....
llama_new_context_with_model: KV self size  = 40960.00 MiB

Something slightly higher than the sum of these numbers should be a good starting point for the combination of model, context size and context data type you use.
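The rule of thumb above is plain addition: for the quoted lines, 71494.30 MiB + 40960.00 MiB = 112454.30 MiB, or about 110 GiB, before any headroom. The sketch below automates that sum from a saved startup log. It assumes the log lines look like the ones quoted above; the 10% default margin is a hypothetical choice, not a llama.cpp recommendation. It also includes the standard KV-cache arithmetic (K and V each hold n_ctx × n_layer × n_head_kv × head_dim elements), which shows why the KV number scales with context size and cache data type. The model parameters passed to it are assumptions you must supply yourself; for instance, a hypothetical model with 80 layers, 8 KV heads of dimension 128, an f16 cache and a 131072-token context gives exactly 40960 MiB, although the thread does not say which model produced the log above.

```shell
#!/bin/bash
# Sketch: estimate memory needs from llama-server startup output by summing
# the model buffer sizes and the KV cache size, then adding headroom.
# ASSUMPTION: the relevant log lines match the format quoted above.

# Usage: estimate_mib <log_file> [margin_percent]
# Prints "<sum_MiB> <sum_with_margin_MiB>".
estimate_mib() {
    local log_file=$1
    local margin_pct=${2:-10}   # hypothetical default headroom
    awk -v margin="$margin_pct" '
        # Model weights: "llm_load_tensors: <device> buffer size = N MiB"
        # KV cache:      "... KV self size  = N MiB" (possibly followed by more text)
        /^llm_load_tensors:.*buffer size *=/ || /KV self size *=/ {
            split($0, parts, "=")
            total += parts[2] + 0   # awk keeps the numeric prefix of " N MiB..."
        }
        END { printf "%.2f %.2f\n", total, total * (1 + margin / 100) }
    ' "$log_file"
}

# KV-cache size in MiB from model/context parameters (not read from the log).
# Only whole-byte element types are handled (f16 = 2, f32 = 4); quantized
# cache types use fractional bytes per element and are out of scope here.
# Usage: kv_cache_mib <n_layer> <n_ctx> <n_head_kv> <head_dim> <bytes_per_elem>
kv_cache_mib() {
    local n_layer=$1 n_ctx=$2 n_head_kv=$3 head_dim=$4 bytes=$5
    # Factor 2 = one K tensor plus one V tensor per layer
    echo $(( 2 * n_layer * n_ctx * n_head_kv * head_dim * bytes / 1024 / 1024 ))
}
```

Assuming the startup output was captured with something like llama-server ... > server.log 2>&1 (server.log is a hypothetical file name), estimate_mib server.log prints the raw sum and the sum with headroom. The margin is there because other allocations, such as compute buffers, are not included in these two figures.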
