🧠 A comprehensive toolkit for benchmarking, optimizing, and deploying local Large Language Models. Includes performance testing tools, optimized configurations for CPU/GPU/hybrid setups, and detailed guides to maximize LLM performance on your hardware.

🚀 Ollama Benchmarks

A comprehensive benchmarking suite for evaluating Ollama models on various performance metrics.

📋 Table of Contents

  • 🔭 Overview
  • ✨ Features
  • 🛠️ Prerequisites
  • 📁 Project Structure
  • 🚀 Usage
  • 📊 Benchmark Types
  • 📈 Analyzing Results
  • 📊 Workflow Visualization
  • 👥 Contributing
  • 📜 License

🔭 Overview

Ollama Benchmarks is a toolset for rigorously testing and comparing the performance of different large language models running via Ollama. The suite measures critical metrics including inference speed, memory usage, and parameter efficiency across different prompts and configurations.

✨ Features

  • 📊 Measure inference speed (tokens per second)
  • 💾 Monitor memory consumption (RAM and VRAM)
  • 📏 Evaluate parameter efficiency
  • 📚 Test performance with varying context lengths
  • 📈 Analyze and compare results across models

πŸ› οΈ Prerequisites

  • Ollama installed and configured
  • Bash shell environment
  • Basic command line utilities (bc, nvidia-smi for GPU metrics)
  • Python 3.x for results analysis
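
A quick way to confirm these tools are available before running any benchmarks (nvidia-smi is only needed if you want GPU metrics):

# Check that the required tools are on PATH.
for tool in ollama bc python3; do
  command -v "$tool" >/dev/null || echo "Missing required tool: $tool"
done
command -v nvidia-smi >/dev/null || echo "nvidia-smi not found; GPU metrics will be unavailable"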

πŸ“ Project Structure

ollama-benchmarks/
├── benchmark_speed.sh      # Speed benchmarking script
├── benchmark_memory.sh     # Memory usage benchmarking script
├── benchmark_params.sh     # Parameter efficiency benchmarking
├── benchmark_context.sh    # Context length benchmarking
├── run_all_benchmarks.sh   # Script to run all benchmarks sequentially
├── analyze_results.py      # Python script to analyze and visualize results
├── prompts/                # Directory containing test prompts
│   ├── creative.txt        # Creative writing prompts
│   ├── short_qa.txt        # Question-answering prompts
│   └── long_context.txt    # Long context evaluation prompts
├── results/                # Directory where benchmark results are stored
└── logs/                   # Log files directory

🚀 Usage

Running Individual Benchmarks

Each benchmark script follows a similar pattern:

./benchmark_[type].sh [MODEL_NAME] [CONFIG_NAME] [PROMPT_FILE]

For example:

./benchmark_speed.sh llama2 default prompts/short_qa.txt

Running All Benchmarks

To run all benchmark types for a specific model:

./run_all_benchmarks.sh [MODEL_NAME] [CONFIG_NAME]
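
For example, to run every benchmark for the llama2 model with a configuration named default:

./run_all_benchmarks.sh llama2 default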

📊 Benchmark Types

Speed Benchmark

Measures inference speed in tokens per second for each prompt.

./benchmark_speed.sh [MODEL_NAME] [CONFIG_NAME] [PROMPT_FILE]

The script (a simplified sketch of its timing loop follows this list):

  • Processes each prompt in the specified file
  • Measures generation time and token count
  • Calculates tokens per second
  • Outputs results to results/[CONFIG_NAME]_speed_results.csv
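
The sketch below is illustrative only, not the actual benchmark_speed.sh. It assumes one prompt per line in the prompt file and approximates the token count with a word count; the real script may measure tokens more precisely.

# Illustrative timing loop (not the real benchmark_speed.sh).
MODEL="$1"; CONFIG="$2"; PROMPT_FILE="$3"
mkdir -p results
while IFS= read -r PROMPT; do
  START=$(date +%s.%N)                        # GNU date with sub-second precision
  OUTPUT=$(ollama run "$MODEL" "$PROMPT")
  END=$(date +%s.%N)
  ELAPSED=$(echo "$END - $START" | bc)
  TOKENS=$(echo "$OUTPUT" | wc -w)            # word count as a rough token proxy
  TPS=$(echo "scale=2; $TOKENS / $ELAPSED" | bc)
  echo "$MODEL,$CONFIG,$TOKENS,$ELAPSED,$TPS" >> "results/${CONFIG}_speed_results.csv"
done < "$PROMPT_FILE"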

Memory Benchmark

Measures CPU utilization, RAM, and VRAM usage during inference.

./benchmark_memory.sh [MODEL_NAME] [CONFIG_NAME] [PROMPT_FILE]

The script (a simplified sketch of this sampling approach follows this list):

  • Runs the model in the background
  • Samples CPU usage and memory consumption
  • Detects GPU memory usage if applicable
  • Outputs results to results/[CONFIG_NAME]_memory_results.csv
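
Again as an illustration only, not the actual benchmark_memory.sh: one generation is launched in the background and standard system tools are polled once per second while it runs. Note that the model itself executes inside the Ollama server process, so the real script may sample that process rather than the client shown here.

# Illustrative memory-sampling loop (not the real benchmark_memory.sh).
MODEL="$1"; CONFIG="$2"; PROMPT_FILE="$3"
mkdir -p results
ollama run "$MODEL" "$(cat "$PROMPT_FILE")" > /dev/null &
PID=$!
while kill -0 "$PID" 2>/dev/null; do
  RAM_MB=$(free -m | awk '/^Mem:/ {print $3}')    # system RAM in use (MB)
  CPU_PCT=$(ps -o %cpu= -p "$PID" | tr -d ' ')    # CPU usage of the client process
  if command -v nvidia-smi >/dev/null; then
    VRAM_MB=$(nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits | head -n1)
  else
    VRAM_MB="NA"
  fi
  echo "$MODEL,$CONFIG,$CPU_PCT,$RAM_MB,$VRAM_MB" >> "results/${CONFIG}_memory_results.csv"
  sleep 1
done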

Parameter Benchmark

Evaluates how efficiently the model uses its parameters across different prompt types.

./benchmark_params.sh [MODEL_NAME] [CONFIG_NAME] [PROMPT_FILE]
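
The exact definition of parameter efficiency is left to the script; one plausible formulation (an assumption here, not necessarily what benchmark_params.sh computes) is throughput normalized by model size, e.g. tokens per second per billion parameters:

# Hypothetical numbers: 45.2 tokens/s measured on a 7B-parameter model.
echo "scale=2; 45.2 / 7" | bc    # ≈ 6.45 tokens/s per billion parameters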

Context Length Benchmark

Tests model performance with varying context window sizes.

./benchmark_context.sh [MODEL_NAME] [CONFIG_NAME]
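
As a rough sketch of what a context-length sweep can look like (not the actual benchmark_context.sh), the context window can be varied through the num_ctx option of the Ollama HTTP API. This assumes the Ollama server is running on its default port (11434) and that jq is available, which is not listed in the prerequisites above:

# Illustrative context-window sweep via the Ollama HTTP API.
MODEL="llama2"                                   # example model name
PROMPT="$(cat prompts/long_context.txt)"
for NUM_CTX in 2048 4096 8192; do
  RESPONSE=$(curl -s http://localhost:11434/api/generate \
    -d "{\"model\": \"$MODEL\", \"prompt\": $(jq -Rs . <<< "$PROMPT"), \"stream\": false, \"options\": {\"num_ctx\": $NUM_CTX}}")
  EVAL_COUNT=$(echo "$RESPONSE" | jq '.eval_count')        # generated tokens
  EVAL_NS=$(echo "$RESPONSE" | jq '.eval_duration')        # generation time in ns
  echo "num_ctx=$NUM_CTX tokens/s=$(echo "scale=2; $EVAL_COUNT / ($EVAL_NS / 1000000000)" | bc)"
done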

📈 Analyzing Results

After running benchmarks, analyze the results using the provided Python script:

python analyze_results.py [CONFIG_NAME]

This will generate visualizations and summary statistics for all benchmarks with the specified configuration name.
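
For example, if the benchmarks above were run with the configuration name default:

ls results/default_*_results.csv   # confirm which result files were produced
python analyze_results.py default  # generate visualizations and summary statistics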

📊 Workflow Visualization

graph TD
    A[Select Model] --> B[Choose Benchmark Type]
    B --> C1[Speed Benchmark]
    B --> C2[Memory Benchmark]
    B --> C3[Parameter Benchmark]
    B --> C4[Context Length Benchmark]
    
    C1 --> D1[Generate CSV Results]
    C2 --> D2[Generate CSV Results]
    C3 --> D3[Generate CSV Results]
    C4 --> D4[Generate CSV Results]
    
    D1 --> E[Analyze Results]
    D2 --> E
    D3 --> E
    D4 --> E
    
    E --> F[Generate Visualizations]
    F --> G[Compare Models]

Sample Benchmark Process

sequenceDiagram
    participant User
    participant Benchmark Script
    participant Ollama
    participant Results File
    
    User->>Benchmark Script: Run with model & prompts
    Benchmark Script->>Ollama: Execute prompt
    Note over Benchmark Script: Start timer
    Ollama->>Benchmark Script: Return generated text
    Note over Benchmark Script: Stop timer
    Benchmark Script->>Benchmark Script: Calculate metrics
    Benchmark Script->>Results File: Write results
    Benchmark Script->>User: Display summary

👥 Contributing

Contributions are welcome! To contribute:

  1. Fork the repository
  2. Create a new branch for your feature
  3. Add your changes
  4. Submit a pull request

Please ensure your code follows the project's style guidelines and includes appropriate tests.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.


Created and maintained by Bjorn Melin, 2025
