The Local LLM Comparator is an interactive tool for systematically evaluating and comparing multiple local large language models (LLMs) side by side.
- Objective Comparison: Manually compare responses from different models without bias
- Flexible Evaluation: Test models on any prompt or task
- Comprehensive Ranking: Score and rank models based on pairwise comparisons
- Logging and Analysis: Automatically log comparisons and generate ranking reports
Enter your prompt in the `main` function. The tool then:

- Presents responses from two models at a time
- Lets you choose which response is better (or skip)
- Tracks direct wins between models
- Calculates scores that weigh both win rate and comparison frequency (see the sketch below)
- Generates a comprehensive ranking
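The exact formula lives in the evaluator, but the idea is to reward a high win rate while discounting models that have only been compared a few times. A minimal sketch of that kind of score, assuming a simple smoothing term rather than the tool's actual metric:

```python
def score(wins: int, comparisons: int, smoothing: int = 2) -> float:
    """Illustrative score: win rate weighted by comparison count.

    This is an assumed formula, not the tool's actual metric.
    """
    if comparisons == 0:
        return 0.0
    win_rate = wins / comparisons
    # Discount models that have been compared only a few times.
    confidence = comparisons / (comparisons + smoothing)
    return win_rate * confidence

print(score(wins=8, comparisons=10))  # frequently compared model
print(score(wins=1, comparisons=1))   # single win, heavily discounted
```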
- Ollama (for local model inference)
- One or more models pulled through Ollama
- Required Python packages:

```
ollama
requests
```
```bash
# Clone the repository
git clone https://github.com/yourusername/local-llm-comparator.git
cd local-llm-comparator

# Install dependencies
pip install -r requirements.txt

# Ensure Ollama is installed and models are pulled
ollama pull llama3.2
ollama pull gemma2:2b
# Pull any other models you want to compare
```
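You can confirm that the models are available locally with:

```bash
ollama list
```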
Modify the `_configure_model_clients()` method in the `ModelComparisonEvaluator` class to customize your model list:
```python
def _configure_model_clients(self) -> Dict[str, ModelClient]:
    # Map each model name to a client; a dict comprehension alone
    # is enough, no extra unpacking needed.
    return {
        model: OllamaClient()
        for model in [
            "llama3.2:latest",
            "gemma2:2b",
            # Add or remove models as needed
        ]
    }
```
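`ModelClient` and `OllamaClient` ship with the repository. For context, a client like this could be built on the `ollama` Python package's `chat()` call; the class and method below are a sketch of that idea, not the repository's actual implementation:

```python
import ollama

class OllamaClient:
    """Minimal sketch: get one response from a local Ollama model."""

    def generate(self, model: str, prompt: str) -> str:
        # ollama.chat() sends a chat request to the local Ollama server.
        response = ollama.chat(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response["message"]["content"]
```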
In the `main` function, update the prompt and run the script:
```python
import requests

def main():
    # Example: extracting job details from a URL
    url = "your_target_url"
    response = requests.get(url)
    page_content = response.text

    prompt = f"""Your specific task description.

Content to analyze:
{page_content}"""

    evaluator = ModelComparisonEvaluator()
    evaluator.run_comprehensive_comparisons(prompt)
    evaluator.print_rankings()

if __name__ == "__main__":
    main()
```
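Then run the script (the filename here is an assumption; use the repository's actual entry point):

```bash
python main.py
```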
The tool generates two key outputs:
- Comparison Log (`comparison_log.txt`)
  - Tracks which model won each comparison
  - Includes the timestamp of each comparison
- Rankings CSV (`model_rankings_[timestamp].csv`)
  - Contains the final model rankings
  - Includes advanced scoring metrics
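A quick way to inspect the most recent rankings file, assuming only that the timestamp in the filename sorts lexicographically (no column names are assumed):

```python
import csv
import glob

# Pick the most recently written rankings file; relies on the
# filename timestamp sorting lexicographically.
latest = max(glob.glob("model_rankings_*.csv"))

with open(latest, newline="") as f:
    for row in csv.DictReader(f):
        print(row)
```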