
Benchmarking Project — Summer 2025


Table of Contents

  • Project Overview
  • Directory Structure
  • Benchmarking Workflows
  • Getting Started
  • Managing Models in Dockerized Ollama
  • Documentation
  • Conclusion

Project Overview

This project benchmarks the inference performance, resource utilization, and accuracy of quantized, distilled, and multimodal language models on embedded and edge devices. It supports both open-ended generative tasks (v1) and objective, accuracy-focused tasks (v2), such as solving quadratic equations with known answers. The goal is to provide actionable insights for optimizing AI deployments in resource-constrained environments.

Directory Structure

| Folder | Description |
| ------ | ----------- |
| src/ | v1: generative benchmarking, analysis, and utilities |
| src_v2/ | v2: accuracy-focused benchmarking and analysis scripts |
| docs/ | Project documentation and methodology |
| data/ | v1: raw and processed benchmarking data, outputs, and plots |
| data_v2/ | v2: processed results and JSON-based question sets |
| Conclusion/ | Final reports and deliverables (e.g., the PDF summary) |

Benchmarking Workflows

Generative Benchmarking (v1)

  • Benchmarks LLMs on open-ended prompts.
  • Measures response time and resource usage, and saves each model's output (see the timing sketch below).
  • No objective accuracy metric (answers are subjective).
  • Data and scripts: src/, data/
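
The snippet below is a minimal sketch of that kind of measurement, assuming models are served through Ollama's HTTP API on its default port (11434); the function name and result fields are illustrative, not the repository's actual code.

```python
import time

import psutil    # system-wide CPU/memory sampling
import requests  # Ollama exposes a simple HTTP API

def time_generation(model: str, prompt: str) -> dict:
    """Send one open-ended prompt and record latency plus resource usage."""
    psutil.cpu_percent(interval=None)  # prime the CPU counter
    start = time.perf_counter()
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    elapsed = time.perf_counter() - start
    return {
        "output": resp.json().get("response", ""),
        "seconds": round(elapsed, 2),
        "cpu_percent": psutil.cpu_percent(interval=None),  # average since priming
        "mem_percent": psutil.virtual_memory().percent,
    }
```

System-wide memory is sampled rather than the benchmark process's own footprint, since the model weights live inside the Ollama container.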

Accuracy-Focused Benchmarking (v2)

  • Benchmarks LLMs on math questions with objectively correct answers.
  • Measures response time, resource usage, and correctness (accuracy); a grading sketch follows this list.
  • Uses a set of standard quadratic equations, one for each root type, stored in JSON.
  • Data and scripts: src_v2/, data_v2/
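
How correctness is graded isn't spelled out here; one plausible approach, sketched below with illustrative names, is to pull numbers out of the model's free-form reply and compare them against the expected roots within a small tolerance.

```python
import re

def extract_numbers(text: str) -> list[float]:
    """Pull real numbers out of a model's free-form answer."""
    return [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", text)]

def is_correct(answer: str, expected: list[float], tol: float = 1e-3) -> bool:
    """Count an answer correct if every expected root appears within `tol`."""
    found = extract_numbers(answer)
    return all(any(abs(f - e) <= tol for f in found) for e in expected)

# is_correct("The roots are x = 3 and x = 4.", [3.0, 4.0])  -> True
```

Complex roots would need their real and imaginary parts checked separately, so the repository's actual grading may well differ.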

Quadratic Equation Benchmark Questions

| Type | Equation | Expected Answer |
| ---- | -------- | --------------- |
| Distinct Real | x^2 - 7x + 12 = 0 | 3, 4 |
| Repeated Real | x^2 - 6x + 9 = 0 | 3 (double root) |
| Complex | x^2 + 2x + 5 = 0 | -1 + 2i, -1 - 2i |
| Irrational Real | x^2 - 2 = 0 | √2, -√2 (≈ ±1.414) |
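
All four expected answers follow from the quadratic formula x = (-b ± √(b² - 4ac)) / 2a. The sketch below recomputes them with Python's cmath, which handles the complex case as well:

```python
import cmath

def quadratic_roots(a: float, b: float, c: float) -> tuple[complex, complex]:
    """Both roots of a*x^2 + b*x + c = 0 via the quadratic formula."""
    d = cmath.sqrt(b * b - 4 * a * c)  # complex-safe discriminant square root
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

for a, b, c in [(1, -7, 12), (1, -6, 9), (1, 2, 5), (1, 0, -2)]:
    print(quadratic_roots(a, b, c))
# -> (4, 3), (3, 3), (-1+2i, -1-2i), and ±1.41421... respectively
```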

Getting Started

  1. Clone the repository
    git clone https://github.com/rwandantechy/benchmarking-project-summer-2025.git
    cd benchmarking-project-summer-2025
  2. Install dependencies
    pip install -r requirements.txt
  3. Run a generative benchmark (v1)
    python benchmark.py --model <model_name>
  4. Run an accuracy-focused math benchmark (v2)
    python src_v2/accuracy_benchmark.py <model_name>
    • Uses questions from data_v2/questions/math_questions.json (a loading sketch follows these steps)
    • Results saved to data_v2/processed/accuracy_benchmark_results.csv
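
The exact schema of the question file isn't documented in this README; assuming each entry pairs an equation with its expected roots, loading it could look like the following (the field names are hypothetical):

```python
import json
from pathlib import Path

questions = json.loads(Path("data_v2/questions/math_questions.json").read_text())
for q in questions:
    # "equation" and "expected_answer" are assumed field names, not the repo's schema
    print(q["equation"], "->", q["expected_answer"])
```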

Managing Models in Dockerized Ollama

Adding a Model

docker exec -it ollama ollama pull <model_name>

Removing a Model

docker exec -it ollama ollama rm <model_name>
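
Because these commands run inside the container, a quick host-side way to confirm which models are installed is Ollama's /api/tags endpoint, assuming the container publishes port 11434:

```python
import requests

# List the models currently available in the Ollama container.
tags = requests.get("http://localhost:11434/api/tags", timeout=10).json()
for model in tags.get("models", []):
    print(model["name"])
```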

Documentation

  • See the docs/ folder for detailed methodology, system configuration, and code documentation.
  • Visual summaries and analysis can be found in docs/05_Benchmark_Visuals.md and the data/processed/ directory.

Conclusion

For a comprehensive summary and analysis of the benchmarking results, please refer to the full report:

Benchmarking Report (PDF)
