- Project Overview
- Directory Structure
- Benchmarking Workflows
- Quadratic Equation Benchmark Questions
- Getting Started
- Managing Models in Dockerized Ollama
- Documentation
- Conclusion
## Project Overview

This project benchmarks the inference performance, resource utilization, and accuracy of language models on embedded and edge devices. It supports both open-ended generative tasks (v1) and objective, accuracy-focused tasks (v2, such as solving quadratic equations with known answers). The goal is to provide actionable insights for optimizing AI deployments in resource-constrained environments.
## Directory Structure

| Folder | Description |
|---|---|
| `src/` | v1: Generative benchmarking, analysis, and utilities |
| `src_v2/` | v2: Accuracy-focused benchmarking and analysis scripts |
| `docs/` | Project documentation and methodology |
| `data/` | v1: Raw and processed benchmarking data, outputs, and plots |
| `data_v2/` | v2: Processed results and JSON-based question sets |
| `Conclusion/` | Final reports and deliverables (e.g., PDF summary) |
## Benchmarking Workflows

### v1: Generative Benchmarking

- Benchmarks LLMs on open-ended prompts.
- Measures response time and resource usage, and saves model outputs.
- No objective accuracy metric (answers are subjective).
- Data and scripts: `src/`, `data/` (a rough measurement sketch follows this list).
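The actual v1 measurement logic lives in `src/`; the sketch below is only a rough illustration of how one prompt's latency and memory footprint could be timed against a local Ollama server. The endpoint, model name, and use of `requests`/`psutil` are assumptions, not the project's implementation.

```python
# Illustrative sketch only — not the project's benchmark.py.
# Assumes requests and psutil are installed and an Ollama server on the default port.
import time
import requests
import psutil

def time_one_prompt(model: str, prompt: str) -> dict:
    """Send one prompt to Ollama and record latency and memory change."""
    start_mem = psutil.virtual_memory().used
    start = time.perf_counter()
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    elapsed = time.perf_counter() - start
    end_mem = psutil.virtual_memory().used
    return {
        "model": model,
        "response": resp.json().get("response", ""),
        "latency_s": round(elapsed, 3),
        "memory_delta_mb": round((end_mem - start_mem) / 1e6, 1),
    }

if __name__ == "__main__":
    # "llama3" is an example model name, not a project default.
    print(time_one_prompt("llama3", "Explain edge computing in one sentence."))
```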
### v2: Accuracy-Focused Benchmarking

- Benchmarks LLMs on math questions with objectively correct answers.
- Measures response time, resource usage, and correctness (accuracy).
- Uses a set of standard quadratic equations, one for each root type, stored in JSON.
- Data and scripts: `src_v2/`, `data_v2/` (a sketch of one way to check correctness follows this list).
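The project's real grading logic is in `src_v2/accuracy_benchmark.py`; as an illustration of the idea, the sketch below scores a model's reply against known real roots by extracting numbers and comparing them within a tolerance (complex roots would need extra handling). The function names and tolerance value are assumptions, not the project's code.

```python
# Illustrative grading sketch — not the project's actual scoring code.
import re

def extract_numbers(text: str) -> list[float]:
    """Pull real-number tokens (e.g. '3', '-1.414') out of a model reply."""
    return [float(tok) for tok in re.findall(r"-?\d+(?:\.\d+)?", text)]

def is_correct(reply: str, expected_roots: list[float], tol: float = 1e-2) -> bool:
    """True if every expected root appears in the reply within tolerance."""
    found = extract_numbers(reply)
    return all(any(abs(f - root) <= tol for f in found) for root in expected_roots)

# Example: the 'Distinct Real' question from the table below.
print(is_correct("The roots are x = 3 and x = 4.", [3.0, 4.0]))  # True
```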
## Quadratic Equation Benchmark Questions

| Type | Equation | Expected Answer |
|---|---|---|
| Distinct Real | x^2 - 7x + 12 = 0 | 3, 4 |
| Repeated Real | x^2 - 6x + 9 = 0 | 3 |
| Complex | x^2 + 2x + 5 = 0 | -1+2i, -1-2i |
| Irrational Real | x^2 - 2 = 0 | 1.414, -1.414 |
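As a sanity check on the expected answers above, the snippet below evaluates each equation with the quadratic formula via Python's `cmath`, which covers all four root types including the complex case.

```python
# Verify the expected answers in the table with the quadratic formula.
import cmath

def quadratic_roots(a: float, b: float, c: float) -> tuple[complex, complex]:
    """Roots of a*x^2 + b*x + c = 0 via the quadratic formula."""
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

for label, coeffs in {
    "Distinct Real":   (1, -7, 12),   # x^2 - 7x + 12 = 0 -> 4, 3
    "Repeated Real":   (1, -6, 9),    # x^2 - 6x + 9 = 0  -> 3, 3
    "Complex":         (1, 2, 5),     # x^2 + 2x + 5 = 0  -> -1+2i, -1-2i
    "Irrational Real": (1, 0, -2),    # x^2 - 2 = 0       -> ±1.414...
}.items():
    print(label, quadratic_roots(*coeffs))
```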
## Getting Started

- Clone the repository:

  ```bash
  git clone https://github.com/rwandantechy/benchmarking-project-summer-2025.git
  cd benchmarking-project-summer-2025
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run a generative benchmark (v1):

  ```bash
  python benchmark.py --model <model_name>
  ```

- Run an accuracy-focused math benchmark (v2):

  ```bash
  python src_v2/accuracy_benchmark.py <model_name>
  ```

  - Uses questions from `data_v2/questions/math_questions.json` (a hypothetical example entry is sketched after this list).
  - Results are saved to `data_v2/processed/accuracy_benchmark_results.csv`.
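The exact schema of the question set is defined in `data_v2/questions/math_questions.json`; purely as an illustration, one entry might resemble the hypothetical shape below. The field names are assumptions, not the project's actual format.

```python
# Hypothetical question entry — field names are assumptions for illustration only;
# see data_v2/questions/math_questions.json for the real schema.
import json

example_question = {
    "type": "Distinct Real",
    "equation": "x^2 - 7x + 12 = 0",
    "expected_answer": "3, 4",
}
print(json.dumps(example_question, indent=2))
```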
## Managing Models in Dockerized Ollama

- Pull a model: `docker exec -it ollama ollama pull <model_name>`
- Remove a model: `docker exec -it ollama ollama rm <model_name>`
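- List installed models (the standard `ollama list` subcommand, run the same way): `docker exec -it ollama ollama list`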
## Documentation

- See the `docs/` folder for detailed methodology, system configuration, and code documentation.
- Visual summaries and analysis can be found in `docs/05_Benchmark_Visuals.md` and the `data/processed/` directory.
## Conclusion

For a comprehensive summary and analysis of the benchmarking results, please refer to the full report in the `Conclusion/` directory.