A hands-on benchmark comparing ctransformers and llama.cpp for local inference of quantized GGUF models (Mistral, Zephyr) on an M1 MacBook Pro (8 GB RAM).
| Library | Speed | Simplicity | Best Use Case |
|---|---|---|---|
| ctransformers | ~15 seconds | ✅ Easy | Rapid prototyping |
| llama.cpp | ~10 seconds | | RAG pipelines, speed-sensitive apps |
- Read the full write-up on Medium
- Follow the author on LinkedIn
```bash
git clone https://github.com/santhoshnumberone/llm-benchmarks-mac.git
cd llm-benchmarks-mac
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
Download the `.gguf` model files from Hugging Face, and update the model paths inside the scripts before running:
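If you prefer to script the download, here is a minimal sketch using `huggingface_hub`; the repo ID and filename below are illustrative assumptions, so substitute the exact quantized builds you want to test:

```python
# Hypothetical example: download one GGUF build into ./models.
# Requires: pip install huggingface_hub
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",  # illustrative repo
    filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",   # illustrative quant
    local_dir="models",
)
print(model_path)  # use this path in the benchmark scripts
```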
ctransformers:

```bash
python benchmark_ctransformers.py
```
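For orientation, the core of the ctransformers benchmark looks roughly like this; a sketch of the assumed structure, not the script verbatim, with placeholder model path and prompt:

```python
# Sketch of a ctransformers timing run — adjust MODEL_PATH to your download.
import time
from ctransformers import AutoModelForCausalLM

MODEL_PATH = "models/mistral-7b-instruct-v0.1.Q4_K_M.gguf"  # placeholder path

# Load the quantized GGUF model; model_type hints the architecture.
llm = AutoModelForCausalLM.from_pretrained(MODEL_PATH, model_type="mistral")

start = time.time()
text = llm("Can I sublicense code under the MIT license?", max_new_tokens=128)
print(f"Time taken: {time.time() - start:.2f}s")
print(text)
```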
llama.cpp via llama-cpp-python:

```bash
python benchmark_llamacpp.py
```
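And the llama-cpp-python equivalent under the same assumptions (placeholder path and prompt; `n_ctx` kept small to fit in 8 GB RAM):

```python
# Sketch of a llama-cpp-python timing run — adjust MODEL_PATH to your download.
import time
from llama_cpp import Llama

MODEL_PATH = "models/zephyr-7b-beta.Q4_K_M.gguf"  # placeholder path

# Load the GGUF model with a modest context window.
llm = Llama(model_path=MODEL_PATH, n_ctx=2048, verbose=False)

start = time.time()
result = llm("Can I sublicense code under the MIT license?", max_tokens=128)
print(f"Time taken: {time.time() - start:.2f}s")
print(result["choices"][0]["text"])
```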
| Model | Library | Time Taken | Output (Shortened) |
|---|---|---|---|
| Mistral | ctransformers | 15.14s | "You may sublicense if terms are met..." |
| Zephyr | llama-cpp-python | 12.63s | "It depends on the license..." |
This repo is ideal for:
- AI engineers testing local LLM inference
- Prototyping RAG apps with speed constraints
- Comparing backend performance tradeoffs
👤 Santhosh — Builder & AI Engineer
📩 LinkedIn | ✍️ Medium