🧠 I build privacy-first AI tools that run offline — even on 8GB RAM Apple Silicon.
GGUF · LangChain · CLI/RAG pipelines. No cloud. No API. No compromises.
Almost every dev wants to experiment with LLMs — but experimentation means trial and error. And trial and error comes at a cost.
With cloud APIs, those costs compound fast — every prompt, every test run, every misstep eats into time and money. Latency, usage caps, and vendor lock-in only add more friction.
Whether you're a solo builder, startup, or enterprise — cost sensitivity is real.
And if you're working on a tight setup (like I was, with 8GB RAM and no GPU), every prototype turns into a budgeting decision, and you hit the wall fast.
For those of us who care about privacy or need offline reliability, cloud APIs aren't just inconvenient: they're a hard blocker.
For learners like me, it wasn't just about building; it was about getting started at all.
Harvey Specter once said:
“When you're backed against the wall, break the goddamn thing down.”
So I did. I flipped the stack from cloud to local and started building fully local, open-source LLM tools using:
- 🔗 LangChain (retrievers, prompts, agents, memory)
- 🧠 FAISS for vector search
- 🤗 Hugging Face Transformers + SentenceTransformers
- 🧩 llama.cpp with 4-bit GGUF models (Mistral, Zephyr)
- 💡 Custom prompt logic, fallback flows, and user-driven CLI UX
My focus: lean, reproducible, zero-API workflows for devs, tinkerers, and anyone building in bandwidth- or cost-constrained environments.
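Here's roughly what that stack looks like wired together. This is a minimal sketch, not the actual project code: the tiny corpus, the GGUF path, and the parameters below are placeholders (it assumes langchain-community, faiss-cpu, sentence-transformers, and llama-cpp-python are installed).

```python
# Minimal local RAG sketch: LangChain + FAISS + llama.cpp, fully offline.
# Corpus, model path, and parameters are placeholders, not the real project code.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import LlamaCpp
from langchain_community.vectorstores import FAISS

docs = [
    "The MIT license is permissive and allows commercial use.",
    "GPLv3 is a copyleft license: derivatives must stay open source.",
]  # stand-in corpus

# Embed locally with a small SentenceTransformers model (downloads once, then offline).
emb = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
store = FAISS.from_texts(docs, emb)

# 4-bit GGUF model served by llama.cpp -- path is a placeholder.
llm = LlamaCpp(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",
    n_ctx=2048,
    n_threads=6,
    temperature=0.2,
)

question = "Which license is copyleft?"
context = "\n".join(d.page_content for d in store.similarity_search(question, k=2))
print(llm.invoke(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"))
```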
Can you run usable LLMs locally, on just an 8GB MacBook M1 Pro, without APIs, GPUs, or cloud credits?
I'm writing a 4-part Medium series that shares the full journey:
- What breaks, what works, and how far you can push llama.cpp on low RAM
- Models tested: Mistral, Phi, TinyLlama, Zephyr
- Benchmarks, thread tuning, quant formats, and real-world tradeoffs (see the quick sketch after the links below)
🧠 Ideal for devs trying to build without burning their wallets or sending data to someone else’s server.
🔗 PART 1/4 Can anything actually run LLMs offline — on just an 8GB MacBook M1 Pro?
🔗 PART 2/4 llama.cpp is in the spotlight — it promises local LLMs. But how usable is it, really?
🔗 PART 3/4 Phi-3-mini takes center stage after Part 2 — now let’s get into the nitty-gritty.
🔗 PART 4/4 From Parts 2 & 3: llama.cpp (CLI) + Phi-3-mini = a powerful local LLM. So… can we make it scale?
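As a small taste of the thread-tuning side of the series, here's a hedged sketch of the kind of loop I run with llama-cpp-python. The model path and thread counts are placeholders, and the printed numbers are rough end-to-end figures, not the benchmarks from the posts.

```python
# Rough thread-tuning sketch with llama-cpp-python.
# GGUF path and thread counts are placeholders -- tune for your machine.
import time

from llama_cpp import Llama

PROMPT = "Explain GGUF quantization in one sentence."

for n_threads in (4, 6, 8):
    llm = Llama(
        model_path="./models/phi-3-mini-4k-instruct.Q4_K_M.gguf",  # placeholder path
        n_ctx=512,
        n_threads=n_threads,
        verbose=False,
    )
    t0 = time.time()
    out = llm(PROMPT, max_tokens=64)
    n_tok = out["usage"]["completion_tokens"]
    # End-to-end tok/s, including prompt eval: a rough signal, not a formal benchmark.
    print(f"{n_threads} threads: {n_tok / (time.time() - t0):.1f} tok/s")
```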
Local-first LLMs give you full control over reliability, iteration speed, and customization.
They shift AI from a rented service to a tool you actually own, and most importantly,
they bring the marginal cost of every prompt and test run down to zero.
That’s what excites me.
| Project | Purpose |
|---|---|
| llm-power-search | ✅ Local RAG pipeline that answers legal questions about open-source licenses using LangChain + FAISS + llama.cpp |
| Running Mistral 7B Locally on MacBook M1 Pro: Benchmarking Llama.cpp Python Inference Speed and GPU Trade-offs | 📈 Performance comparison of 4-bit models on Mac M1 using llama.cpp, including speed vs GPU benchmarks |
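The speed-vs-GPU comparison in that second row boils down to toggling llama.cpp's layer offload. A minimal sketch, assuming a Metal-enabled llama-cpp-python build (the model path is a placeholder):

```python
# Sketch: CPU-only vs Metal offload on Apple Silicon with llama-cpp-python.
# Assumes a Metal-enabled build; the model path is a placeholder.
import time

from llama_cpp import Llama

def tokens_per_sec(n_gpu_layers: int) -> float:
    llm = Llama(
        model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # placeholder path
        n_ctx=512,
        n_gpu_layers=n_gpu_layers,  # 0 = CPU only, -1 = offload all layers
        verbose=False,
    )
    t0 = time.time()
    out = llm("Summarize the MIT license in one sentence.", max_tokens=64)
    return out["usage"]["completion_tokens"] / (time.time() - t0)

print(f"CPU only: {tokens_per_sec(0):.1f} tok/s")
print(f"Metal   : {tokens_per_sec(-1):.1f} tok/s")
```

On 8GB machines, offloading everything isn't always a win; memory pressure can eat into the GPU gain, which is exactly the tradeoff the benchmark post explores.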
More tools and ideas in progress — and I’m just getting started. 😄
- 🔗 LangChain · FAISS · SentenceTransformers
- 🧩 llama.cpp · Hugging Face · GGUF 4-bit models
- ⚙️ Python · CLI tooling · Local inference pipelines
- 🧪 PyTorch · TensorFlow (CV/ML background)
- 🧰 C++ (Gtkmm), Python (PyQt/OpenCV) for earlier UI systems
🧠 I specialize in local-first LLM devtools — built for privacy, reproducibility, and edge performance.
If you're building something that needs:
- ✅ Full offline support
- ✅ Reliable RAG pipelines on low-spec devices
- ✅ Streamlit/CLI/PyQt interfaces for local AI
- ✅ Mac M1/M2 performance optimization for LLMs

Or if you're hiring or looking to collaborate on:
- ✅ Remote roles in LLM prototyping or AI devtools
- ✅ AI Product Management roles focused on user-first GenAI tools
- ✅ OSS / SaaS collabs with a focus on usability, cost-efficiency, and impact
📩 santhoshnumber1@gmail.com
🔗 LinkedIn →
- ✔️ Prompt Engineering for ChatGPT (Coursera)
- ✔️ Trustworthy Generative AI (Vanderbilt)
- ✔️ ChatGPT Advanced Data Analysis (Vanderbilt)
- 🧠 LangChain Dev Course (DeepLearning.AI)
- 🔬 ChatGPT Prompt Engineering for Developers (OpenAI)
I began my career as a Computer Vision developer — building tools that combined low-level image processing with product intuition.
Projects included:
- 🥔 Size- and color-based potato sorting system — image-processing algorithm deployed via Google Cloud Functions
- 🧪 Custom-designed CNN trained from scratch on a local machine for spliced-image forgery detection (600+ epochs, tracking training loss & accuracy)
- 👁️ Early glaucoma detection prototype — built on Raspberry Pi with OpenCV + VR headset integration
- 🚗 Real-time vehicle flow analysis — 24-hour video inference across lanes on AWS servers using YOLO
- 🧰 Internal OpenCV tool replication — led a team replicating a core analytics tool for reuse
- 🧑‍💻 Full UI/UX design for embedded systems — owned the v1 + v2 flow for an industrial machine-vision tool
From the start, I've owned not just features — but the full flow:
problem → interface → model → deployment.
That mindset now drives my transition into:
- ✅ LLM prototyping
- ✅ Offline AI tooling
- ✅ End-to-end product thinking
What began as an offline learning constraint turned out to be a blessing — forcing me to focus on privacy, full ownership, and infinite iteration where imagination was the only limit (and system RAM the only bottleneck).
That journey led to zero-cost, local-first tools that work for solo devs, startups, and eventually even cost-sensitive enterprises.
It’s no longer just about building features — I’m evolving into a product manager who takes full ownership, end to end.
🧪 From a young boy who believed that — unlike most things in life — code usually does exactly what you want...
to early repos here that might not mean much to others,
but marked real milestones for me.
And soon: tools that I hope will matter — not just to me, but to many of us building with constraints, creativity, and purpose.