🧠 I build privacy-first AI tools that run offline — even on 8GB RAM Apple Silicon.
GGUF · LangChain · CLI/RAG pipelines. No cloud. No API. No compromises.
Almost every dev wants to experiment with LLMs — but experimentation means trial and error. And trial and error comes at a cost.
With cloud APIs, those costs compound fast — every prompt, every test run, every misstep eats into time and money. Latency, usage caps, and vendor lock-in only add more friction.
Whether you're a solo builder, startup, or enterprise — cost sensitivity is real.
And if you're working on a tight setup (like I was, with 8GB RAM and no GPU), every prototype turns into a budgeting decision, and you hit the wall fast.
For those of us who care about privacy or need offline reliability, cloud APIs aren't just inconvenient: they're a hard blocker.
For learners like me, it wasn't just about building; it was about getting started at all.
Harvey Specter once said:
“When you're backed against the wall, break the goddamn thing down.”
So I did. I flipped the stack from cloud to local and started building fully local, open-source LLM tools using:
- 🔗 LangChain (retrievers, prompts, agents, memory)
- 🧠 FAISS for vector search
- 🤗 Hugging Face Transformers + SentenceTransformers
- 🧩 llama.cpp with 4-bit GGUF models (Mistral, Zephyr)
- 💡 Custom prompt logic, fallback flows, and user-driven CLI UX
My focus: lean, reproducible, zero-API workflows for devs, tinkerers, and anyone building in bandwidth- or cost-constrained environments.
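Here's roughly what that stack looks like wired together. This is a minimal sketch, not the actual project code: the tiny corpus, the GGUF path, and the parameters below are placeholders (it assumes langchain-community, faiss-cpu, sentence-transformers, and llama-cpp-python are installed).

```python
# Minimal local RAG sketch: LangChain + FAISS + llama.cpp, fully offline.
# Corpus, model path, and parameters are placeholders, not the real project code.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import LlamaCpp
from langchain_community.vectorstores import FAISS

docs = [
    "The MIT license is permissive and allows commercial use.",
    "GPLv3 is a copyleft license: derivatives must stay open source.",
]  # stand-in corpus

# Embed locally with a small SentenceTransformers model (downloads once, then offline).
emb = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
store = FAISS.from_texts(docs, emb)

# 4-bit GGUF model served by llama.cpp -- path is a placeholder.
llm = LlamaCpp(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",
    n_ctx=2048,
    n_threads=6,
    temperature=0.2,
)

question = "Which license is copyleft?"
context = "\n".join(d.page_content for d in store.similarity_search(question, k=2))
print(llm.invoke(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"))
```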
Can you run usable LLMs locally, on just an 8GB MacBook M1 Pro, without APIs, GPUs, or cloud credits?
I'm writing a 4-part Medium series that shares the full journey:
- What breaks, what works, and how far you can push llama.cpp on low RAM
- Models tested: Mistral, Phi, TinyLlama, Zephyr
- Benchmarks, thread tuning, quant formats, and real-world tradeoffs (see the quick sketch after the links below)
🧠 Ideal for devs trying to build without burning their wallets or sending data to someone else’s server.
🔗 PART 1/4 Can anything actually run LLMs offline — on just an 8GB MacBook M1 Pro?
🔗 PART 2/4 llama.cpp is in the spotlight — it promises local LLMs. But how usable is it, really?
🔗 PART 3/4 Phi-3-mini takes center stage after Part 2 — now let’s get into the nitty-gritty.
🔗 PART 4/4 From Parts 2 & 3: llama.cpp (CLI) + Phi-3-mini = a powerful local LLM. So… can we make it scale?
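As a small taste of the thread-tuning side of the series, here's a hedged sketch of the kind of loop I run with llama-cpp-python. The model path and thread counts are placeholders, and the printed numbers are rough end-to-end figures, not the benchmarks from the posts.

```python
# Rough thread-tuning sketch with llama-cpp-python.
# GGUF path and thread counts are placeholders -- tune for your machine.
import time

from llama_cpp import Llama

PROMPT = "Explain GGUF quantization in one sentence."

for n_threads in (4, 6, 8):
    llm = Llama(
        model_path="./models/phi-3-mini-4k-instruct.Q4_K_M.gguf",  # placeholder path
        n_ctx=512,
        n_threads=n_threads,
        verbose=False,
    )
    t0 = time.time()
    out = llm(PROMPT, max_tokens=64)
    n_tok = out["usage"]["completion_tokens"]
    # End-to-end tok/s, including prompt eval: a rough signal, not a formal benchmark.
    print(f"{n_threads} threads: {n_tok / (time.time() - t0):.1f} tok/s")
```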
Local-first LLMs give you full control over reliability, iteration speed, and customization.
They shift AI from a rented service to a tool you actually own, and most importantly,
they bring the marginal cost of every prompt and test run down to zero.
That’s what excites me.
| Project | Purpose |
|---|---|
| llm-power-search | ✅ Local RAG pipeline that answers legal questions about open-source licenses using LangChain + FAISS + llama.cpp |
| Running Mistral 7B Locally on MacBook M1 Pro: Benchmarking Llama.cpp Python Inference Speed and GPU Trade-offs | 📈 Performance comparison of 4-bit models on Mac M1 using llama.cpp, including speed vs GPU benchmarks |
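The speed-vs-GPU comparison in that second row boils down to toggling llama.cpp's layer offload. A minimal sketch, assuming a Metal-enabled llama-cpp-python build (the model path is a placeholder):

```python
# Sketch: CPU-only vs Metal offload on Apple Silicon with llama-cpp-python.
# Assumes a Metal-enabled build; the model path is a placeholder.
import time

from llama_cpp import Llama

def tokens_per_sec(n_gpu_layers: int) -> float:
    llm = Llama(
        model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # placeholder path
        n_ctx=512,
        n_gpu_layers=n_gpu_layers,  # 0 = CPU only, -1 = offload all layers
        verbose=False,
    )
    t0 = time.time()
    out = llm("Summarize the MIT license in one sentence.", max_tokens=64)
    return out["usage"]["completion_tokens"] / (time.time() - t0)

print(f"CPU only: {tokens_per_sec(0):.1f} tok/s")
print(f"Metal   : {tokens_per_sec(-1):.1f} tok/s")
```

On 8GB machines, offloading everything isn't always a win; memory pressure can eat into the GPU gain, which is exactly the tradeoff the benchmark post explores.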
More tools and ideas in progress — and I’m just getting started. 😄
- 🔗 LangChain · FAISS · SentenceTransformers
- 🧩 llama.cpp · Hugging Face · GGUF 4-bit models
- ⚙️ Python · CLI tooling · Local inference pipelines
- 🧪 PyTorch · TensorFlow (CV/ML background)
- 🧰 C++ (Gtkmm), Python (PyQt/OpenCV) for earlier UI systems
🧠 I specialize in local-first LLM devtools — built for privacy, reproducibility, and edge performance.
If you're building something that needs:
- ✅ Full offline support
- ✅ Reliable RAG pipelines on low-spec devices
- ✅ Streamlit/CLI/PyQt interfaces for local AI
- ✅ Mac M1/M2 performance optimization for LLMs

Or if you're hiring or looking to collaborate on:
- ✅ Remote roles in LLM prototyping or AI devtools
- ✅ AI Product Management roles focused on user-first GenAI tools
- ✅ OSS / SaaS collabs with a focus on usability, cost-efficiency, and impact
📩 santhoshnumber1@gmail.com
🔗 LinkedIn →
- ✔️ Prompt Engineering for ChatGPT (Coursera)
- ✔️ Trustworthy Generative AI (Vanderbilt)
- ✔️ ChatGPT Advanced Data Analysis (Vanderbilt)
- 🧠 LangChain Dev Course (DeepLearning.AI)
- 🔬 ChatGPT Prompt Engineering for Developers (OpenAI)
I began my career as a Computer Vision developer — building tools that combined low-level image processing with product intuition.
Projects included:
- 🥔 Size- and color-based potato sorting system — image-processing algorithm deployed via Google Cloud Functions
- 🧪 Custom-designed CNN trained from scratch on a local machine for spliced-image forgery detection (600+ epochs, tracking training loss & accuracy)
- 👁️ Early glaucoma detection prototype — built on Raspberry Pi with OpenCV + VR headset integration
- 🚗 Real-time vehicle flow analysis — 24-hour video inference across lanes on AWS servers using YOLO
- 🧰 Internal OpenCV tool replication — led a team replicating a core analytics tool for reuse
- 🧑‍💻 Full UI/UX design for embedded systems — owned the v1 + v2 flow for an industrial machine-vision tool
From the start, I've owned not just features — but the full flow:
problem → interface → model → deployment.
That mindset now drives my transition into:
- ✅ LLM prototyping
- ✅ Offline AI tooling
- ✅ End-to-end product thinking
What began as an offline learning constraint turned out to be a blessing — forcing me to focus on privacy, full ownership, and infinite iteration where imagination was the only limit (and system RAM the only bottleneck).
That journey led to zero-cost, local-first tools that work for solo devs, startups, and eventually even cost-sensitive enterprises.
It’s no longer just about building features — I’m evolving into a product manager who takes full ownership, end to end.
🧪 From a young boy who believed that — unlike most things in life — code usually does exactly what you want...
to early repos here that might not mean much to others,
but marked real milestones for me.
And soon: tools that I hope will matter — not just to me, but to many of us building with constraints, creativity, and purpose.