
👋 Hi, I'm Santhosh — AI DevTool Specialist | llama.cpp · LangChain · RAG | Mac M1 LLM Optimizer

🧠 I build privacy-first AI tools that run offline — even on 8GB RAM Apple Silicon.
GGUF · LangChain · CLI/RAG pipelines. No cloud. No API. No compromises.


🧩 The Pain

Almost every dev wants to experiment with LLMs — but experimentation means trial and error. And trial and error comes at a cost.

With cloud APIs, those costs compound fast — every prompt, every test run, every misstep eats into time and money. Latency, usage caps, and vendor lock-in only add more friction.

Whether you're a solo builder, startup, or enterprise — cost sensitivity is real.
And if you're working on a tight setup (like I was, with 8GB RAM and no GPU), it's not just inconvenient — it's a hard blocker to progress.


💥 The Breaking Point

Anyone building on an 8GB MacBook is bound to hit a wall fast.
Every prototype turns into a budgeting decision.

And for those of us who care about privacy or need offline reliability, cloud APIs aren’t just inconvenient — they’re a blocker.
For learners like me, it wasn’t just about building — it was about getting started at all.


🛠️ The Build

Harvey Specter once said:

“When you're backed against the wall, break the goddamn thing down.”

So I did. I flipped the stack from cloud to local and started building fully local, open-source LLM tools using:

  • 🔗 LangChain (retrievers, prompts, agents, memory)
  • 🧠 FAISS for vector search
  • 🤗 Hugging Face Transformers + SentenceTransformers
  • 🧩 llama.cpp with 4-bit GGUF models (Mistral, Zephyr)
  • 💡 Custom prompt logic, fallback flows, and user-driven CLI UX

My focus: building lean, reproducible, zero-API workflows, ideal for devs, tinkerers, and anyone building in bandwidth- or cost-constrained environments. The sketch below shows the shape of one such pipeline.
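To make that concrete, here is a minimal sketch of the kind of zero-API pipeline I mean: embed a few documents with SentenceTransformers, retrieve with FAISS, and answer with a 4-bit GGUF model through llama-cpp-python. It skips the LangChain wrappers for brevity, and the model path is a placeholder for whatever GGUF file you have locally.

```python
# Minimal local RAG sketch: embed docs, retrieve with FAISS, answer with a 4-bit GGUF model.
# Assumes sentence-transformers, faiss-cpu, and llama-cpp-python are installed,
# and that a GGUF model file exists locally (path below is a placeholder).
import faiss
from sentence_transformers import SentenceTransformer
from llama_cpp import Llama

docs = [
    "MIT is a permissive license with minimal conditions.",
    "GPL-3.0 is a copyleft license: derivatives must stay open source.",
]

# 1. Embed documents once and index them for similarity search.
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small enough for 8GB RAM
doc_vecs = embedder.encode(docs).astype("float32")
index = faiss.IndexFlatL2(doc_vecs.shape[1])
index.add(doc_vecs)

# 2. Retrieve the closest document for a question.
question = "Can I keep my changes closed source under GPL-3.0?"
q_vec = embedder.encode([question]).astype("float32")
_, ids = index.search(q_vec, k=1)
context = docs[ids[0][0]]

# 3. Generate an answer fully offline with llama.cpp.
llm = Llama(model_path="mistral-7b-instruct.Q4_K_M.gguf", n_ctx=2048, n_threads=4)
out = llm(f"Context: {context}\n\nQuestion: {question}\nAnswer:", max_tokens=128)
print(out["choices"][0]["text"].strip())
```

Everything here runs on CPU with a quantized model, which is exactly why it fits on an 8GB machine: no API key, no network, no per-prompt cost.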


📖 Featured Series

13" 8GB MacBook M1 Pro. No Cloud. No API. Decent Speed. Usable LLM on Zero Budget?

Running usable LLMs locally — on just an 8GB MacBook M1 Pro — without APIs, GPUs, or cloud credits?

I'm writing a 4-part Medium series that shares the full journey:

  • What breaks, what works, and how far you can push llama.cpp on low RAM
  • Models tested: Mistral, Phi, TinyLlama, Zephyr
  • Benchmarks, thread tuning, quant formats, and real-world tradeoffs

🧠 Ideal for devs trying to build without burning their wallets or sending data to someone else’s server.

🔗 PART 1/4 Can anything actually run LLMs offline — on just an 8GB MacBook M1 Pro?

🔗 PART 2/4 llama.cpp is in the spotlight — it promises local LLMs. But how usable is it, really?

🔗 PART 3/4 Phi-3-mini takes center stage after Part 2 — now let’s get into the nitty-gritty.

🔗 PART 4/4 From Parts 2 & 3: llama.cpp (CLI) + Phi-3-mini = a powerful local LLM. So… can we make it scale?


🔎 The Insight

Local-first LLMs give you full control over reliability, iteration speed, and customization.
They shift AI from a rented service to a tool you actually own — and most importantly,
they bring the cost down to zero.

That’s what excites me.


📊 The Proof

| Project | Purpose |
| --- | --- |
| llm-power-search | ✅ Local RAG pipeline that answers legal questions about open-source licenses using LangChain + FAISS + llama.cpp |
| Running Mistral 7B Locally on MacBook M1 Pro: Benchmarking Llama.cpp Python Inference Speed and GPU Trade-offs | 📈 Performance comparison of 4-bit models on Mac M1 using llama.cpp, including speed vs. GPU benchmarks |
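In the same spirit as the benchmarking project above, a rough tokens-per-second harness takes only a few lines with llama-cpp-python. This is a sketch rather than the repo's actual script: the model path is hypothetical, and the thread counts are just the sweep I would start with on an M1.

```python
# Rough tokens/sec benchmark for llama-cpp-python on Apple Silicon.
# Assumes llama-cpp-python is installed and a 4-bit GGUF model exists locally
# (the path below is a placeholder, not a real file in this repo).
import time
from llama_cpp import Llama

for n_threads in (2, 4, 8):  # sweep thread counts to find the M1 sweet spot
    # n_threads is fixed at load time, so the model is re-loaded per setting.
    llm = Llama(
        model_path="mistral-7b-instruct.Q4_K_M.gguf",
        n_ctx=512,
        n_threads=n_threads,
        verbose=False,
    )
    start = time.perf_counter()
    out = llm("Explain quantization in one paragraph.", max_tokens=64)
    elapsed = time.perf_counter() - start
    tokens = out["usage"]["completion_tokens"]
    print(f"n_threads={n_threads}: {tokens / elapsed:.1f} tok/s")
```

On low-RAM machines the interesting trade-offs show up exactly here: thread count, context size, and quant format all move the tokens/sec number, which is what the series digs into.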

More tools and ideas in progress — and I’m just getting started. 😄


🧠 Tech Stack

  • 🔗 LangChain · FAISS · SentenceTransformers
  • 🧩 llama.cpp · Hugging Face · GGUF 4-bit models
  • ⚙️ Python · CLI tooling · Local inference pipelines
  • 🧪 PyTorch · TensorFlow (CV/ML background)
  • 🧰 C++ (Gtkmm), Python (PyQt/OpenCV) for earlier UI systems

🔁 In Short

🧠 I specialize in local-first LLM devtools — built for privacy, reproducibility, and edge performance.

If you're building something that needs:

  • ✅ Full offline support
  • ✅ Reliable RAG pipelines on low-spec devices
  • ✅ Streamlit/CLI/PyQt interfaces for local AI
  • ✅ Mac M1/M2 performance optimization for LLMs

🚀 I’m Open To:

  • ✅ Remote roles in LLM prototyping or AI devtools
  • ✅ AI Product Management roles focused on user-first GenAI tools
  • ✅ OSS / SaaS collabs with a focus on usability, cost-efficiency, and impact

📩 santhoshnumber1@gmail.com
🔗 LinkedIn →


🎓 Learning & Certifications


🔁 My Journey So Far

📍 Where I Started

I began my career as a Computer Vision developer — building tools that combined low-level image processing with product intuition.

Projects included:

  • 🥔 Size- and color-based potato sorting system, with the image-processing algorithm deployed via Google Cloud Functions
  • 🧪 Custom-designed CNN trained from scratch on a local machine for spliced-image forgery detection (600+ epochs, tracking training loss and accuracy)
  • 👁️ Early glaucoma detection prototype — built on Raspberry Pi with OpenCV + VR headset integration
  • 🚗 Real-time vehicle flow analysis — 24-hour video inference across lanes on AWS servers using YOLO
  • 🧰 Internal OpenCV tool replication — led a team replicating a core analytics tool for reuse
  • 🧑‍💻 Full UI/UX design for embedded systems — owned v1 + v2 flow for industrial machine vision tool

🔄 Where I Am Now

From the start, I've owned not just features but the full flow: problem → interface → model → deployment. That mindset now drives my transition into:

  • LLM prototyping
  • Offline AI tooling
  • End-to-end product thinking

What began as an offline learning constraint turned out to be a blessing — forcing me to focus on privacy, full ownership, and infinite iteration where imagination was the only limit (and system RAM the only bottleneck).

That journey led to zero-cost, local-first tools that work for solo devs, startups, and eventually even cost-sensitive enterprises.

It’s no longer just about building features — I’m evolving into a product manager who takes full ownership, end to end.


🧪 From a young boy who believed that — unlike most things in life — code usually does exactly what you want...
to early repos here that might not mean much to others,
but marked real milestones for me.
And soon: tools that I hope will matter — not just to me, but to many of us building with constraints, creativity, and purpose.
