🤖 Local LLMs on Android (Offline, Private & Fast)

An Android application that brings a large language model (LLM) to your phone — fully offline, no internet needed. Powered by ONNX Runtime and a Hugging Face-compatible tokenizer, it provides fast, private, on-device question answering with streaming responses.


✨ Features

  • 📱 Fully on-device LLM inference with ONNX Runtime
  • 🔤 Hugging Face-compatible BPE tokenizer (tokenizer.json)
  • 🧠 Qwen2.5 & Qwen3 prompt formatting with streaming generation
  • 🧩 Custom ModelConfig for precision, prompt style, and KV cache
  • 🧘‍♂️ Thinking Mode toggle (enabled in Qwen3) for step-by-step reasoning
  • 🚀 Coroutine-based UI for a smooth, responsive experience (see the streaming sketch below this list)
  • 🔐 Runs 100% offline — no network, no telemetry
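
The streaming pieces above fit together roughly as follows. This is a minimal Kotlin sketch of the pattern, not the repo's actual code: `generateNextToken`, `decode`, and the EOS id are illustrative stand-ins for the real ONNX decode step and tokenizer.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.flowOn

// Illustrative stand-ins for the app's real ONNX decode step and tokenizer.
fun generateNextToken(tokens: List<Int>): Int = TODO("one ONNX Runtime decode step")
fun decode(token: Int): String = TODO("detokenize via tokenizer.json")
const val EOS_TOKEN = 151645 // <|im_end|> in Qwen vocabularies (assumption)

// Emit tokens one at a time so the UI can render the answer as it is generated.
fun streamCompletion(promptTokens: List<Int>, maxNewTokens: Int = 256): Flow<String> = flow {
    val tokens = promptTokens.toMutableList()
    var generated = 0
    while (generated < maxNewTokens) {
        val next = generateNextToken(tokens)
        if (next == EOS_TOKEN) break
        tokens += next
        generated++
        emit(decode(next)) // each piece reaches the collector immediately
    }
}.flowOn(Dispatchers.Default) // keep inference off the main thread
```

On the UI side, the Flow can be collected in a `lifecycleScope.launch` block and appended to the output view piece by piece, which is what keeps the interface responsive during generation.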

📸 Inference Preview

[Screenshots: input prompt and model output]

Figure: App interface showing prompt input and generated answers using the local LLM.


🧠 Model Info

This app supports both Qwen2.5-0.5B-Instruct and Qwen3-0.6B — optimized for instruction-following, QA, and reasoning tasks.
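
Both models expect the ChatML-style prompt format. A minimal sketch of building it in Kotlin (the default system prompt here is an assumption, not necessarily what the app ships with):

```kotlin
// ChatML-style template used by Qwen2.5/Qwen3 instruct checkpoints.
// The system prompt text below is an assumption, not the app's actual default.
fun buildPrompt(userMessage: String, system: String = "You are a helpful assistant."): String =
    buildString {
        append("<|im_start|>system\n").append(system).append("<|im_end|>\n")
        append("<|im_start|>user\n").append(userMessage).append("<|im_end|>\n")
        append("<|im_start|>assistant\n") // generation continues from here
    }
```

With Qwen3's Thinking Mode enabled, the model first emits a `<think>…</think>` block of step-by-step reasoning before the final answer, which the app can show or hide.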

🔁 Option 1: Use Preconverted ONNX Model

Download the preconverted model.onnx and tokenizer.json from Hugging Face.

⚙️ Option 2: Convert Model Yourself

```bash
pip install "optimum[onnxruntime]"
# or install the latest version from source:
python -m pip install git+https://github.com/huggingface/optimum.git
```

Export the model:

```bash
optimum-cli export onnx --model Qwen/Qwen2.5-0.5B-Instruct qwen2.5-0.5B-onnx/
```

  • You can also convert any fine-tuned variant by passing its model path to --model.
  • See the Optimum documentation for more export options.

⚙️ Requirements


📲 How to Build & Run

  1. Open Android Studio and create a new project (Empty Activity).
  2. Name the app local_llm.
  3. Copy the project files from this repo into the corresponding folders.
  4. Place your model.onnx and tokenizer.json in app/src/main/assets/ (loaded at runtime as sketched below).
  5. Connect your Android phone via USB or wireless debugging.
  6. To install, either:
    • Press Run ▶️ in Android Studio, or
    • Go to Build → Generate Signed Bundle / APK to export the .apk file.
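
For reference, loading the model from assets with ONNX Runtime's Android API (ai.onnxruntime) looks roughly like this; the helper name is illustrative, not the repo's actual code:

```kotlin
import android.content.Context
import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession

// Illustrative helper: read model.onnx from app/src/main/assets/ and
// create an ONNX Runtime inference session for on-device generation.
fun createSession(context: Context): OrtSession {
    val modelBytes = context.assets.open("model.onnx").use { it.readBytes() }
    val env = OrtEnvironment.getEnvironment()
    return env.createSession(modelBytes, OrtSession.SessionOptions())
}
```

The same pattern applies to tokenizer.json, which can be read from assets once at startup and parsed for the BPE vocabulary.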

📦 Download Prebuilt APKs

🔐 Privacy First

This app performs all inference locally on your device. No data is sent to any server, ensuring full privacy and low latency.


🔮 Roadmap

  • 🧠 Qwen3-0.6B — Qwen3 model support (shipped).
  • 🔁 Chat Memory — Add multi-turn conversation with context retention.
  • 🦙 LLaMA 3 1B — Support Meta’s new compact LLM.

📄 License

MIT License — use freely, modify locally, and deploy offline. Contributions welcome!
