Local-Infer is a Rust-based local inference gateway that lets you run open-source AI models completely offline. It provides a simple API and CLI for interacting with models such as LLaMA and Whisper without relying on cloud services.
The goal is to provide a unified local backend for text and speech inference that is lightweight, modular, and privacy-preserving, and to expose adapters that make it easy to plug in open-source models and get a working setup quickly.
- Common trait interface for model engines
- HTTP API for inference and transcription
- CLI for running local tasks
- Adapter system for engines like llama.cpp and whisper.cpp (see the trait sketch after this list)
- Optional SQLite persistence for model registry and job history
- Async runtime with Axum and Tokio
- Extensible architecture for adding new adapters
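
A minimal sketch of what the common engine trait and a llama.cpp adapter could look like, assuming the async-trait and anyhow crates; the names used here (Engine, InferenceRequest, LlamaAdapter) are illustrative, not the crate's actual API.

```rust
use async_trait::async_trait;

/// Hypothetical request/response types; the real crate may model these differently.
pub struct InferenceRequest {
    pub prompt: String,
    pub max_tokens: usize,
}

pub struct InferenceResponse {
    pub text: String,
}

/// Common trait that adapters such as a llama.cpp or whisper.cpp backend could implement.
#[async_trait]
pub trait Engine: Send + Sync {
    /// Human-readable engine name, e.g. "llama.cpp".
    fn name(&self) -> &str;

    /// Run a single inference request against the loaded model.
    async fn infer(&self, req: InferenceRequest) -> anyhow::Result<InferenceResponse>;
}

/// Example adapter wrapping a llama.cpp binding (names are illustrative).
pub struct LlamaAdapter {
    pub model_path: std::path::PathBuf,
}

#[async_trait]
impl Engine for LlamaAdapter {
    fn name(&self) -> &str {
        "llama.cpp"
    }

    async fn infer(&self, req: InferenceRequest) -> anyhow::Result<InferenceResponse> {
        // A real adapter would call into the llama.cpp bindings here;
        // this stub just echoes the prompt to keep the sketch self-contained.
        Ok(InferenceResponse {
            text: format!("[max {} tokens] {}", req.max_tokens, req.prompt),
        })
    }
}
```

New backends plug in by implementing the same trait, so the HTTP API and CLI can stay agnostic about which engine serves a request.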
- Core + API workspace setup
- Engine trait definition and LLaMA adapter
- Basic inference endpoint (see the handler sketch after the roadmap)
- Persistent storage integration
- CLI tool
- Streaming support
- Additional adapters in the future (Whisper, OCR, etc.)
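
A minimal sketch of the basic inference endpoint, assuming Axum 0.7, Tokio, and Serde; the route path (/v1/infer) and payload fields are assumptions for illustration, not the project's actual API.

```rust
use axum::{routing::post, Json, Router};
use serde::{Deserialize, Serialize};

/// Hypothetical request/response payloads for the inference endpoint.
#[derive(Deserialize)]
struct InferRequest {
    prompt: String,
    #[serde(default = "default_max_tokens")]
    max_tokens: usize,
}

fn default_max_tokens() -> usize {
    256
}

#[derive(Serialize)]
struct InferResponse {
    text: String,
}

/// Handler stub: a real implementation would dispatch to the configured engine.
async fn infer(Json(req): Json<InferRequest>) -> Json<InferResponse> {
    Json(InferResponse {
        text: format!("(echo, max {} tokens) {}", req.max_tokens, req.prompt),
    })
}

#[tokio::main]
async fn main() {
    // Mount the inference route on a local-only listener.
    let app = Router::new().route("/v1/infer", post(infer));
    let listener = tokio::net::TcpListener::bind("127.0.0.1:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```

With this running, a request such as `curl -X POST localhost:8080/v1/infer -H 'Content-Type: application/json' -d '{"prompt":"hello"}'` would return the echoed text as JSON.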
MIT License © 2025