Local-Infer

Local-Infer is a Rust-based local inference gateway that lets you run open-source AI models completely offline. It provides a simple API and CLI for interacting with models such as LLaMA and Whisper without relying on cloud services.

Project Goal

To create a unified local backend for text, speech, and other inference tasks that is lightweight, modular, and privacy-preserving, and to expose adapters that make it easy to plug open-source models into a ready-to-use project.

Core Features

  • Common trait interface for model engines (see the trait sketch after this list)
  • HTTP API for inference and transcription
  • CLI for running local tasks
  • Adapter system for engines like llama.cpp and whisper.cpp
  • Optional SQLite persistence for model registry and job history
  • Async runtime with Axum and Tokio
  • Extensible architecture for adding new adapters
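
The snippet below is a minimal sketch of what the common engine trait and one adapter might look like. The names (`Engine`, `InferenceRequest`, `LlamaCppEngine`) and the use of the `async-trait` and `anyhow` crates are illustrative assumptions for this README, not the project's actual API.

```rust
use async_trait::async_trait;

/// Hypothetical request/response types for a single inference call.
pub struct InferenceRequest {
    pub prompt: String,
    pub max_tokens: usize,
}

pub struct InferenceResponse {
    pub text: String,
}

/// Common trait every engine adapter implements, so the gateway can treat
/// llama.cpp, whisper.cpp, etc. uniformly.
#[async_trait]
pub trait Engine: Send + Sync {
    fn name(&self) -> &str;
    async fn infer(&self, req: InferenceRequest) -> anyhow::Result<InferenceResponse>;
}

/// Example adapter wrapping a llama.cpp model (binding details omitted).
pub struct LlamaCppEngine {
    pub model_path: String,
}

#[async_trait]
impl Engine for LlamaCppEngine {
    fn name(&self) -> &str {
        "llama.cpp"
    }

    async fn infer(&self, req: InferenceRequest) -> anyhow::Result<InferenceResponse> {
        // A real adapter would call into llama.cpp bindings here;
        // the stub just echoes the prompt to keep the sketch self-contained.
        let _ = &self.model_path;
        Ok(InferenceResponse {
            text: format!("[stub completion for: {}]", req.prompt),
        })
    }
}
```

With a trait like this, the gateway can keep a registry of `Box<dyn Engine>` values and dispatch requests by engine name, which is what makes new adapters drop-in.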

Roadmap

  1. Core + API workspace setup
  2. Engine trait definition and LLaMA adapter
  3. Basic inference endpoint (see the Axum sketch after this list)
  4. Persistent storage integration
  5. CLI tool
  6. Streaming support
  7. Additional adapters (Whisper, OCR, etc.)
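
As a rough illustration of step 3, here is a hedged sketch of a basic inference endpoint built on Axum and Tokio. The `/infer` route, the request and response fields, and the `AppState` stub are assumptions made for the example (Axum 0.7, Tokio, Serde), not the gateway's actual interface.

```rust
use std::sync::Arc;

use axum::{extract::State, routing::post, Json, Router};
use serde::{Deserialize, Serialize};

#[derive(Deserialize)]
struct InferParams {
    prompt: String,
}

#[derive(Serialize)]
struct InferResult {
    text: String,
}

/// Shared application state; a real gateway would hold the active engine here.
struct AppState;

impl AppState {
    fn run(&self, prompt: &str) -> String {
        // Placeholder for a call into an engine adapter.
        format!("[stub completion for: {prompt}]")
    }
}

async fn infer(
    State(state): State<Arc<AppState>>,
    Json(params): Json<InferParams>,
) -> Json<InferResult> {
    Json(InferResult {
        text: state.run(&params.prompt),
    })
}

#[tokio::main]
async fn main() {
    let app = Router::new()
        .route("/infer", post(infer))
        .with_state(Arc::new(AppState));

    let listener = tokio::net::TcpListener::bind("127.0.0.1:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```

If an endpoint like this existed, it could be exercised with something like `curl -X POST http://127.0.0.1:8080/infer -H 'Content-Type: application/json' -d '{"prompt": "Hello"}'`.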

License

MIT License © 2025
