Cross-platform framework for deploying LLM/VLM/TTS models locally in your app.
- Available in Flutter, React Native, and Kotlin Multiplatform.
- Supports any GGUF model available on Hugging Face: Qwen, Gemma, Llama, DeepSeek, and more.
- Runs LLMs, VLMs, embedding models, TTS models, and more.
- Handles precisions from FP32 down to 2-bit quantized models, for efficiency and lower device strain.
- Chat templates with Jinja2 support and token streaming.
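As a rough illustration of why the low-bit quantization mentioned above matters on-device, weight memory scales with bits per parameter. The helper below is a back-of-envelope sketch and not part of the Cactus API; real GGUF files mix precisions and add metadata, and inference additionally needs KV-cache and activation memory.

```typescript
// Back-of-envelope estimate of model weight memory: params * bits / 8 bytes.
// Illustrative only; real GGUF files add metadata and mixed-precision layers.
function estimateModelMemoryMB(paramCount: number, bitsPerWeight: number): number {
  return (paramCount * bitsPerWeight) / 8 / (1024 * 1024);
}

// A 600M-parameter model at different precisions:
estimateModelMemoryMB(600_000_000, 32); // FP32: ~2289 MB
estimateModelMemoryMB(600_000_000, 8);  // Q8:   ~572 MB
estimateModelMemoryMB(600_000_000, 2);  // Q2:   ~143 MB
```

Dropping from FP32 to Q8 cuts weight memory by 4x with little quality loss; 2-bit quantization trades more quality for a further 4x reduction.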
- Install: run the following command in your project terminal:

  ```shell
  flutter pub add cactus
  ```
- Flutter Text Completion

  ```dart
  import 'package:cactus/cactus.dart';

  final lm = await CactusLM.init(
    modelUrl: 'https://huggingface.co/Cactus-Compute/Qwen3-600m-Instruct-GGUF/resolve/main/Qwen3-0.6B-Q8_0.gguf',
    contextSize: 2048,
  );

  final messages = [ChatMessage(role: 'user', content: 'Hello!')];
  final response = await lm.completion(messages, maxTokens: 100, temperature: 0.7);
  ```
- Flutter Embedding

  ```dart
  import 'package:cactus/cactus.dart';

  final lm = await CactusLM.init(
    modelUrl: 'https://huggingface.co/Cactus-Compute/Qwen3-600m-Instruct-GGUF/resolve/main/Qwen3-0.6B-Q8_0.gguf',
    contextSize: 2048,
    generateEmbeddings: true,
  );

  final text = 'Your text to embed';
  final result = await lm.embedding(text);
  ```
- Flutter VLM Completion

  ```dart
  import 'package:cactus/cactus.dart';

  final vlm = await CactusVLM.init(
    modelUrl: 'https://huggingface.co/Cactus-Compute/SmolVLM2-500m-Instruct-GGUF/resolve/main/SmolVLM2-500M-Video-Instruct-Q8_0.gguf',
    mmprojUrl: 'https://huggingface.co/Cactus-Compute/SmolVLM2-500m-Instruct-GGUF/resolve/main/mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf',
  );

  final messages = [ChatMessage(role: 'user', content: 'Describe this image')];
  final response = await vlm.completion(
    messages,
    imagePaths: ['/absolute/path/to/image.jpg'],
    maxTokens: 200,
    temperature: 0.3,
  );
  ```
- Flutter Cloud Fallback

  ```dart
  import 'package:cactus/cactus.dart';

  final lm = await CactusLM.init(
    modelUrl: 'https://huggingface.co/Cactus-Compute/Qwen3-600m-Instruct-GGUF/resolve/main/Qwen3-0.6B-Q8_0.gguf',
    contextSize: 2048,
    cactusToken: 'enterprise_token_here',
  );

  final messages = [ChatMessage(role: 'user', content: 'Hello!')];
  final response = await lm.completion(messages, maxTokens: 100, temperature: 0.7);

  // local (default): strictly run on-device only
  // localfirst: fall back to the cloud if the device fails
  // remotefirst: primarily remote; run locally if the API fails
  // remote: strictly run in the cloud
  final embedding = await lm.embedding('Your text', mode: 'localfirst');
  ```
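  The four routing modes described in the comments above boil down to an ordered list of backends to try. The sketch below illustrates that idea; `resolveOrder` and `runWithFallback` are hypothetical names for illustration, not part of the Cactus SDK.

  ```typescript
  // Hypothetical sketch of local/cloud routing; not the Cactus API.
  type Mode = 'local' | 'localfirst' | 'remotefirst' | 'remote';

  function resolveOrder(mode: Mode): Array<'local' | 'remote'> {
    switch (mode) {
      case 'local': return ['local'];                 // strictly on-device
      case 'localfirst': return ['local', 'remote'];  // cloud only if device fails
      case 'remotefirst': return ['remote', 'local']; // local only if API fails
      case 'remote': return ['remote'];               // strictly cloud
    }
  }

  async function runWithFallback<T>(
    mode: Mode,
    backends: Record<'local' | 'remote', () => Promise<T>>,
  ): Promise<T> {
    let lastError: unknown;
    for (const target of resolveOrder(mode)) {
      try {
        return await backends[target]();
      } catch (err) {
        lastError = err; // try the next backend, if any
      }
    }
    throw lastError;
  }
  ```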
N.B.: See the Flutter Docs for more.
- Install the `cactus-react-native` package:

  ```shell
  npm install cactus-react-native && npx pod-install
  ```
- React Native Text Completion

  ```typescript
  import { CactusLM } from 'cactus-react-native';

  const { lm, error } = await CactusLM.init({
    model: '/path/to/model.gguf', // local model file inside the app sandbox
    n_ctx: 2048,
  });

  const messages = [{ role: 'user', content: 'Hello!' }];
  const params = { n_predict: 100, temperature: 0.7 };
  const response = await lm.completion(messages, params);
  ```
- React Native Embedding

  ```typescript
  import { CactusLM } from 'cactus-react-native';

  const { lm, error } = await CactusLM.init({
    model: '/path/to/model.gguf', // local model file inside the app sandbox
    n_ctx: 2048,
    embedding: true,
  });

  const text = 'Your text to embed';
  const params = { normalize: true };
  const result = await lm.embedding(text, params);
  ```
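  Embedding vectors like the result above are usually compared with cosine similarity. A minimal, framework-agnostic helper, assuming the embedding comes back as a plain number array:

  ```typescript
  // Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
  function cosineSimilarity(a: number[], b: number[]): number {
    if (a.length !== b.length) throw new Error('dimension mismatch');
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  cosineSimilarity([1, 0], [1, 0]); // identical direction: 1
  cosineSimilarity([1, 0], [0, 1]); // orthogonal: 0
  ```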
- React Native VLM

  ```typescript
  import { CactusVLM } from 'cactus-react-native';

  const { vlm, error } = await CactusVLM.init({
    model: '/path/to/vision-model.gguf', // local model file inside the app sandbox
    mmproj: '/path/to/mmproj.gguf',      // local model file inside the app sandbox
  });

  const messages = [{ role: 'user', content: 'Describe this image' }];
  const params = {
    images: ['/absolute/path/to/image.jpg'],
    n_predict: 200,
    temperature: 0.3,
  };
  const response = await vlm.completion(messages, params);
  ```
- React Native Agents

  ```typescript
  import { CactusAgent } from 'cactus-react-native';

  // we recommend the Qwen 3 family; 0.6B is great
  const { agent, error } = await CactusAgent.init({
    model: '/path/to/model.gguf',
    n_ctx: 2048,
  });

  const weatherTool = agent.addTool(
    (location: string) => `Weather in ${location}: 72°F, sunny`,
    'Get current weather for a location',
    { location: { type: 'string', description: 'City name', required: true } }
  );

  const messages = [{ role: 'user', content: "What's the weather in NYC?" }];
  const result = await agent.completionWithTools(messages, {
    n_predict: 200,
    temperature: 0.7,
  });
  await agent.release();
  ```
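  To illustrate the tool pattern `addTool` uses above (a function, a description, and a parameter schema), here is a standalone sketch of a registry that dispatches a model-emitted tool call. `ToolRegistry` is hypothetical and not part of cactus-react-native:

  ```typescript
  // Hypothetical tool registry illustrating the (fn, description, schema) pattern.
  type ParamSpec = { type: string; description: string; required?: boolean };

  interface Tool {
    fn: (...args: string[]) => string;
    description: string;
    params: Record<string, ParamSpec>;
  }

  class ToolRegistry {
    private tools = new Map<string, Tool>();

    add(name: string, tool: Tool): void {
      this.tools.set(name, tool);
    }

    // Dispatch a model-emitted call like { name, arguments }.
    call(name: string, args: Record<string, string>): string {
      const tool = this.tools.get(name);
      if (!tool) throw new Error(`unknown tool: ${name}`);
      // Pass arguments in the order the schema declares them.
      const ordered = Object.keys(tool.params).map((k) => args[k]);
      return tool.fn(...ordered);
    }
  }

  const registry = new ToolRegistry();
  registry.add('get_weather', {
    fn: (location) => `Weather in ${location}: 72°F, sunny`,
    description: 'Get current weather for a location',
    params: { location: { type: 'string', description: 'City name', required: true } },
  });
  ```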
Get started with an example app built using CactusAgent.

See the React Docs for more.
- Add the Maven dependency to your KMP project's `build.gradle.kts`:

  ```kotlin
  kotlin {
      sourceSets {
          commonMain {
              dependencies {
                  implementation("com.cactus:library:0.2.4")
              }
          }
      }
  }
  ```
- Platform Setup:
  - Android: works automatically; the native libraries are included.
  - iOS: in Xcode, go to File → Add Package Dependencies, paste https://github.com/cactus-compute/cactus, and click Add.
- Kotlin Multiplatform Text Completion

  ```kotlin
  import com.cactus.CactusLM
  import kotlinx.coroutines.runBlocking

  runBlocking {
      val lm = CactusLM(
          threads = 4,
          contextSize = 2048,
          gpuLayers = 0 // set to 99 for full GPU offload
      )

      val downloadSuccess = lm.download(
          url = "path/to/huggingface/gguf",
          filename = "model_filename.gguf"
      )
      val initSuccess = lm.init("qwen3-600m.gguf")

      val result = lm.completion(
          prompt = "Hello!",
          maxTokens = 100,
          temperature = 0.7f
      )
  }
  ```
- Kotlin Multiplatform Speech to Text

  ```kotlin
  import com.cactus.CactusSTT
  import kotlinx.coroutines.runBlocking

  runBlocking {
      val stt = CactusSTT(
          language = "en-US",
          sampleRate = 16000,
          maxDuration = 30
      )

      // Only supports the default Vosk STT model for Android and the Apple Foundation model
      val downloadSuccess = stt.download()
      val initSuccess = stt.init()

      val result = stt.transcribe()
      result?.let { sttResult ->
          println("Transcribed: ${sttResult.text}")
          println("Confidence: ${sttResult.confidence}")
      }

      // Or transcribe from an audio file
      val fileResult = stt.transcribeFile("/path/to/audio.wav")
  }
  ```
- Kotlin Multiplatform VLM

  ```kotlin
  import com.cactus.CactusVLM
  import kotlinx.coroutines.runBlocking

  runBlocking {
      val vlm = CactusVLM(
          threads = 4,
          contextSize = 2048,
          gpuLayers = 0 // set to 99 for full GPU offload
      )

      val downloadSuccess = vlm.download(
          modelUrl = "path/to/huggingface/gguf",
          mmprojUrl = "path/to/huggingface/mmproj/gguf",
          modelFilename = "model_filename.gguf",
          mmprojFilename = "mmproj_filename.gguf"
      )
      val initSuccess = vlm.init("smolvlm2-500m.gguf", "mmproj-smolvlm2-500m.gguf")

      val result = vlm.completion(
          prompt = "Describe this image",
          imagePath = "/path/to/image.jpg",
          maxTokens = 200,
          temperature = 0.3f
      )
  }
  ```
N.B.: See the Kotlin Docs for more.
The Cactus backend is written in C/C++ and can run directly on phones, smart TVs, watches, speakers, cameras, laptops, and more. See the C++ Docs for more.
First, clone the repo with `git clone https://github.com/cactus-compute/cactus.git`, `cd` into it, and make all scripts executable with `chmod +x scripts/*.sh`.
- Flutter
  - Build the Android JNILibs with `scripts/build-flutter-android.sh`.
  - Build the Flutter plugin with `scripts/build-flutter.sh` (MUST be run before using the example).
  - Navigate to the example app with `cd flutter/example`.
  - Open your simulator via Xcode or Android Studio (see the walkthrough if you have not done this before).
  - Always start the app with `flutter clean && flutter pub get && flutter run`.
  - Play with the app, and make changes to the example app or plugin as desired.
- React Native
  - Build the Android JNILibs with `scripts/build-react-android.sh`.
  - Build the React Native package with `scripts/build-react.sh`.
  - Navigate to the example app with `cd react/example`.
  - Set up your simulator via Xcode or Android Studio (see the walkthrough if you have not done this before).
  - Always start the app with `yarn && yarn ios` or `yarn && yarn android`.
  - Play with the app, and make changes to the example app or package as desired.
  - For now, if you make changes in the package, manually copy the files/folders into `examples/react/node_modules/cactus-react-native`.
- Kotlin Multiplatform
  - Build the Android JNILibs with `scripts/build-flutter-android.sh` (Flutter and Kotlin share the same JNILibs).
  - Build the Kotlin library with `scripts/build-kotlin.sh` (MUST be run before using the example).
  - Navigate to the example app with `cd kotlin/example`.
  - Open your simulator via Xcode or Android Studio (see the walkthrough if you have not done this before).
  - Start the app with `./gradlew :composeApp:run` for desktop, or use Android Studio/Xcode for mobile.
  - Play with the app, and make changes to the example app or library as desired.
- C/C++
  - Navigate to the example app with `cd cactus/example`.
  - There are multiple main files: `main_vlm`, `main_llm`, `main_embed`, `main_tts`.
  - Build both the libraries and executables with `build.sh`.
  - Run one of the executables: `./cactus_vlm`, `./cactus_llm`, `./cactus_embed`, `./cactus_tts`.
  - Try different models and make changes as desired.
- Contributing
  - To contribute a bug fix, create a branch after making your changes with `git checkout -b <branch-name>` and submit a PR.
  - To contribute a feature, please raise an issue first so it can be discussed and to avoid overlapping with someone else's work.
  - Join our Discord.
| Device | Gemma3 1B Q4 (toks/sec) | Qwen3 4B Q4 (toks/sec) |
|---|---|---|
| iPhone 16 Pro Max | 54 | 18 |
| iPhone 16 Pro | 54 | 18 |
| iPhone 16 | 49 | 16 |
| iPhone 15 Pro Max | 45 | 15 |
| iPhone 15 Pro | 45 | 15 |
| iPhone 14 Pro Max | 44 | 14 |
| OnePlus 13 5G | 43 | 14 |
| Samsung Galaxy S24 Ultra | 42 | 14 |
| iPhone 15 | 42 | 14 |
| OnePlus Open | 38 | 13 |
| Samsung Galaxy S23 5G | 37 | 12 |
| Samsung Galaxy S24 | 36 | 12 |
| iPhone 13 Pro | 35 | 11 |
| OnePlus 12 | 35 | 11 |
| Galaxy S25 Ultra | 29 | 9 |
| OnePlus 11 | 26 | 8 |
| iPhone 13 mini | 25 | 8 |
| Redmi K70 Ultra | 24 | 8 |
| Xiaomi 13 | 24 | 8 |
| Samsung Galaxy S24+ | 22 | 7 |
| Samsung Galaxy Z Fold 4 | 22 | 7 |
| Xiaomi Poco F6 5G | 22 | 6 |
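The throughput figures above translate directly into response latency: the time to decode N tokens is roughly N divided by toks/sec, ignoring prompt prefill. A quick sketch:

```typescript
// Seconds to generate a reply of `tokens` tokens at a given decode rate.
// Ignores prompt prefill time, so real latency is somewhat higher.
function decodeSeconds(tokens: number, toksPerSec: number): number {
  return tokens / toksPerSec;
}

// A 100-token reply from Gemma3 1B Q4, using rates from the table:
decodeSeconds(100, 54); // iPhone 16 Pro Max: ~1.9 s
decodeSeconds(100, 22); // Xiaomi Poco F6 5G: ~4.5 s
```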
We provide a collection of recommended models on our Hugging Face page.