A web app that generates AI-powered image captions, ideal for preparing LoRA training datasets on platforms like fal LoRA Trainer and Replicate LoRA Trainer.
- Dual Model Support: OpenAI API (GPT-4.1 series) or Ollama (local models)
- Batch Processing: Upload and caption multiple images at once
- Customization: Add prefix/suffix to captions
- Export: Download all captions as a ZIP file
- API Key Management: Securely store OpenAI keys in-app
- OpenAI: GPT-4.1 (high-quality), GPT-4.1-mini (balanced), and GPT-4.1-nano (faster, cheaper)
- Ollama: Local vision models (LLaVA, moondream, bakLLaVA), no API key needed
Note: When using the deployed web app with Ollama, you have two options:
- Use ngrok to create a secure tunnel to your local Ollama server.
- Configure Ollama to allow additional web origins via the `OLLAMA_ORIGINS` environment variable (see the Ollama FAQ and LobeHub's Ollama provider documentation). Both approaches are sketched below.
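A minimal sketch of both approaches, assuming Ollama's default port of 11434 and a hypothetical deployment URL:

```bash
# Option 1: tunnel the local Ollama server through ngrok
# (--host-header rewrites the Host header so Ollama accepts the request)
ngrok http 11434 --host-header="localhost:11434"

# Option 2: allow the deployed app's origin, then start the server
# (replace the URL with your actual deployment)
OLLAMA_ORIGINS=https://your-app.example.com ollama serve
```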
Next.js 14, Tailwind CSS, shadcn/ui, Lucide React, Vercel AI SDK
- Node.js (v16+)
- Yarn
- OpenAI API key (if using OpenAI)
- Ollama installed locally (if using local models)
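To confirm the prerequisites are in place, you can check the installed versions from a terminal:

```bash
node --version    # should print v16.x or newer
yarn --version
ollama --version  # only needed when using local models
```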
```bash
# Clone repo
git clone https://github.com/aleksa-codes/gpt-flux-img-captioner.git
cd gpt-flux-img-captioner

# Install dependencies
yarn install

# Start development server
yarn dev
```
Open http://localhost:3000 in your browser.
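For a production build, the standard Next.js scripts apply (assuming the default `package.json` setup):

```bash
# Build an optimized production bundle, then serve it
yarn build
yarn start
```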
- Choose between OpenAI or Ollama
- Upload one or more images
- Add an optional prefix/suffix (see the example after this list)
- Generate captions
- Download as ZIP
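A prefix is handy for prepending a LoRA trigger word to every caption. A hypothetical example of how a prefix and suffix are applied:

```bash
# Prefix: "TOK style, "    Suffix: ", high quality"
# Generated caption:  a cat sleeping on a sofa
# Exported caption:   TOK style, a cat sleeping on a sofa, high quality
```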
- Install Ollama
- Pull a vision model: `ollama pull llava`
- Start the Ollama server (verification commands below)
- Select "Ollama" in the app and choose your model
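A quick way to verify the local setup, assuming the Ollama CLI is on your PATH and listening on its default port:

```bash
# Confirm the vision model was downloaded
ollama list

# Start the server if it is not already running
ollama serve

# Sanity-check that the API responds on the default port
curl http://localhost:11434/api/tags
```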
Contributions welcome! Fork the repo, create a feature branch, and submit a pull request.
MIT License - see the LICENSE file for details.
Made with ❤️ by aleksa.codes