A simple Streamlit application that enables persistent conversations with local AI models through Ollama. Your AI remembers previous conversations across sessions using semantic memory.
- 🧠 Persistent Memory: Conversations are stored and recalled across sessions
- 🔍 Semantic Search: Retrieves relevant past conversations based on context
- 🤖 Multi-Model Support: Works with any locally installed Ollama model
- ⚙️ Configurable: Adjust memory depth, relevance thresholds, and context length
- 💾 Local Storage: All data stays on your machine using ChromaDB
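
Under the hood, the memory layer is straightforward: each exchange is stored in a local ChromaDB collection, which embeds the text so that later prompts can pull back the most similar past exchanges as extra context. A minimal sketch of that loop, assuming a hypothetical collection name and storage path (the app's actual identifiers may differ):

```python
import chromadb

# Persistent on-disk store - all data stays on your machine
client = chromadb.PersistentClient(path="./memory_llama3.2-3b")
memory = client.get_or_create_collection(name="conversations")

# Store an exchange; ChromaDB embeds the document text automatically
memory.add(
    ids=["turn-0001"],
    documents=["User: What is Streamlit?\nAI: A Python framework for data apps."],
)

# Later, retrieve the past exchanges most similar to a new prompt
results = memory.query(query_texts=["Tell me more about Streamlit"], n_results=5)
for doc in results["documents"][0]:
    print(doc)  # candidate context for the next model call
```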
- Install Ollama (if not already installed):

  ```bash
  # On macOS/Linux
  curl -fsSL https://ollama.ai/install.sh | sh

  # On Windows, download the installer from https://ollama.ai
  ```

- Install at least one model:

  ```bash
  ollama pull gemma3:12b   # or any other model you prefer
  ollama pull llama3.2:3b
  ```

- Start the Ollama server:

  ```bash
  ollama serve
  ```

- Clone this repository:

  ```bash
  git clone https://github.com/mebrown47/stateful-ai-chat
  cd stateful-ai-chat
  ```

- Install the Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the application:

  ```bash
  streamlit run stateful_ai_chat.py
  ```

- Open your browser to http://localhost:8501
- Select your preferred model from the dropdown
- Start chatting - the AI will respond using the selected model
- Your conversation history is automatically saved
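
Each turn boils down to one request against the local Ollama server. A sketch using Ollama's REST API, where the memory-context assembly is illustrative rather than the app's exact code:

```python
import requests

def chat_turn(model: str, prompt: str, memory_context: str = "") -> str:
    """Send one prompt (plus any retrieved memories) to the local Ollama server."""
    full_prompt = f"{memory_context}\n\n{prompt}" if memory_context else prompt
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": full_prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(chat_turn("llama3.2:3b", "Summarize what we discussed yesterday."))
```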
- Enable Memory: Toggle to use previous conversations as context
- Memory Depth: Number of relevant past conversations to include (1-20)
- Relevance Threshold: Minimum similarity score for memory retrieval (0.1-0.9)
- Max Context Length: Maximum characters for memory context (300-2000)
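
These settings map directly onto Streamlit sidebar widgets. A sketch of how they might be wired up (widget labels match the list above; variable names are illustrative):

```python
import streamlit as st

# Sidebar controls mirroring the settings listed above
use_memory = st.sidebar.checkbox("Enable Memory", value=True)
memory_depth = st.sidebar.slider("Memory Depth", min_value=1, max_value=20, value=5)
relevance_threshold = st.sidebar.slider("Relevance Threshold", 0.1, 0.9, 0.3)
max_context_length = st.sidebar.slider("Max Context Length", 300, 2000, 1000)
```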
- The dropdown automatically shows all your installed Ollama models
- Switch models anytime - each model has its own memory database
- If connection fails, you can manually enter a model name
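
The dropdown can be populated from the same `/api/tags` endpoint used in Troubleshooting below. A sketch including the manual-entry fallback (the helper name is illustrative):

```python
import requests
import streamlit as st

def installed_models() -> list[str]:
    """Ask the local Ollama server which models are installed."""
    try:
        resp = requests.get("http://localhost:11434/api/tags", timeout=5)
        resp.raise_for_status()
        return [m["name"] for m in resp.json().get("models", [])]
    except requests.RequestException:
        return []  # server unreachable - fall back to manual entry

models = installed_models()
model = st.selectbox("Model", models) if models else st.text_input("Model name")
```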
- Memory Depth: How many relevant memories to retrieve (default: 5)
- Relevance Threshold: How similar memories need to be (default: 0.3)
- Context Length: Maximum memory text to include (default: 1000 chars)
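
Roughly how these three settings drive retrieval: depth caps the number of results, the threshold filters them by similarity, and the context length truncates what survives. A sketch against a ChromaDB collection (treating similarity as `1 - distance` is an assumption here, not necessarily the app's formula):

```python
def recall(memory, prompt: str, depth: int = 5,
           threshold: float = 0.3, max_chars: int = 1000) -> str:
    """Fetch up to `depth` memories, keep the sufficiently similar ones,
    and truncate the combined context to `max_chars` characters."""
    results = memory.query(query_texts=[prompt], n_results=depth)
    kept = []
    for doc, dist in zip(results["documents"][0], results["distances"][0]):
        if 1.0 - dist >= threshold:  # assumed distance-to-similarity conversion
            kept.append(doc)
    return "\n---\n".join(kept)[:max_chars]
```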
Use the "🗑️ Clear Memory" button in the sidebar to reset conversation history for the current model.
```
stateful-ai-chat/
├── stateful_ai_chat.py       # Main application
├── requirements.txt          # Python dependencies
├── README.md                 # This file
└── memory_<model_name>/      # Auto-created memory databases
```
- Ensure Ollama is running: `ollama serve`
- Check that the server is accessible: `curl http://localhost:11434/api/tags`
- Verify models are installed: `ollama list`
- Install a model: `ollama pull <model-name>`
- Popular options: `gemma3:12b`, `llama3.2:3b`, `qwen2.5-coder:7b`
- ChromaDB databases are created automatically in `./memory_<model>/`
- To reset completely, delete these folders
- Check disk space if memory operations fail
- Smaller models respond faster but may be less capable
- Reduce memory depth for faster responses
- Increase relevance threshold to get more focused memories
- Fork the repository
- Create a feature branch: `git checkout -b new-feature`
- Commit your changes: `git commit -am 'Add new feature'`
- Push to the branch: `git push origin new-feature`
- Submit a pull request
This project is open source. Feel free to use, modify, and distribute.