- Overview
- Features
- Tech Stack
- Supported LLMs
- Quickstart
- Model Library
- Customize a Model
- Flow Chart Diagram
Powered by Ollama
A sleek, modern chat interface for interacting with Ollama's large language models across a wide range of use cases.
This application provides a ChatGPT-inspired dark theme UI with proper code syntax highlighting and markdown support.
- Clean, responsive web interface with dark theme
- Support for multiple Ollama models
- VSCode-style syntax highlighting for code blocks
- Markdown rendering for rich text responses
- Easy-to-use chat interface
- Ollama install-and-run shell script (located in `use/runollam.sh`)
- Go
- JavaScript
- HTML
- CSS
This application provides a sleek interface for interacting with various Ollama models. Follow these steps to get up and running:
If you haven't already installed Ollama, use our convenient script:
```bash
chmod +x install_ollama.sh
./install_ollama.sh
```
To pull a model, use the following command. For example, to pull Llama 3.1:
```bash
ollama pull llama3.1
```
Once the model is pulled, you can run it interactively on the command line:
```bash
ollama run llama3.1
```
You can also use the included shell script to automate the selection and pulling of Ollama models:
```bash
chmod +x install_ollama.sh
./install_ollama.sh
```
- Before pulling a model, keep in mind the memory constraints of your machine; check the table below.
The following information is from Ollama's GitHub and is relevant to its use in this application.
- Ollama supports a list of models available on ollama.com/library
- Here are some example models that can be downloaded:
Model | Parameters | Size | Download |
---|---|---|---|
Llama 3.1 | 8B | 4.7GB | ollama run llama3.1 |
Llama 3.1 | 70B | 40GB | ollama run llama3.1:70b |
Llama 3.1 | 405B | 231GB | ollama run llama3.1:405b |
Phi 3 Mini | 3.8B | 2.3GB | ollama run phi3 |
Phi 3 Medium | 14B | 7.9GB | ollama run phi3:medium |
Gemma 2 | 2B | 1.6GB | ollama run gemma2:2b |
Gemma 2 | 9B | 5.5GB | ollama run gemma2 |
Gemma 2 | 27B | 16GB | ollama run gemma2:27b |
Mistral | 7B | 4.1GB | ollama run mistral |
Moondream 2 | 1.4B | 829MB | ollama run moondream |
Neural Chat | 7B | 4.1GB | ollama run neural-chat |
Starling | 7B | 4.1GB | ollama run starling-lm |
Code Llama | 7B | 3.8GB | ollama run codellama |
Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored |
LLaVA | 7B | 4.5GB | ollama run llava |
Solar | 10.7B | 6.1GB | ollama run solar |
Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
Ollama supports importing GGUF models in the Modelfile:

- Create a file named `Modelfile`, with a `FROM` instruction with the local filepath to the model you want to import:

  ```
  FROM ./vicuna-33b.Q4_0.gguf
  ```

- Create the model in Ollama:

  ```bash
  ollama create example -f Modelfile
  ```

- Run the model:

  ```bash
  ollama run example
  ```
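Once a custom model exists locally, you can also query it programmatically instead of through `ollama run`. The snippet below is only a minimal sketch against Ollama's local REST API (its default address, `http://localhost:11434`, and the non-streaming `/api/generate` endpoint); the `example` model name matches the `ollama create` step above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// generateRequest / generateResponse mirror only the fields we need from
// Ollama's /api/generate endpoint; Stream=false asks for a single JSON reply.
type generateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Stream bool   `json:"stream"`
}

type generateResponse struct {
	Response string `json:"response"`
}

func main() {
	body, _ := json.Marshal(generateRequest{
		Model:  "example", // the model created with `ollama create example -f Modelfile`
		Prompt: "Why is the sky blue?",
		Stream: false,
	})

	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatalf("is Ollama running? %v", err)
	}
	defer resp.Body.Close()

	var out generateResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		log.Fatal(err)
	}
	fmt.Println(out.Response)
}
```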
```mermaid
flowchart TD
A[Client Browser] -->|HTTP Request| B[Go Web Server]
subgraph "Backend (Go Server)"
B --> C{Request Type}
C -->|GET /| D[Serve index.html]
C -->|GET /api/models| E[Get Available Models]
C -->|POST /api/chat| F[Handle Chat Request]
C -->|GET /api/chat/ws| G[WebSocket Connection]
C -->|GET /api/health| H[Health Check]
E --> E1[Return AVAILABLE_MODELS list]
F --> F1[Parse ChatMessage]
F1 --> F2{Valid Model?}
F2 -->|Yes| F3[Update currentModel]
F2 -->|No| F4[Return Error]
F3 --> F5[Process Query with LangChain]
F5 --> F6[Return Response]
F4 --> F6
G --> G1[Establish WebSocket]
G1 --> G2[Listen for Messages]
G2 --> G3[Parse Message]
G3 --> G4{Valid Model?}
G4 -->|Yes| G5[Update currentModel]
G4 -->|No| G6[Send Error]
G5 --> G7[Process Query with LangChain]
G7 --> G8[Send Response via WebSocket]
G6 --> G8
H --> H1[Check if Ollama is Running]
H1 --> H2{Ollama Available?}
H2 -->|Yes| H3[Return Status OK]
H2 -->|No| H4[Return Status Error]
end
subgraph "Ollama LLM Processing"
F5 --> I[Format Prompt with Template]
G7 --> I
I --> J[Initialize Ollama LLM Client]
J --> K[Generate Response with LangChain]
K --> L[Return Formatted Response]
end
D --> A
E1 --> A
F6 --> A
G8 --> A
H3 --> A
H4 --> A
L --> F5
L --> G7
subgraph "Frontend (Browser)"
A --> M[Display UI]
M --> N[User Selects Model]
M --> O[User Enters Message]
O --> P{Connection Type}
P -->|HTTP| Q[Send POST Request]
P -->|WebSocket| R[Send WebSocket Message]
Q --> S[Display Response]
R --> S
end
```
- Client Interaction:
  - User accesses the application through a web browser
  - The frontend loads the UI with a model selector and chat interface
  - User can select a model from the dropdown menu and enter messages
- Request Handling:
  - The Go web server handles different types of requests:
    - Serves static files (HTML, CSS, JS)
    - Provides API endpoints for models, chat, and health checks
    - Manages WebSocket connections for real-time chat
  - A minimal server sketch is shown at the end of this section
- Model Selection:
  - Available models are defined in the `AVAILABLE_MODELS` array
  - The frontend displays these models in a dropdown
  - When a user selects a model, it's sent with the chat message
- Chat Processing:
  - Messages can be sent via HTTP POST or WebSocket
  - The server validates the requested model
  - If valid, it updates the current model
  - The message is processed using the Ollama LLM through LangChain (see the chat handler sketch at the end of this section)
- LLM Integration:
  - The system formats the prompt using a template
  - Initializes the Ollama LLM client with the selected model
  - Generates a response using LangChain
  - Returns the formatted response to the client
- Response Handling:
  - The response is sent back to the client
  - For HTTP requests, it's returned as JSON
  - For WebSocket connections, it's sent as a text message (see the WebSocket sketch at the end of this section)
  - The frontend displays the response with proper markdown formatting and syntax highlighting
This flow chart illustrates the complete lifecycle of a user interaction with the Ollama AI Assistant application, from model selection to receiving AI-generated responses.
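To make the backend half of this flow more concrete, the next few snippets sketch what the server pieces might look like. They are illustrative only: handler names, the `:8080` port, the `./static` directory, and the hard-coded model list are assumptions, not the project's actual code. This first sketch covers the server setup, the model list endpoint, and the health check (which simply probes Ollama's default local port).

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"time"
)

// availableModels stands in for the AVAILABLE_MODELS list described above;
// the real list lives in the project's Go source.
var availableModels = []string{"llama3.1", "gemma2", "mistral", "codellama"}

// handleModels serves GET /api/models: return the selectable model list as JSON.
func handleModels(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(availableModels)
}

// handleHealth serves GET /api/health: report whether Ollama answers on its
// default local port (11434).
func handleHealth(w http.ResponseWriter, r *http.Request) {
	client := &http.Client{Timeout: 2 * time.Second}
	w.Header().Set("Content-Type", "application/json")

	resp, err := client.Get("http://localhost:11434/api/tags")
	if err != nil {
		w.WriteHeader(http.StatusServiceUnavailable)
		json.NewEncoder(w).Encode(map[string]string{"status": "error"})
		return
	}
	resp.Body.Close()
	json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
}

func main() {
	// Static frontend: index.html, CSS, and JS served from a local directory.
	http.Handle("/", http.FileServer(http.Dir("./static")))
	http.HandleFunc("/api/models", handleModels)
	http.HandleFunc("/api/health", handleHealth)
	// The chat endpoints (/api/chat and /api/chat/ws) are sketched next.

	log.Println("listening on http://localhost:8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```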
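The chat path could then look something like the following. Again a sketch under stated assumptions: the `ChatMessage` shape, the `github.com/tmc/langchaingo` packages as the LangChain implementation, and the prompt template are guesses at how a server like this might be wired, not the project's confirmed code.

```go
package main

// Complements the server sketch above; register with:
//   http.HandleFunc("/api/chat", handleChat)

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/ollama"
)

// ChatMessage is an assumed shape for the POST /api/chat payload.
type ChatMessage struct {
	Model   string `json:"model"`
	Message string `json:"message"`
}

// supportedModels mirrors the AVAILABLE_MODELS list from the previous sketch.
var supportedModels = map[string]bool{
	"llama3.1": true, "gemma2": true, "mistral": true, "codellama": true,
}

// queryOllama formats the prompt with a simple template, initializes the
// Ollama client for the selected model, and generates a response via langchaingo.
func queryOllama(ctx context.Context, model, message string) (string, error) {
	llm, err := ollama.New(ollama.WithModel(model))
	if err != nil {
		return "", err
	}
	prompt := fmt.Sprintf("You are a helpful assistant.\n\nUser: %s\nAssistant:", message)
	return llms.GenerateFromSinglePrompt(ctx, llm, prompt)
}

// handleChat serves POST /api/chat: parse, validate the model, query, reply as JSON.
func handleChat(w http.ResponseWriter, r *http.Request) {
	var msg ChatMessage
	if err := json.NewDecoder(r.Body).Decode(&msg); err != nil {
		http.Error(w, "invalid request body", http.StatusBadRequest)
		return
	}
	if !supportedModels[msg.Model] {
		http.Error(w, "unknown model: "+msg.Model, http.StatusBadRequest)
		return
	}

	reply, err := queryOllama(r.Context(), msg.Model, msg.Message)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]string{"response": reply})
}
```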
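Finally, the WebSocket variant sends each reply back as a text message over the open connection. The `gorilla/websocket` package is an assumption here (the real project may use a different WebSocket library), and the sketch reuses `ChatMessage`, `supportedModels`, and `queryOllama` from the chat handler above.

```go
package main

// Complements the previous sketches; register with:
//   http.HandleFunc("/api/chat/ws", handleChatWS)

import (
	"context"
	"net/http"

	"github.com/gorilla/websocket"
)

// upgrader turns the GET /api/chat/ws request into a WebSocket connection.
var upgrader = websocket.Upgrader{
	CheckOrigin: func(r *http.Request) bool { return true }, // relax origin checks for local use
}

// handleChatWS keeps a WebSocket open, reads ChatMessage payloads, and sends
// each reply (or error) back as a text message.
func handleChatWS(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		return
	}
	defer conn.Close()

	for {
		var msg ChatMessage
		if err := conn.ReadJSON(&msg); err != nil {
			return // client closed the connection or sent malformed JSON
		}
		if !supportedModels[msg.Model] {
			conn.WriteMessage(websocket.TextMessage, []byte("error: unknown model "+msg.Model))
			continue
		}

		reply, err := queryOllama(context.Background(), msg.Model, msg.Message)
		if err != nil {
			conn.WriteMessage(websocket.TextMessage, []byte("error: "+err.Error()))
			continue
		}
		conn.WriteMessage(websocket.TextMessage, []byte(reply))
	}
}
```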