Description
Architecture:
A stateless front-end receives chat requests, streams responses to clients via Server-Sent Events (SSE), and offloads processing to a message queue. Back-end workers fetch conversation history, call the LLM API in streaming mode, and relay tokens back through the queue for real-time delivery.
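As a concrete illustration, here is a minimal sketch of the front-end half in TypeScript. The in-memory `queue` and the `[DONE]` end-of-stream marker are assumptions for the sketch; in production the queue would be a real broker client.

```ts
import http from "node:http";
import { EventEmitter } from "node:events";

// In-memory stand-in for a real broker (SQS, RabbitMQ, Kafka, ...); an
// assumption for the sketch, not part of the described system.
const bus = new EventEmitter();
const queue = {
  publish: async (topic: string, msg: string) => void bus.emit(topic, msg),
  subscribe: (topic: string, fn: (msg: string) => void) => {
    bus.on(topic, fn);
    return () => bus.off(topic, fn);
  },
};

const server = http.createServer(async (req, res) => {
  if (req.method !== "POST" || req.url !== "/chat") {
    res.writeHead(404).end();
    return;
  }

  let body = "";
  for await (const chunk of req) body += chunk;
  const { sessionId, chatMessageId, question } = JSON.parse(body);

  // SSE headers: keep the connection open and flush tokens as they arrive.
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });

  // Relay tokens that a worker publishes for this chatMessageId.
  // "[DONE]" is an assumed end-of-stream marker.
  const unsubscribe = queue.subscribe(`tokens.${chatMessageId}`, (token) => {
    if (token === "[DONE]") {
      res.write("event: done\ndata: [DONE]\n\n");
      res.end();
      unsubscribe();
      return;
    }
    // JSON-encode so tokens containing newlines stay one SSE data field.
    res.write(`data: ${JSON.stringify(token)}\n\n`);
  });

  // Offload the LLM call to a worker; the front-end itself stays stateless.
  await queue.publish(
    "chat-requests",
    JSON.stringify({ sessionId, chatMessageId, question }),
  );

  req.on("close", unsubscribe); // client went away: stop relaying
});

server.listen(8080);
```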
Flow:
- Client sends a question (with sessionId and chatMessageId) to the front-end.
- Front-end enqueues the request to the message queue.
- Worker dequeues the request, fetches history, and calls the LLM API in streaming mode (see the worker sketch after this list).
- Worker streams tokens back through the queue.
- Front-end streams the tokens to the client via SSE.
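The worker side of this flow, as a minimal sketch continuing with the same in-memory queue stand-in. `llmStream` and the `history` map are placeholders (assumptions) for a real streaming LLM SDK and a conversation store:

```ts
import { EventEmitter } from "node:events";

// Same in-memory broker stand-in as the front-end sketch (an assumption).
const bus = new EventEmitter();
const queue = {
  publish: async (topic: string, msg: string) => void bus.emit(topic, msg),
  subscribe: (topic: string, fn: (msg: string) => void) => bus.on(topic, fn),
};

// Placeholder for a real streaming LLM client (OpenAI, Anthropic, ...).
async function* llmStream(prompt: string): AsyncGenerator<string> {
  for (const token of ["Hello", ", ", "world", "!"]) yield token;
}

const history = new Map<string, string[]>(); // sessionId -> prior turns (DB stand-in)

queue.subscribe("chat-requests", async (msg) => {
  const { sessionId, chatMessageId, question } = JSON.parse(msg);

  // Step 3: fetch conversation history and call the LLM in streaming mode.
  const turns = history.get(sessionId) ?? [];
  let answer = "";
  for await (const token of llmStream([...turns, question].join("\n"))) {
    answer += token;
    // Step 4: relay each token back through the queue as it arrives.
    await queue.publish(`tokens.${chatMessageId}`, token);
  }
  await queue.publish(`tokens.${chatMessageId}`, "[DONE]");

  history.set(sessionId, [...turns, question, answer]); // persist the turn
});
```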
Scalability:
The front-end and workers are stateless and horizontally scalable. The queue buffers load and ensures ordered, session-based processing, and the system supports thousands of concurrent SSE streams.
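One way to realize ordered, session-based processing is a FIFO queue keyed by session. A sketch using AWS SQS as the broker (an assumption; the description doesn't name one), where MessageGroupId gives per-session ordering while different sessions fan out across workers:

```ts
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});

// Enqueue a chat request on a FIFO queue. The queue URL is a placeholder.
async function enqueueChatRequest(
  sessionId: string,
  chatMessageId: string,
  question: string,
): Promise<void> {
  await sqs.send(
    new SendMessageCommand({
      QueueUrl: process.env.CHAT_QUEUE_URL, // must be a .fifo queue
      MessageBody: JSON.stringify({ sessionId, chatMessageId, question }),
      MessageGroupId: sessionId, // same session => in-order delivery
      MessageDeduplicationId: chatMessageId, // drop duplicate enqueues
    }),
  );
}
```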
Reliability:
Front-end, queue, and workers are independent failure domains. The queue provides at-least-once delivery with session-based FIFO ordering; components auto-scale and run in highly available configurations. Partial failures are handled with retries and surfaced through monitoring.
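Because delivery is at-least-once, a worker can receive the same message twice, so processing needs to be idempotent and transient failures are retried with backoff. A sketch that dedupes on chatMessageId; the in-memory set stands in for a shared store such as Redis (an assumption):

```ts
const processed = new Set<string>(); // stand-in for a shared dedup store

async function processChatRequest(msg: string): Promise<void> {
  // ... the worker logic sketched under Flow ...
}

async function handleWithRetries(msg: string, maxAttempts = 3): Promise<void> {
  const { chatMessageId } = JSON.parse(msg);
  if (processed.has(chatMessageId)) return; // duplicate redelivery: skip

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await processChatRequest(msg);
      processed.add(chatMessageId); // only mark done after success
      return;
    } catch (err) {
      if (attempt === maxAttempts) throw err; // message returns to queue / DLQ
      await new Promise((r) => setTimeout(r, 2 ** attempt * 100)); // backoff
    }
  }
}
```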
Security:
Authenticated access, encrypted transport and storage, least-privilege permissions, input validation, API key protection, and content filtering.
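For the input-validation step, a sketch that rejects malformed requests before they reach the queue; the field names match the flow above, and the character limits are assumptions:

```ts
interface ChatRequest {
  sessionId: string;
  chatMessageId: string;
  question: string;
}

// Validate untrusted request bodies before enqueueing them.
function validateChatRequest(body: unknown): ChatRequest {
  if (typeof body !== "object" || body === null) throw new Error("invalid body");
  const { sessionId, chatMessageId, question } = body as Record<string, unknown>;
  if (typeof sessionId !== "string" || !/^[\w-]{1,64}$/.test(sessionId))
    throw new Error("invalid sessionId");
  if (typeof chatMessageId !== "string" || !/^[\w-]{1,64}$/.test(chatMessageId))
    throw new Error("invalid chatMessageId");
  if (typeof question !== "string" || question.length === 0 || question.length > 4000)
    throw new Error("question missing or too long"); // assumed prompt-size cap
  return { sessionId, chatMessageId, question };
}
```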
Monitoring:
End-to-end tracing, metrics (SSE connections, queue depth, latency), centralized logging, dashboards, and automated alerts.
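A sketch of the metrics side using the prom-client library (an assumed choice); the metric names are illustrative:

```ts
import http from "node:http";
import { Gauge, Histogram, register } from "prom-client";

// Metric names are illustrative; they mirror the signals listed above.
const sseConnections = new Gauge({
  name: "sse_open_connections",
  help: "Currently open SSE streams",
});
const queueDepth = new Gauge({
  name: "chat_queue_depth",
  help: "Messages waiting in the chat request queue",
});
const firstTokenLatency = new Histogram({
  name: "chat_first_token_seconds",
  help: "Time from request receipt to first streamed token",
  buckets: [0.1, 0.25, 0.5, 1, 2, 5],
});

// Instrumentation points: sseConnections.inc()/.dec() when a stream opens or
// closes, queueDepth.set(n) from a broker poller, and
// firstTokenLatency.observe(seconds) when the first token is written.

// Expose /metrics for Prometheus to scrape.
http
  .createServer(async (_req, res) => {
    res.setHeader("Content-Type", register.contentType);
    res.end(await register.metrics());
  })
  .listen(9100);
```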