Demonstration of scalable AI chat application #66

@tkubica12

Description

  • Architecture:
    A stateless front-end receives chat requests, streams responses to clients via Server-Sent Events (SSE), and offloads processing to a message queue. Back-end workers fetch conversation history, call the LLM API in streaming mode, and relay tokens back through the queue for real-time delivery.

  • Flow (sketched in code after this list):

    1. Client sends a question (with sessionId and chatMessageId) to the front-end.
    2. Front-end enqueues the request to the message queue.
    3. Worker dequeues it, fetches the session history, and calls the LLM API in streaming mode.
    4. Worker streams tokens back to the queue.
    5. Front-end relays tokens to the client via SSE.
  • Scalability:
    Front-end and workers are stateless and horizontally scalable. The queue buffers load and ensures ordered, session-based processing. The design supports thousands of concurrent SSE streams.

  • Reliability:
    Independent failure domains, at-least-once delivery, session-based FIFO ordering, auto-scaling, and high availability. Partial failures are handled with retries and monitoring (see the worker sketch below).

  • Security:
    Authenticated access, encrypted transport and storage, least-privilege permissions, input validation, API key protection, and content filtering.

  • Monitoring:
    End-to-end tracing, metrics (active SSE connections, queue depth, request latency), centralized logging, dashboards, and automated alerts (see the metrics sketch below).
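
A minimal sketch of the front-end half of the flow (steps 1, 2, and 5), using FastAPI with an in-process asyncio.Queue standing in for the real message queue. The endpoint, queue, and field names here are illustrative assumptions, not part of the demo itself; the pydantic model also doubles as the input validation mentioned under Security.

```python
# Front-end sketch: accept a chat request, enqueue it, and stream tokens
# back to the client over SSE. An in-process asyncio.Queue stands in for
# the real message queue; all names are illustrative assumptions.
import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

# Stand-in for the request queue; a real deployment would use a managed
# broker with session-based FIFO semantics.
request_queue: asyncio.Queue = asyncio.Queue()

# Per-message response channels keyed by chatMessageId, standing in for
# the response queue that workers publish tokens to.
response_channels: dict[str, asyncio.Queue] = {}


class ChatRequest(BaseModel):
    # Pydantic validates payload shape and types (input validation).
    sessionId: str
    chatMessageId: str
    question: str


@app.post("/chat")
async def chat(req: ChatRequest) -> StreamingResponse:
    channel: asyncio.Queue = asyncio.Queue()
    response_channels[req.chatMessageId] = channel
    await request_queue.put(req.model_dump())  # step 2: enqueue the request

    async def sse_stream():
        # Step 5: relay worker tokens to the client as SSE events until
        # the worker signals completion with a None sentinel.
        try:
            while (token := await channel.get()) is not None:
                yield f"data: {token}\n\n"
            yield "data: [DONE]\n\n"
        finally:
            response_channels.pop(req.chatMessageId, None)

    return StreamingResponse(sse_stream(), media_type="text/event-stream")
```

Because the endpoint keeps no conversation state of its own, any replica can serve any request, which is what lets the front-end scale horizontally behind a load balancer.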
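
The worker half (steps 3 and 4) might look like the sketch below, assuming an OpenAI-compatible chat-completions API; the history store and response-queue clients are hypothetical placeholders for whatever the deployment actually uses. The retry loop illustrates the at-least-once handling from the Reliability bullet.

```python
# Worker sketch: dequeue a request, fetch history, call the LLM in
# streaming mode, and relay tokens back through the response queue.
# history_store and response_queue are hypothetical clients; the LLM
# call assumes an OpenAI-compatible API.
import time

from openai import OpenAI

llm = OpenAI()  # reads OPENAI_API_KEY from the environment (API key protection)
MAX_ATTEMPTS = 3


def handle_message(msg: dict, history_store, response_queue) -> None:
    # Step 3: rebuild the conversation from stored history plus the new turn.
    history = history_store.get(msg["sessionId"])  # hypothetical client
    messages = history + [{"role": "user", "content": msg["question"]}]

    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            stream = llm.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages,
                stream=True,
            )
            # Step 4: relay each token to the response queue as it arrives.
            for chunk in stream:
                if chunk.choices and chunk.choices[0].delta.content:
                    response_queue.publish(
                        msg["chatMessageId"], chunk.choices[0].delta.content
                    )
            response_queue.publish(msg["chatMessageId"], None)  # end-of-stream
            return
        except Exception:
            # At-least-once semantics: back off and retry; after the final
            # attempt the message would move to a dead-letter queue.
            if attempt == MAX_ATTEMPTS:
                raise
            time.sleep(2 ** attempt)
```

Workers are stateless between messages, so scaling out is just a matter of running more replicas; session-based FIFO on the queue keeps messages from one conversation ordered even with many workers.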
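
For the Monitoring bullet, a small sketch of the metric families called out above, exported with prometheus_client; the metric names are illustrative assumptions.

```python
# Monitoring sketch: expose the metrics named above for a Prometheus-style
# scraper. Metric names are illustrative assumptions.
from prometheus_client import Counter, Gauge, Histogram, start_http_server

# Active SSE connections on this front-end replica.
sse_connections = Gauge("chat_sse_connections_active", "Open SSE streams")

# Queue depth as observed by workers (an input to auto-scaling decisions).
queue_depth = Gauge("chat_queue_depth", "Messages waiting in the request queue")

# End-to-end latency from enqueue to final token, per request.
request_latency = Histogram(
    "chat_request_latency_seconds", "Enqueue-to-last-token latency"
)

# Failed LLM calls, for alerting on partial failures.
llm_failures = Counter("chat_llm_failures_total", "LLM API call failures")

if __name__ == "__main__":
    start_http_server(9100)  # expose /metrics for scraping
```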
