Description
Architecture:
A stateless front-end receives chat requests, streams responses to clients via Server-Sent Events (SSE), and offloads processing to a message queue. Back-end workers fetch conversation history, call the LLM API in streaming mode, and relay tokens back through the queue for real-time delivery.
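As a concrete illustration, here is a minimal sketch of the front-end half in TypeScript. The in-memory `queue` and the `[DONE]` end-of-stream marker are assumptions for the sketch; in production the queue would be a real broker client.

```ts
import http from "node:http";
import { EventEmitter } from "node:events";

// In-memory stand-in for a real broker (SQS, RabbitMQ, Kafka, ...); an
// assumption for the sketch, not part of the described system.
const bus = new EventEmitter();
const queue = {
  publish: async (topic: string, msg: string) => void bus.emit(topic, msg),
  subscribe: (topic: string, fn: (msg: string) => void) => {
    bus.on(topic, fn);
    return () => bus.off(topic, fn);
  },
};

const server = http.createServer(async (req, res) => {
  if (req.method !== "POST" || req.url !== "/chat") {
    res.writeHead(404).end();
    return;
  }

  let body = "";
  for await (const chunk of req) body += chunk;
  const { sessionId, chatMessageId, question } = JSON.parse(body);

  // SSE headers: keep the connection open and flush tokens as they arrive.
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });

  // Relay tokens that a worker publishes for this chatMessageId.
  // "[DONE]" is an assumed end-of-stream marker.
  const unsubscribe = queue.subscribe(`tokens.${chatMessageId}`, (token) => {
    if (token === "[DONE]") {
      res.write("event: done\ndata: [DONE]\n\n");
      res.end();
      unsubscribe();
      return;
    }
    // JSON-encode so tokens containing newlines stay one SSE data field.
    res.write(`data: ${JSON.stringify(token)}\n\n`);
  });

  // Offload the LLM call to a worker; the front-end itself stays stateless.
  await queue.publish(
    "chat-requests",
    JSON.stringify({ sessionId, chatMessageId, question }),
  );

  req.on("close", unsubscribe); // client went away: stop relaying
});

server.listen(8080);
```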
Flow:
- Client sends a question (with sessionId and chatMessageId) to the front-end.
- Front-end enqueues the request to the message queue.
- Worker dequeues the request, fetches history, and calls the LLM API in streaming mode (see the worker sketch after this list).
- Worker streams tokens back through the queue.
- Front-end streams the tokens to the client via SSE.
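The worker side of this flow, as a minimal sketch continuing with the same in-memory queue stand-in. `llmStream` and the `history` map are placeholders (assumptions) for a real streaming LLM SDK and a conversation store:

```ts
import { EventEmitter } from "node:events";

// Same in-memory broker stand-in as the front-end sketch (an assumption).
const bus = new EventEmitter();
const queue = {
  publish: async (topic: string, msg: string) => void bus.emit(topic, msg),
  subscribe: (topic: string, fn: (msg: string) => void) => bus.on(topic, fn),
};

// Placeholder for a real streaming LLM client (OpenAI, Anthropic, ...).
async function* llmStream(prompt: string): AsyncGenerator<string> {
  for (const token of ["Hello", ", ", "world", "!"]) yield token;
}

const history = new Map<string, string[]>(); // sessionId -> prior turns (DB stand-in)

queue.subscribe("chat-requests", async (msg) => {
  const { sessionId, chatMessageId, question } = JSON.parse(msg);

  // Step 3: fetch conversation history and call the LLM in streaming mode.
  const turns = history.get(sessionId) ?? [];
  let answer = "";
  for await (const token of llmStream([...turns, question].join("\n"))) {
    answer += token;
    // Step 4: relay each token back through the queue as it arrives.
    await queue.publish(`tokens.${chatMessageId}`, token);
  }
  await queue.publish(`tokens.${chatMessageId}`, "[DONE]");

  history.set(sessionId, [...turns, question, answer]); // persist the turn
});
```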
Scalability:
The front-end and workers are stateless and horizontally scalable. The queue buffers load and ensures ordered, session-based processing, and the system supports thousands of concurrent SSE streams.
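One way to realize ordered, session-based processing is a FIFO queue keyed by session. A sketch using AWS SQS as the broker (an assumption; the description doesn't name one), where MessageGroupId gives per-session ordering while different sessions fan out across workers:

```ts
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});

// Enqueue a chat request on a FIFO queue. The queue URL is a placeholder.
async function enqueueChatRequest(
  sessionId: string,
  chatMessageId: string,
  question: string,
): Promise<void> {
  await sqs.send(
    new SendMessageCommand({
      QueueUrl: process.env.CHAT_QUEUE_URL, // must be a .fifo queue
      MessageBody: JSON.stringify({ sessionId, chatMessageId, question }),
      MessageGroupId: sessionId, // same session => in-order delivery
      MessageDeduplicationId: chatMessageId, // drop duplicate enqueues
    }),
  );
}
```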
Reliability:
Front-end, queue, and workers are independent failure domains. The queue provides at-least-once delivery with session-based FIFO ordering; components auto-scale and run in highly available configurations. Partial failures are handled with retries and surfaced through monitoring.
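Because delivery is at-least-once, a worker can receive the same message twice, so processing needs to be idempotent and transient failures are retried with backoff. A sketch that dedupes on chatMessageId; the in-memory set stands in for a shared store such as Redis (an assumption):

```ts
const processed = new Set<string>(); // stand-in for a shared dedup store

async function processChatRequest(msg: string): Promise<void> {
  // ... the worker logic sketched under Flow ...
}

async function handleWithRetries(msg: string, maxAttempts = 3): Promise<void> {
  const { chatMessageId } = JSON.parse(msg);
  if (processed.has(chatMessageId)) return; // duplicate redelivery: skip

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await processChatRequest(msg);
      processed.add(chatMessageId); // only mark done after success
      return;
    } catch (err) {
      if (attempt === maxAttempts) throw err; // message returns to queue / DLQ
      await new Promise((r) => setTimeout(r, 2 ** attempt * 100)); // backoff
    }
  }
}
```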
Security:
Authenticated access, encrypted transport and storage, least-privilege permissions, input validation, API key protection, and content filtering.
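For the input-validation step, a sketch that rejects malformed requests before they reach the queue; the field names match the flow above, and the character limits are assumptions:

```ts
interface ChatRequest {
  sessionId: string;
  chatMessageId: string;
  question: string;
}

// Validate untrusted request bodies before enqueueing them.
function validateChatRequest(body: unknown): ChatRequest {
  if (typeof body !== "object" || body === null) throw new Error("invalid body");
  const { sessionId, chatMessageId, question } = body as Record<string, unknown>;
  if (typeof sessionId !== "string" || !/^[\w-]{1,64}$/.test(sessionId))
    throw new Error("invalid sessionId");
  if (typeof chatMessageId !== "string" || !/^[\w-]{1,64}$/.test(chatMessageId))
    throw new Error("invalid chatMessageId");
  if (typeof question !== "string" || question.length === 0 || question.length > 4000)
    throw new Error("question missing or too long"); // assumed prompt-size cap
  return { sessionId, chatMessageId, question };
}
```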
Monitoring:
End-to-end tracing, metrics (SSE connections, queue depth, latency), centralized logging, dashboards, and automated alerts.
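A sketch of the metrics side using the prom-client library (an assumed choice); the metric names are illustrative:

```ts
import http from "node:http";
import { Gauge, Histogram, register } from "prom-client";

// Metric names are illustrative; they mirror the signals listed above.
const sseConnections = new Gauge({
  name: "sse_open_connections",
  help: "Currently open SSE streams",
});
const queueDepth = new Gauge({
  name: "chat_queue_depth",
  help: "Messages waiting in the chat request queue",
});
const firstTokenLatency = new Histogram({
  name: "chat_first_token_seconds",
  help: "Time from request receipt to first streamed token",
  buckets: [0.1, 0.25, 0.5, 1, 2, 5],
});

// Instrumentation points: sseConnections.inc()/.dec() when a stream opens or
// closes, queueDepth.set(n) from a broker poller, and
// firstTokenLatency.observe(seconds) when the first token is written.

// Expose /metrics for Prometheus to scrape.
http
  .createServer(async (_req, res) => {
    res.setHeader("Content-Type", register.contentType);
    res.end(await register.metrics());
  })
  .listen(9100);
```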