Description
Hi everyone,
I'm seeking some guidance on how to best manage connection spikes in my WebSocket server application, which heavily relies on Redis PubSub, Sorted Sets, and Lists.
Problem Description:
Our application experiences intermittent but significant spikes in user activity throughout the day. These spikes lead to a large number of concurrent connection initializations in our Redis connection pool. This, in turn, results in:
context deadline exceeded errors.
Slow Redis command execution times.
Current Mitigation Strategy:
I've attempted to address this by increasing the MinIdle setting in our Redis connection pool. Previously, with a lower MinIdle (around 1000), we encountered frequent connection-related errors during peak usage. After increasing MinIdle to 3000, these errors dropped significantly (to under 100).
Setup Details:
Redis Server: AWS ElastiCache Redis (version 7+) with cluster mode enabled.
Application Instances: 4 EC2 instances running the WebSocket server application.
Redis Connection Pool Configuration (per application instance):
PoolSize - 5000
MinIdle - 3000
MaxIdle - 3500
connIdleTimeout - 9h
connLifetime - 12h
readTimeout - 10s
writeTimeout - 10s
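For a sense of scale, the settings above can be turned into a rough connection count across the 4 instances. This is a back-of-the-envelope sketch: the `PoolConfig` struct and `Footprint` function are illustrative names, not a client library's API, and it assumes every instance opens one pool against the same target. Some cluster clients keep a separate pool per node, in which case these numbers would apply to each node.

```go
package main

import "fmt"

// PoolConfig mirrors the per-instance settings listed above.
// The struct and field names are illustrative, not a client library's API.
type PoolConfig struct {
	PoolSize int
	MinIdle  int
	MaxIdle  int
}

// Footprint returns the idle floor and the maximum number of connections
// that `instances` application instances can hold against one pool target.
func Footprint(cfg PoolConfig, instances int) (idleFloor, peak int) {
	return cfg.MinIdle * instances, cfg.PoolSize * instances
}

func main() {
	cfg := PoolConfig{PoolSize: 5000, MinIdle: 3000, MaxIdle: 3500}
	idle, peak := Footprint(cfg, 4)
	fmt.Printf("idle floor: %d, peak: %d\n", idle, peak)
	// prints "idle floor: 12000, peak: 20000"
}
```

So under that assumption, the current MinIdle keeps about 12000 connections open at all times, and the pools can grow to 20000 in total.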
Question:
While increasing MinIdle has provided some relief, I'm wondering if this is the most efficient or recommended approach. Are there alternative strategies or configurations I should consider to better handle these connection spikes and ensure the stability and performance of our application? Any insights or suggestions would be greatly appreciated!