
Handling Connection Spikes in WebSocket Server with Redis #3356

Closed
akshaykhairmode opened this issue Apr 21, 2025 · 3 comments

akshaykhairmode commented Apr 21, 2025

Hi everyone,

I'm seeking some guidance on how to best manage connection spikes in my WebSocket server application, which heavily relies on Redis PubSub, Sorted Sets, and Lists.

Problem Description:

Our application experiences intermittent but significant spikes in user activity throughout the day. These spikes lead to a large number of concurrent connection initializations in our Redis connection pool. This, in turn, results in:

context deadline exceeded errors.
Slow Redis command execution times.

Current Mitigation Strategy:

I've tried to address this by increasing the MinIdle setting in our Redis connection pool. Previously, with a lower MinIdle (around 1000), we saw frequent connection-related errors during peak usage. After raising MinIdle to 3000, these errors dropped significantly (to under 100).

Setup Details:

Redis Server: AWS ElastiCache Redis (version 7+) with cluster mode enabled.
Application Instances: 4 EC2 instances running the WebSocket server application.
Redis Connection Pool Configuration (per application instance):

PoolSize - 5000
MinIdle - 3000
MaxIdle - 3500
connIdleTimeout - 9h
connLifetime - 12h
readTimeout - 10s
writeTimeout - 10s
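
To put the numbers above in perspective, here is a rough sketch of the connection counts they imply across the fleet. It assumes the pool settings apply per cluster node: the go-redis ClusterOptions documentation says this of PoolSize, and treating MinIdle the same way is an assumption. `poolConfig` and `fleetConns` are hypothetical names used only for illustration, not go-redis types.

```go
package main

import "fmt"

// poolConfig mirrors the per-instance settings listed above.
// It is a local illustration type, not part of go-redis.
type poolConfig struct {
	PoolSize int
	MinIdle  int
	MaxIdle  int
}

// fleetConns returns how many connections one Redis node would see from
// the whole fleet, given a per-instance count (assumed to be per node).
func fleetConns(perInstance, instances int) int {
	return perInstance * instances
}

func main() {
	cfg := poolConfig{PoolSize: 5000, MinIdle: 3000, MaxIdle: 3500}
	const instances = 4 // EC2 instances, from the setup details

	fmt.Println("idle floor per node:", fleetConns(cfg.MinIdle, instances))
	fmt.Println("ceiling per node:", fleetConns(cfg.PoolSize, instances))
}
```

Under that assumption, each node holds at least 12000 idle connections from the four instances and may see up to 20000. Those figures are worth checking against the connection limits of the ElastiCache node type in use.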

Question:

While increasing MinIdle has provided some relief, I'm wondering if this is the most efficient or recommended approach. Are there alternative strategies or configurations I should consider to better handle these connection spikes and ensure the stability and performance of our application? Any insights or suggestions would be greatly appreciated!

Member

ndyakov commented Apr 29, 2025

Hello @akshaykhairmode, can you share more about your setup? What context are you passing to the clients? Why is the context timing out?

Author

akshaykhairmode commented Apr 29, 2025

Hi @ndyakov, the context passed is context.Background().

I have raised the pool's idle settings to minIdle 3500 and maxIdle 3800, and set maxIdleTime and maxLifetime to -1 to disable those checks.

I no longer see context deadline exceeded, but the slowness remains; I have reported it in #3359.

Member

ndyakov commented May 1, 2025

The discussion continues in the issue linked in the comment above (#3359). Closing this one.

ndyakov closed this as completed May 1, 2025