Stale Connections Suddenly Increase when there is a Spike on Application #3359
Comments
@akshayk-ktk based on your configuration, I think the increased number of stale connections is most likely caused by an error either while getting / initializing the connection or when putting it back in the pool. Would you be able to check if there is anything reported in the logs? If not, would you be able to check what is the go-redis/internal/pool/pool.go Line 407 in 9762559
@ndyakov We are using zerolog for logging and use the below custom struct to set it to the internal logger. We are not able to see any internal error logs here.
Let me try adding logs in the Remove method to check what the error values are.
Hey @ndyakov, after adding logs I found some errors. They look like I/O timeout errors. Does this mean the EC2 network interface is not able to handle the load? Our EC2 machines are AWS c6g.xlarge. ElastiCache Redis: cache.m6g.large
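When logging the error values at the point where a connection is removed, it can help to separate timeouts from other failures. The following is a minimal sketch using only the Go standard library; `classifyConnErr`, `sampleIOTimeout`, `errReset`, and the labels are hypothetical names for illustration and are not part of go-redis.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"os"
)

// errReset is a made-up non-timeout error used only for the demo below.
var errReset = errors.New("connection reset by peer")

// sampleIOTimeout builds an error shaped like the one a net.Conn read
// returns after its deadline passes (an "i/o timeout").
func sampleIOTimeout() error {
	return &net.OpError{Op: "read", Net: "tcp", Err: os.ErrDeadlineExceeded}
}

// classifyConnErr returns a coarse label for an error observed when a
// connection is dropped. The labels are illustrative only.
func classifyConnErr(err error) string {
	if err == nil {
		return "none"
	}
	if errors.Is(err, context.Canceled) {
		return "context-canceled"
	}
	// net.Error exposes Timeout(); errors.As also looks through wrapped errors.
	var netErr net.Error
	if errors.As(err, &netErr) && netErr.Timeout() {
		return "timeout"
	}
	return "other"
}

func main() {
	wrapped := fmt.Errorf("redis: %w", sampleIOTimeout())
	fmt.Println(classifyConnErr(wrapped))  // timeout
	fmt.Println(classifyConnErr(errReset)) // other
}
```

Counting these labels per interval would show whether the stale connections during a spike are dominated by timeouts or by some other error.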
@akshayk-ktk I cannot comment on the ElastiCache setup or on the AWS setup. You can try to play around with multiple shards to see if this improves with the database's horizontal scaling. For now, this doesn't look like a client issue, but I will let you decide whether you are going to try anything further now, or whether we should close this issue.
Hey @ndyakov, we are upgrading the Redis cluster node type and the EC2 instance types one at a time so we can monitor whether we still get I/O errors and an increase in stale connections. I would prefer to keep the issue open with an "under observation" tag if possible. We will report back any observations here.
We are noticing Redis commands taking more than 2 seconds when there is a spike on the application.
I am printing the pool stats in the application and I see that many connections became stale as soon as the spike comes.
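To see how quickly connections go stale during a spike, two successive pool-stats snapshots can be compared. The sketch below assumes a local `poolSnapshot` type that mirrors the counter fields of go-redis's `PoolStats` (so the example needs only the standard library), and it assumes `Hits`, `Misses`, `Timeouts`, and `StaleConns` are cumulative counters. The `diff` helper and the numbers in `main` are made up for illustration.

```go
package main

import "fmt"

// poolSnapshot is a hypothetical local mirror of the pool counters.
type poolSnapshot struct {
	Hits, Misses, Timeouts            uint32
	TotalConns, IdleConns, StaleConns uint32
}

// poolDelta holds the change between two snapshots.
type poolDelta struct {
	NewStale, NewTimeouts, NewMisses uint32
	MissRatio                        float64 // share of pool requests that needed a new connection
}

// diff computes what happened between prev and cur.
// Unsigned subtraction stays correct even if a counter wraps around once.
func diff(prev, cur poolSnapshot) poolDelta {
	d := poolDelta{
		NewStale:    cur.StaleConns - prev.StaleConns,
		NewTimeouts: cur.Timeouts - prev.Timeouts,
		NewMisses:   cur.Misses - prev.Misses,
	}
	hits := cur.Hits - prev.Hits
	if total := hits + d.NewMisses; total > 0 {
		d.MissRatio = float64(d.NewMisses) / float64(total)
	}
	return d
}

func main() {
	// Made-up values, not taken from the issue.
	before := poolSnapshot{Hits: 100, Misses: 10, StaleConns: 2}
	after := poolSnapshot{Hits: 160, Misses: 30, Timeouts: 1, StaleConns: 12}
	fmt.Printf("%+v\n", diff(before, after))
}
```

Logging the delta at a fixed interval, rather than the raw totals, makes it easier to line up a jump in stale connections with the start of a spike.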
What can be done to mitigate this?
Expected Behavior
Redis Commands should complete in expected time.
Current Behavior
Redis commands take more than 2 seconds to complete when there is a spike.
Redis Client Configuration
poolSize: 5000
connMaxIdleTime: -1s
dialTimeout: 10s
poolTimeout: 10s
readTimeout: 10s
minIdleConns: 3500
maxIdleConns: 3800
connMaxLifetime: -1s
Redis Server
We are using ElastiCache Redis version 7+ in cluster mode, currently with a single shard.
Application is on EC2 running on 4 instances.
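As a rough sense of scale, the pool settings above can be multiplied across the application instances. This sketch assumes one go-redis cluster client per instance and relies on the cluster client applying pool limits per Redis node; `fleetConns` is a hypothetical helper.

```go
package main

import "fmt"

// fleetConns returns how many connections a single Redis node could see
// when every application instance keeps perNodePool connections to it.
func fleetConns(perNodePool, instances int) int {
	return perNodePool * instances
}

func main() {
	const instances = 4 // from the issue: 4 EC2 instances

	fmt.Println("max connections per node:", fleetConns(5000, instances))  // poolSize
	fmt.Println("min idle connections per node:", fleetConns(3500, instances)) // minIdleConns
}
```

Under these assumptions a single node may be asked to hold 14000 idle connections and up to 20000 in total, which is worth checking against the node's connection limits.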
Below are the screenshots