Suboptimal P99 latency under pipelining workloads #4998
Addresses #4998. 1. Removes unnecessary yielding when reading multiple requests, since it hampers pipeline efficiency. 2. Increases the socket read buffer size, effectively allowing more requests to be processed in bulk. 3. Changes the sharding function for cluster mode to shard by slot id. Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Regarding (1): we should still yield in the connection fiber to let the AsyncFiber unload requests - otherwise we will keep reading from the socket until all the data is read or we reach the pipelining limit. This is suboptimal because we lose the opportunity to kick off the pipeline running in parallel with reading from the socket. There are two possible solutions:
Finally, whatever we do, we should comment on why we do it and even link this issue for more details.
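As a rough illustration of the CPU-time-based yielding the PR ended up implementing, here is a minimal sketch. `CpuTimeNowUs`, `ProcessPipelinedRequests`, and `kYieldBudgetUs` are hypothetical names, and `std::this_thread::yield()` stands in for the fiber yield real code would use:

```cpp
#include <cstdint>
#include <ctime>
#include <thread>

// Hypothetical helper: CPU time consumed by the calling thread, in usec.
static uint64_t CpuTimeNowUs() {
  timespec ts;
  clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
  return uint64_t(ts.tv_sec) * 1'000'000 + uint64_t(ts.tv_nsec) / 1'000;
}

// Sketch: instead of yielding after every parsed request, yield only once
// the fiber has consumed a CPU-time budget since its last resume point.
void ProcessPipelinedRequests(int num_requests) {
  constexpr uint64_t kYieldBudgetUs = 100;  // hypothetical flag default
  uint64_t resume_point = CpuTimeNowUs();
  for (int i = 0; i < num_requests; ++i) {
    // ... parse and dispatch one request from the read buffer ...
    if (CpuTimeNowUs() - resume_point > kYieldBudgetUs) {
      std::this_thread::yield();  // real code would yield the fiber here
      resume_point = CpuTimeNowUs();
    }
  }
}
```

This keeps the connection fiber from hogging the thread during a long pipelined burst, while avoiding a yield per request.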
Addresses #4998. 1. Reduces aggressive yielding when reading multiple requests, since it hampers pipeline efficiency. Now we yield consistently based on CPU time spent since the last resume point. 2. Increases the socket read buffer size, effectively allowing more requests to be processed in bulk. 3. Changes the sharding function for cluster mode to shard by slot id. Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Fixes #4998. 1. Reduces aggressive yielding when reading multiple requests, since it hampers pipeline efficiency. Now we yield consistently based on CPU time spent since the last resume point (via a flag with sane defaults). 2. Increases the socket read buffer size, effectively allowing more requests to be processed in bulk. Before this PR, `./dragonfly --cluster_mode=emulated` latencies (usec) for pipeline sizes 80-199: p50: 1887, p75: 2367, p90: 2897, p99: 6266. After this PR, `./dragonfly --cluster_mode=emulated --experimental_cluster_shard_by_slot` latencies (usec) for pipeline sizes 80-199: p50: 813, p75: 976, p90: 1216, p99: 3528. Signed-off-by: Roman Gershman <roman@dragonflydb.io>
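For change (2), a minimal sketch of the grow-on-demand read buffer idea, with hypothetical names and sizes (not Dragonfly's actual buffer management):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sketch: let the connection's read buffer grow when a burst of pipelined
// requests fills it, so each socket read can pull in more requests at once.
class ReadBuffer {
 public:
  explicit ReadBuffer(size_t initial = 256) : buf_(initial) {}

  // Returns {write pointer, free bytes}, doubling capacity (up to a
  // hypothetical cap) whenever the buffer is full.
  std::pair<char*, size_t> PrepareWrite(size_t used) {
    constexpr size_t kMaxSize = 64 * 1024;  // assumed upper bound
    if (used == buf_.size() && buf_.size() < kMaxSize)
      buf_.resize(buf_.size() * 2);
    return {buf_.data() + used, buf_.size() - used};
  }

 private:
  std::vector<char> buf_;
};
```

The actual change may be as simple as a larger fixed buffer; the point is that a bigger contiguous read lets the parser see more pipelined requests per socket read.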
We have not really focused on the combination of P99 latency and pipelining, especially in the context of uncoordinated omission access patterns.
When investigating the elevated latency phenomenon even under small CPU load, I noticed the following: `mget {foo}aaa {bar}aaa` is allowed as long as both `{foo}` and `{bar}` belong to the same shard. Dragonfly processes such requests correctly, but this means these commands run on multiple shards. Unfortunately, pipelining optimizations do not work well with multi-shard commands and lose this efficiency (this can be fixed in the future).
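To make the hashtag behavior concrete, here is a small self-contained sketch of the standard Redis Cluster slot computation (CRC16-CCITT/XModem over the hashtag body, modulo 16384); `Crc16` and `KeySlot` are illustrative names, not Dragonfly's internals:

```cpp
#include <cstdint>
#include <cstdio>
#include <string_view>

// CRC16-CCITT (XModem), the checksum Redis Cluster uses for key slots.
uint16_t Crc16(std::string_view data) {
  uint16_t crc = 0;
  for (unsigned char c : data) {
    crc ^= static_cast<uint16_t>(static_cast<uint16_t>(c) << 8);
    for (int i = 0; i < 8; ++i) {
      if (crc & 0x8000)
        crc = static_cast<uint16_t>((crc << 1) ^ 0x1021);
      else
        crc = static_cast<uint16_t>(crc << 1);
    }
  }
  return crc;
}

// Standard hashtag rule: if the key contains "{...}" with a non-empty
// body, only that body participates in the slot computation.
uint16_t KeySlot(std::string_view key) {
  size_t open = key.find('{');
  if (open != std::string_view::npos) {
    size_t close = key.find('}', open + 1);
    if (close != std::string_view::npos && close > open + 1)
      key = key.substr(open + 1, close - open - 1);
  }
  return Crc16(key) % 16384;
}

int main() {
  // Different hashtags -> different slots: the two keys of
  // `mget {foo}aaa {bar}aaa` hash `foo` and `bar` respectively,
  // so under per-key internal sharding they may land on different
  // shards, making the command multi-shard.
  std::printf("{foo}aaa -> slot %u\n", unsigned{KeySlot("{foo}aaa")});
  std::printf("{bar}aaa -> slot %u\n", unsigned{KeySlot("{bar}aaa")});
}
```

Sharding by slot id (the `--experimental_cluster_shard_by_slot` flag above) guarantees that keys sharing a hashtag, and therefore a slot, always map to the same internal shard, which keeps single-slot pipelines on the fast single-shard path.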