Skip to content

Implement custom libevent adapter to fix timeout issues with event prioritization #59

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

plainbanana
Copy link
Owner

@plainbanana plainbanana commented Jun 13, 2025

Summary

This PR addresses timeout-related issues in Redis::Cluster::Fast by implementing a custom libevent adapter with event
prioritization.

Key Changes

  • Custom libevent adapter: Added src/adapters/libevent.h to handle timeout issues specific to our use case
  • Non-blocking event loop: Made run_event_loop truly non-blocking with EVLOOP_NONBLOCK
  • Enhanced error handling: Added proper error checks for event_base_priority_init
  • Documentation improvements: Fixed typos and clarified method descriptions
  • Code quality: Added explanatory comments and improved test clarity

Problem Background

When using hiredis-cluster's async context, we encountered situations where timeout events would fire even when Redis
responses were already available to read. This particularly affected scenarios where event loop execution was delayed due
to system load.

Approach Taken

After researching the issue and consulting with the hiredis community hiredis issue
#1286
, we learned that:

  1. libevent's event processing order: The library processes timeouts before I/O events as part of its design
  2. Upstream perspective: The hiredis maintainers appropriately noted that adding event prioritization to the
    general-purpose adapter could introduce starvation risks in multi-client scenarios
  3. Valid concern: In shared event base environments, high-traffic clients could potentially starve failing clients
    from receiving necessary timeouts

Our Solution

Given Redis::Cluster::Fast's specific usage patterns and requirements, we decided to maintain our own libevent adapter:

Why this works for our use case

  • Isolated event base: Redis::Cluster::Fast typically manages dedicated event bases
  • Controlled environment: Our specific client patterns reduce starvation risks
  • User benefit: Processing available responses before timeouts significantly improves user experience

Implementation Details

Our custom adapter (src/adapters/libevent.h) implements:

  • Separate event management: Independent handling of I/O and timer events
  • Event prioritization: I/O events (priority 0) before timer events (priority 1)
  • Deterministic behavior: Ensures available responses are processed before timeout evaluation
  // Priority 0: I/O events (highest priority) - Process Redis responses immediately
  // Priority 1: Timer events (lower priority) - Handle timeouts after I/O processing
  #define EVENT_BASE_PRIORITY_NUMBER 2

Test Plan

  • All existing tests pass
  • Memory leak tests (xt/02_leak.t, xt/08_leak_srandom.t)
  • Timeout handling tests (xt/04_timeout.t, xt/10_timeout_srandom.t)
  • Valgrind tests (xt/05_valgrind.t, xt/09_valgrind_srandom.t)
  • Fork safety tests (xt/03_fork.t, xt/11_fork_srandom.t)
  • Pipeline functionality verification

This change improves reliability for Redis::Cluster::Fast users while respecting the design decisions of the upstream
hiredis project.

…c operations. It introduces ev_io and ev_timer to handle timeout as intended.

After issuing run_event_loop and after the command_timeout has passed, when executing the event loop again to read the response, there is a problem that always results in a timeout. Now, fixed it and when there are any readable responses, it reads them.
Refactored the `run_event_loop` method to exit immediately when no events are available, ensuring it operates in non-blocking mode.
Removed the EVLOOP_ONCE flag from the event loop call as it is unnecessary when used with EVLOOP_NONBLOCK. This change simplifies the code and ensures clearer handling of the event loop behavior.
Replaced a duplicate `run_event_loop` call with `wait_one_response` for better clarity and correctness. This ensures the desired response is awaited effectively within the test.
Streamlined documentation for `run_event_loop` by removing redundant details about event processing and internal mechanics. Focused on clarifying return values, command handling, and timeout behavior for better readability.
Check the return value of event_base_priority_init and return an error
if the initialization fails. This improves error handling during the
connection setup phase.
Added detailed comments to explain the priority system for libevent:
- Priority 0 (highest): I/O events for immediate Redis response processing
- Priority 1 (lower): Timer events for timeout handling after I/O processing

This clarifies why I/O events are prioritized over timer events in the
event processing loop.
Corrected the spelling of "unexcuted" to "unexecuted" in method documentation. This improves clarity and accuracy in the code comments.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant