Skip to content

Conversation

@justinpersaud
Copy link
Contributor

@justinpersaud justinpersaud commented Oct 9, 2025

Explanation of Change

When a connection to bedrock takes a little bit longer, wait for it to be ready by setting the socket to non blocking mode on startup and then blocking before actually using it.

Related Issues

https://github.com/Expensify/Expensify/issues/544983

Deployment

  • I followed the steps in the README to ensure this PR is deployed properly

- Add isSocketHealthy() method that checks if a socket is still usable by peeking at the socket state
- Detects when remote end has closed the connection
- Detects other socket errors that would cause issues on reuse
- Modify sendRawRequest() to validate socket health before attempting reuse
- Automatically closes stale sockets and opens new connections
- This should significantly reduce 'Error 11: Resource temporarily unavailable' errors
- Track socket creation time, last used time, and request count
- Log socket age (seconds since creation) when reusing or detecting stale sockets
- Log socket idle time (seconds since last use) for better debugging
- Log number of requests made on each socket connection
- Reset metrics when socket is closed (on failure, Connection: close, or stale detection)
- Provides visibility into socket lifecycle and helps identify patterns in EAGAIN errors
- Set socket to non-blocking mode immediately after creation for proper PHP 8 Socket object behavior
- Add proper socket_select() wait after EINPROGRESS to ensure socket is ready for writing
- Increase connection timeout from 1s to 5s for remote datacenter connections
- Add detailed timing logs to track connection establishment performance
- Return socket to blocking mode after connection established for normal I/O operations

This addresses intermittent 'Resource temporarily unavailable' errors that occur
when connecting to remote Bedrock servers, particularly in PHP 8 environments
where Socket objects have different timing behavior than legacy resources
@justinpersaud justinpersaud changed the title Jpersaud socket not ready Wait for socket to be ready before connecting on 115 Oct 10, 2025
@justinpersaud justinpersaud marked this pull request as ready for review October 10, 2025 12:57
@justinpersaud justinpersaud requested a review from flodnv October 10, 2025 12:59
src/Client.php Outdated
}

// Configure this socket and try to connect to it
// PHP 8 Socket objects need explicit non-blocking mode to prevent EINPROGRESS timeouts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment confuses me very much (like this problem as a whole) 😅

  • Isn't our intention to use blocking sockets?
  • The next confused engineer will read "PHP 8 Socket objects need explicit non-blocking mode to prevent EINPROGRESS timeouts" and think: so why are we handling EINPROGRESS below?

src/Client.php Outdated
$this->logger->info('Bedrock\Client - socket_connect returned error 115, continuing.');
$this->logger->info('Bedrock\Client - socket_connect returned error 115, waiting for connection to complete.', [
'host' => $host,
'connect_attempt_time_ms' => round($connectTime, 3),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update to lowerCamelCase

src/Client.php Outdated
$this->logger->info('Bedrock\Client - socket_connect returned error 115, waiting for connection to complete.', [
'host' => $host,
'connect_attempt_time_ms' => round($connectTime, 3),
'pid' => getmypid(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we logging pid here? 😕

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was originally looking to see if this could be tied to php-fpm workers shutting down after 1000 requests, but that did not pan out. I will remove it.

src/Client.php Outdated
throw new ConnectionFailure("Socket had error after EINPROGRESS for $host:$port. Error: $socketErrorCode $socketError");
}

$selectTime = (microtime(true) - $connectStart) * 1000; // Total time from connect to ready
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect selectTime to time socket_select, not socket_select & socket_connect

src/Client.php Outdated

$this->logger->info('Bedrock\Client - Socket ready for writing after EINPROGRESS.', [
'host' => $host,
'total_connection_time_ms' => round($selectTime, 3),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here and below, lowerCameCase please

src/Client.php Outdated
'host' => $host,
'total_connection_time_ms' => round($selectTime, 3),
'select_wait_time_ms' => round($selectTime - $connectTime, 3),
'pid' => getmypid(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here

@justinpersaud
Copy link
Contributor Author

Updated

@justinpersaud justinpersaud requested a review from flodnv October 14, 2025 12:08
flodnv
flodnv previously approved these changes Oct 14, 2025
$except = [];
$selectStart = microtime(true);
$selectResult = socket_select($read, $write, $except, $this->connectionTimeout, $this->connectionTimeoutMicroseconds);
// Time for socket_select call
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NAB: Missing empty line before comment

@justinpersaud justinpersaud merged commit 02142e5 into main Oct 14, 2025
2 checks passed
@justinpersaud justinpersaud deleted the jpersaud_socket_not_Ready branch October 14, 2025 17:13
@justinpersaud justinpersaud restored the jpersaud_socket_not_Ready branch October 14, 2025 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants