8291652: (ch) java/nio/channels/SocketChannel/VectorIO.java failed with "Exception: Server 15: Timed out" #26049
👋 Welcome back jpai! A progress list of the required criteria for merging this PR into |
❗ This change is not yet ready to be integrated. |
A couple of observations to consider.

The setLength is a static member variable of the test, effectively a global variable, but it is accessed from multiple threads without synchronisation.

The use of the CountDownLatch is about the best we can do. It should mitigate the possibility of observed race conditions, but won't absolutely guarantee their absence. Consider the following, slightly convoluted scenario: the Server starts and executes as far as the countDown on the connAcceptLatch, at which point the server thread gets bumped by the OS and is placed in the RTR queue, waiting for its next scheduled time slice.

Thus, it might be more prudent to close the socket on the client or initiator side (i.e. in the main test thread) after the Server has finished, that is, after the sv.awaitFinish call. At that point in time the Server will also have closed its end of the socket connection. To accommodate this logic, pass the Server reference to the bufferTest method so that it can invoke sv.awaitFinish. An alternative is an extract-method refactoring at line 92:

SocketChannel openConnection(int port) throws Exception {

Then, after the awaitFinish call, invoke sc.close in the main test thread. In any case, back to the main point: close the client SocketChannel after the sv.awaitFinish call.

The method waitToStartTest is on the Server class; maybe rename it to waitServerStart. waitToStartTest is a private method on Server but is really part of the public interface to the Server (the fact that Server is a static inner class gives the test access to the private method).

On line 174, the invocation connAcceptLatch.countDown, which sets the test in motion, could, for the sake of symmetry, be encapsulated in a method:

void signalServerStarted() {

Another aspect of the test that caught the eye is that the ServerSocketChannel bind is invoked with just the SocketAddress and doesn't specify any backlog. IIRC this results in a backlog of 0 being used. It has long been best practice not to specify a backlog of 0, especially for portability, as backlog 0 semantics are ill defined (or in most cases not defined at all) on most OS platforms.

A slight digression from this PR, and a general comment on ServerSocketChannel::bind(SocketAddress local): I think it would be better if the ServerSocketChannel implementation used a non-zero default backlog value out of the box, e.g. 5, rather than a backlog of 0. This could also be overridden with a system property, java.nio.DefaultSocketBacklog, for use when the single-arg ServerSocketChannel::bind method is called. This would give uniform semantics across all OS platforms, rather than the nebulous semantics of a backlog value of zero.
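To make the suggested ordering concrete, here is a minimal sketch, not the actual VectorIO test code. The class name ClientCloseOrderSketch, the simplified Server class, the five-byte "hello" payload and the runOnce helper are all hypothetical; only the names openConnection, waitServerStart, signalServerStarted and awaitFinish come from the comment above. It assumes the server reads a fixed number of bytes, so that it can finish without waiting for the client to close.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;

class ClientCloseOrderSketch {

    /** Hypothetical, simplified stand-in for the test's Server class. */
    static class Server implements Runnable {
        private static final int BACKLOG = 5; // explicit non-zero backlog, as suggested
        private final ServerSocketChannel ssc;
        private final CountDownLatch started = new CountDownLatch(1);
        private final Thread thread = new Thread(this, "Server");
        private volatile Exception failure;
        private volatile String received;

        Server() throws IOException {
            ssc = ServerSocketChannel.open();
            // Two-argument bind: ephemeral port on loopback, explicit backlog.
            ssc.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0), BACKLOG);
        }

        int port() { return ssc.socket().getLocalPort(); }

        void start() { thread.start(); }

        /** Blocks the caller until the server thread is about to accept(). */
        void waitServerStart() throws InterruptedException { started.await(); }

        private void signalServerStarted() { started.countDown(); }

        /** Waits for the server thread to end and rethrows any failure it hit. */
        void awaitFinish() throws Exception {
            thread.join();
            if (failure != null) throw failure;
        }

        @Override
        public void run() {
            try (ServerSocketChannel listener = ssc) {
                signalServerStarted();
                try (SocketChannel peer = listener.accept()) {
                    // Read exactly five bytes (or stop at EOF), then close our end.
                    ByteBuffer buf = ByteBuffer.allocate(5);
                    while (buf.hasRemaining() && peer.read(buf) >= 0) { }
                    buf.flip();
                    received = StandardCharsets.US_ASCII.decode(buf).toString();
                }
            } catch (Exception e) {
                failure = e;
            }
        }
    }

    /** The extracted helper proposed in the review. */
    static SocketChannel openConnection(int port) throws IOException {
        return SocketChannel.open(
                new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
    }

    static String runOnce() throws Exception {
        Server sv = new Server();
        sv.start();
        sv.waitServerStart();
        SocketChannel sc = openConnection(sv.port());
        try {
            ByteBuffer msg = ByteBuffer.wrap("hello".getBytes(StandardCharsets.US_ASCII));
            while (msg.hasRemaining()) sc.write(msg);
            sv.awaitFinish(); // the server has closed its end by the time this returns
        } finally {
            sc.close();       // the client side closes last, in the main thread
        }
        return sv.received;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("server received: " + runOnce());
    }
}
```

Because the client channel stays open until awaitFinish returns, the server never observes an early close from the client, whatever the thread scheduling turns out to be.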
Can I please get a review of this test-only change which proposes to address an intermittent test failure in java/nio/channels/SocketChannel/VectorIO.java?

As noted in https://bugs.openjdk.org/browse/JDK-8291652, this test has been failing intermittently in our CI. Some years back the test was improved to include additional debug logs to identify the root cause (https://bugs.openjdk.org/browse/JDK-8180085). In a recent failure, these test logs indicate that the Server thread hadn't yet accept()ed a Socket connection when the client side of the test threw an exception because it had waited for 8 seconds for the server side of the test to complete.

The change in this PR updates the test to wait for the Server thread to reach a point where it is ready to accept() a Socket connection. Only after it reaches this state is the client side of the testing initiated. Furthermore, the artificial 8 second wait has been removed from this test and it now waits as long as it takes for the testing to complete. If the test waits far too long, the jtreg infrastructure will time out the test and at the same time capture the necessary artifacts to help debug unexpected timeouts.

While at it, the test has also been updated to use InetAddress.getLoopbackAddress() instead of localhost. This should prevent any unexpected address mappings for localhost from playing a role in this test.

With these changes, I've run the test more than 1000 times in our CI and it hasn't failed.
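The approach described above can be illustrated with a small standalone sketch. This is not the code in the PR: the class name ReadyLatchSketch, the exchange method, the readyToAccept latch name and the "abc"/"def" payload are assumptions made for illustration. It shows a server thread signalling readiness just before accept(), a client that connects to InetAddress.getLoopbackAddress() only after that signal, and an untimed join in place of a fixed wait. Vectored (gathering and scattering) I/O is used because that is what VectorIO exercises.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

class ReadyLatchSketch {

    static ByteBuffer ascii(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.US_ASCII));
    }

    static String exchange() throws Exception {
        // Loopback address rather than a "localhost" name lookup.
        InetAddress loopback = InetAddress.getLoopbackAddress();
        CountDownLatch readyToAccept = new CountDownLatch(1);
        AtomicReference<String> received = new AtomicReference<>();
        AtomicReference<Exception> failure = new AtomicReference<>();

        try (ServerSocketChannel ssc = ServerSocketChannel.open()) {
            ssc.bind(new InetSocketAddress(loopback, 0)); // ephemeral port
            int port = ssc.socket().getLocalPort();

            Thread server = new Thread(() -> {
                readyToAccept.countDown(); // signal just before accept()
                try (SocketChannel peer = ssc.accept()) {
                    // Scattering read into two 3-byte buffers, until full or EOF.
                    ByteBuffer[] parts = { ByteBuffer.allocate(3), ByteBuffer.allocate(3) };
                    while (parts[1].hasRemaining() && peer.read(parts) >= 0) { }
                    StringBuilder sb = new StringBuilder();
                    for (ByteBuffer b : parts) {
                        b.flip();
                        sb.append(StandardCharsets.US_ASCII.decode(b));
                    }
                    received.set(sb.toString());
                } catch (Exception e) {
                    failure.set(e);
                }
            }, "Server");
            server.start();

            // The client side starts only once the server is ready to accept.
            readyToAccept.await();
            try (SocketChannel sc = SocketChannel.open(new InetSocketAddress(loopback, port))) {
                ByteBuffer[] out = { ascii("abc"), ascii("def") };
                while (out[1].hasRemaining()) sc.write(out); // gathering write
            }

            // No artificial timeout: wait as long as the server needs and let
            // the test harness enforce the overall time limit.
            server.join();
        }
        if (failure.get() != null) throw failure.get();
        return received.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("server received: " + exchange());
    }
}
```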
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26049/head:pull/26049
$ git checkout pull/26049
Update a local copy of the PR:
$ git checkout pull/26049
$ git pull https://git.openjdk.org/jdk.git pull/26049/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 26049
View PR using the GUI difftool:
$ git pr show -t 26049
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26049.diff