DIAL_CLS / DIAL_RSP race leading to connection leak #404

@tallclair

Description

There could be a race condition where a DIAL_CLS packet from the frontend is received at the same time as a DIAL_RSP from the backend, which could lead to the backend connection being leaked.

This could happen if the following events occur in this order:

  1. DIAL_RSP received from the backend
  2. The pending dial is still present when the server looks it up, so this check passes:
    if frontend, ok := s.PendingDial.Get(resp.Random); !ok {
  3. Frontend starts shutting down and sends a DIAL_CLS (prior to #398, "[konnectivity-client] Ensure grpc tunnel is closed on dial failure", it wouldn't even send a close request)
  4. Server sends the dial response to the frontend. The FE gRPC stream is still open, so the packet is received, but the frontend doesn't process it:
    err := frontend.send(pkt)
  5. At this point, the server thinks the connection is established, but the frontend is not aware of that and is in the process of shutting down, leading to a leaked backend connection.
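
One possible way to close this window, sketched below, is to keep the pending and established dial state under a single lock, so that whichever of DIAL_RSP and DIAL_CLS arrives second can see what the first one did and close the backend connection. This is only an illustration under assumptions: `dialTracker`, `onDialResponse`, `onDialClose`, and the `closed` slice are hypothetical names, not the proxy server's actual code, and a real fix would send a close request to the backend instead of appending to a slice.

```go
package main

import (
	"fmt"
	"sync"
)

// dialTracker is a hypothetical stand-in for the server's dial bookkeeping.
type dialTracker struct {
	mu          sync.Mutex
	pending     map[int64]bool  // dial ID -> dial still pending
	established map[int64]int64 // dial ID -> backend connection ID
	closed      []int64         // backend connection IDs we asked to close
}

func newDialTracker() *dialTracker {
	return &dialTracker{
		pending:     make(map[int64]bool),
		established: make(map[int64]int64),
	}
}

// startDial records a dial request from the frontend.
func (t *dialTracker) startDial(dialID int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.pending[dialID] = true
}

// onDialResponse handles a DIAL_RSP from the backend. It reports whether the
// response should be forwarded to the frontend.
func (t *dialTracker) onDialResponse(dialID, connID int64) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if !t.pending[dialID] {
		// DIAL_CLS arrived first: nobody wants this connection, so close it.
		t.closed = append(t.closed, connID)
		return false
	}
	delete(t.pending, dialID)
	t.established[dialID] = connID
	return true
}

// onDialClose handles a DIAL_CLS from the frontend.
func (t *dialTracker) onDialClose(dialID int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.pending, dialID)
	if connID, ok := t.established[dialID]; ok {
		// DIAL_RSP won the race: close the backend connection instead of
		// leaking it.
		delete(t.established, dialID)
		t.closed = append(t.closed, connID)
	}
}

func main() {
	t := newDialTracker()
	t.startDial(1)
	forwarded := t.onDialResponse(1, 42) // DIAL_RSP arrives first
	t.onDialClose(1)                     // DIAL_CLS arrives right after
	fmt.Println("forwarded:", forwarded)
	fmt.Println("backend connections closed:", t.closed)
}
```

In this sketch the backend connection is closed in either ordering. A real implementation would also need to decide when a DIAL_CLS for an established dial is no longer meaningful (for example, once the frontend has acknowledged the connection), which is not modeled here.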

This seems fairly unlikely (at least once #403 is fixed), but worth tracking.

Metadata

Labels: lifecycle/frozen (indicates that an issue or PR should not be auto-closed due to staleness)