Commit cf67be7

ddstreet authored and gregkh committed
net: tcp: close sock if net namespace is exiting
[ Upstream commit 4ee806d ]

When a tcp socket is closed, if it detects that its net namespace is exiting, close immediately and do not wait for FIN sequence.

For normal sockets, a reference is taken to their net namespace, so it will never exit while the socket is open. However, kernel sockets do not take a reference to their net namespace, so it may begin exiting while the kernel socket is still open. In this case if the kernel socket is a tcp socket, it will stay open trying to complete its close sequence. The sock's dst(s) hold a reference to their interface, which are all transferred to the namespace's loopback interface when the real interfaces are taken down. When the namespace tries to take down its loopback interface, it hangs waiting for all references to the loopback interface to release, which results in messages like:

    unregister_netdevice: waiting for lo to become free. Usage count = 1

These messages continue until the socket finally times out and closes. Since the net namespace cleanup holds the net_mutex while calling its registered pernet callbacks, any new net namespace initialization is blocked until the current net namespace finishes exiting.

After this change, the tcp socket notices the exiting net namespace, and closes immediately, releasing its dst(s) and their reference to the loopback interface, which lets the net namespace continue exiting.

Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
Signed-off-by: Dan Streetman <ddstreet@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
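The reference asymmetry described in the message can be sketched in user space. This is only an illustrative model, not kernel code: `model_net`, `model_sock`, `model_sock_open`, `model_net_put` and `model_check_net` are hypothetical names, and C11 atomics stand in for the kernel's `atomic_t`. It assumes a namespace starts exiting once its count reaches zero, and that only normal (non-kernel) sockets pin the namespace.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct net: only the reference count. */
struct model_net {
	atomic_int count;
};

/* Hypothetical stand-in for a socket. */
struct model_sock {
	struct model_net *net;
	bool holds_net_ref;	/* true for normal sockets, false for kernel sockets */
};

/* Open a socket in a namespace; only normal sockets take a reference. */
static void model_sock_open(struct model_sock *sk, struct model_net *net,
			    bool kernel)
{
	sk->net = net;
	sk->holds_net_ref = !kernel;
	if (sk->holds_net_ref)
		atomic_fetch_add(&net->count, 1);
}

/* Drop one reference; returns true if this was the last one (exit begins). */
static bool model_net_put(struct model_net *net)
{
	int old = atomic_fetch_sub(&net->count, 1);

	assert(old > 0);
	return old == 1;
}

/* Same idea as check_net(): nonzero while the namespace is still alive. */
static int model_check_net(struct model_net *net)
{
	return atomic_load(&net->count) != 0;
}
```

In this model a kernel socket can still be open when `model_check_net()` starts returning 0, which is the situation the commit detects.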
1 parent a44d911 commit cf67be7

3 files changed: 28 additions and 0 deletions
include/net/net_namespace.h

Lines changed: 10 additions & 0 deletions
@@ -213,6 +213,11 @@ int net_eq(const struct net *net1, const struct net *net2)
 	return net1 == net2;
 }
 
+static inline int check_net(const struct net *net)
+{
+	return atomic_read(&net->count) != 0;
+}
+
 void net_drop_ns(void *);
 
 #else
@@ -237,6 +242,11 @@ int net_eq(const struct net *net1, const struct net *net2)
 	return 1;
 }
 
+static inline int check_net(const struct net *net)
+{
+	return 1;
+}
+
 #define net_drop_ns NULL
 #endif

net/ipv4/tcp.c

Lines changed: 3 additions & 0 deletions
@@ -2215,6 +2215,9 @@ void tcp_close(struct sock *sk, long timeout)
 			tcp_send_active_reset(sk, GFP_ATOMIC);
 			__NET_INC_STATS(sock_net(sk),
 					LINUX_MIB_TCPABORTONMEMORY);
+		} else if (!check_net(sock_net(sk))) {
+			/* Not possible to send reset; just close */
+			tcp_set_state(sk, TCP_CLOSE);
 		}
 	}

net/ipv4/tcp_timer.c

Lines changed: 15 additions & 0 deletions
@@ -50,11 +50,19 @@ static void tcp_write_err(struct sock *sk)
  * to prevent DoS attacks. It is called when a retransmission timeout
  * or zero probe timeout occurs on orphaned socket.
  *
+ * Also close if our net namespace is exiting; in that case there is no
+ * hope of ever communicating again since all netns interfaces are already
+ * down (or about to be down), and we need to release our dst references,
+ * which have been moved to the netns loopback interface, so the namespace
+ * can finish exiting. This condition is only possible if we are a kernel
+ * socket, as those do not hold references to the namespace.
+ *
  * Criteria is still not confirmed experimentally and may change.
  * We kill the socket, if:
  * 1. If number of orphaned sockets exceeds an administratively configured
  *    limit.
  * 2. If we have strong memory pressure.
+ * 3. If our net namespace is exiting.
  */
 static int tcp_out_of_resources(struct sock *sk, bool do_reset)
 {
@@ -83,6 +91,13 @@ static int tcp_out_of_resources(struct sock *sk, bool do_reset)
 		__NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONMEMORY);
 		return 1;
 	}
+
+	if (!check_net(sock_net(sk))) {
+		/* Not possible to send reset; just close */
+		tcp_done(sk);
+		return 1;
+	}
+
 	return 0;
 }
