stub_status counter leak when killing old workers #99

@mookie-

Environment

Include the result of the following commands:

  • nginx -V
nginx version: nginx/1.27.1
built by gcc 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
built with OpenSSL 3.0.2 15 Mar 2022
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-http_v3_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-g -O2 -ffile-prefix-map=/data/builder/debuild/nginx-1.27.1/debian/debuild-base/nginx-1.27.1=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie'
  • uname -a
Linux lb 5.15.0-100-generic #110-Ubuntu SMP Wed Feb 7 13:27:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux 

Description

Describe the bug in full detail including expected and actual behavior.
Specify conditions that caused it. Provide the relevant part of nginx
configuration and debug log.

  • The bug is reproducible with the latest version of nginx
  • The nginx configuration is minimized to the smallest possible
    to reproduce the issue and doesn't contain third-party modules

Hello,

We've started killing "old" nginx workers (the ones shown as nginx: worker process is shutting down) by sending them SIGTERM, because we have regular configuration changes and a lot of long-lived websocket connections.

Since we started doing this, the counters reported by stub_status have been incorrect.

nginx status

> curl localhost/nginx_status
Active connections: 65369
server accepts handled requests
 1042173178 1042173178 5035465167
Reading: 0 Writing: 31968 Waiting: 5356

Writing and Waiting add up to only 37324, not the 65369 reported as "Active connections". But even 37324 is too high. The correct number should be around this:

> ss | grep https | wc -l
7327
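The comparison above can be automated. The following is a minimal sketch, not part of the original report, that parses stub_status output and reports how many active connections are not covered by the Reading, Writing and Waiting counters. The function names parse_stub_status and unaccounted are hypothetical, and the sketch assumes the three state counters should account for most of the active connections on a healthy instance (a small, short-lived gap can be normal).

```python
import re

def parse_stub_status(text):
    """Parse the plain-text stub_status output into a dict of integers."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    # The totals line holds three bare integers: accepts, handled, requests.
    totals = re.search(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$",
                       text, re.MULTILINE)
    states = re.search(
        r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text)
    if not (active and totals and states):
        raise ValueError("unrecognized stub_status output")
    accepts, handled, requests = map(int, totals.groups())
    reading, writing, waiting = map(int, states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }

def unaccounted(status):
    """Active connections not covered by Reading + Writing + Waiting."""
    return status["active"] - (
        status["reading"] + status["writing"] + status["waiting"])

# Output copied from the report above.
SAMPLE = """Active connections: 65369
server accepts handled requests
 1042173178 1042173178 5035465167
Reading: 0 Writing: 31968 Waiting: 5356
"""

status = parse_stub_status(SAMPLE)
print(unaccounted(status))  # prints 28045
```

A value that keeps growing after each round of killing old workers would point to leaked counters rather than real connections.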

The problem is reproducible, e.g. using echo.websocket.org:

nginx configuration

map $http_upgrade $connection_upgrade {
	default upgrade;
	''      close;
}

server {
	listen 1234;
	location / {
		proxy_set_header  Host echo.websocket.org;
		proxy_ssl_server_name on;
		proxy_ssl_name echo.websocket.org;
		proxy_http_version 1.1;
		proxy_set_header Upgrade $http_upgrade;
		proxy_set_header Connection $connection_upgrade;

		proxy_pass https://echo.websocket.org:443;
	}
	location = /basic_status {
		    stub_status;
	}
}

Test

Open 2 tabs in the browser with the URL: http://localhost:1234/.ws
Now there are 3 active connections (one of them is the request to /basic_status itself):

# curl localhost:1234/basic_status
Active connections: 3
server accepts handled requests
 4 4 4
Reading: 0 Writing: 3 Waiting: 0

After that, reload nginx. The old worker now shows up as nginx: worker process is shutting down:

# systemctl reload nginx

# ps aux | grep [n]ginx
root     2233850  0.0  0.0  55372  5652 ?        Ss   22:29   0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 2233851  0.0  0.0  55896  6044 ?        S    22:29   0:00 nginx: worker process is shutting down
nginx 2233940  0.0  0.0  55868  5400 ?        S    22:30   0:00 nginx: worker process
nginx 2233941  0.0  0.0  55868  5240 ?        S    22:30   0:00 nginx: worker process

# kill 2233851

After we've killed the old process, the websocket clients in the browser reconnect. stub_status then reports two additional connections, even though the old connections are gone:

# curl localhost:1234/basic_status
Active connections: 5
server accepts handled requests
 7 7 7
Reading: 0 Writing: 5 Waiting: 0
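One possible explanation, offered here as a hypothesis and not as something this report establishes, is that the counters live in memory shared by all workers and are only decremented when a worker closes a connection itself. A worker terminated by SIGTERM would then leave its share of the counters behind. The toy model below (all class and function names are hypothetical, and it is not nginx code) reproduces the numbers from the test above under that assumption.

```python
class SharedCounters:
    """Stand-in for counters shared by all worker processes."""
    def __init__(self):
        self.active = 0

class Worker:
    def __init__(self, counters):
        self.counters = counters
        self.connections = 0

    def accept(self, n=1):
        self.connections += n
        self.counters.active += n

    def close(self, n=1):
        self.connections -= n
        self.counters.active -= n

    def graceful_exit(self):
        # Every connection is closed by the worker, so counters stay correct.
        self.close(self.connections)

    def terminate(self):
        # Assumed SIGTERM behavior: the process exits and the OS drops its
        # sockets, but nothing decrements the shared counters.
        self.connections = 0

def status_request(worker):
    """A stub_status request counts itself as one active connection."""
    worker.accept()
    seen = worker.counters.active
    worker.close()
    return seen

counters = SharedCounters()
old = Worker(counters)
old.accept(2)                 # two websocket tabs
before = status_request(old)  # 2 websockets + the status request

new = Worker(counters)        # reload starts a new worker
old.terminate()               # kill the worker that is shutting down
new.accept(2)                 # the browser tabs reconnect
after = status_request(new)   # 2 leaked + 2 real + the status request

print(before, after)          # prints: 3 5
```

In this model, replacing old.terminate() with old.graceful_exit() yields 3 for the second reading as well, which is the value expected in the test.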

Labels: documentation (Improvements or additions to documentation)