Skip to content

Commit 2d414e3

Browse files
authored
Fix graceful server shutdown (#44)
At some point, our server's graceful shutdown logic broke. This fixes a couple issues: - If a diff is in progress while the server is shutting down, it's possible for the diff retry mechanism to fight with the server about shutting down or starting up the process pool for diffing. - The SIGINT signal for ctrl+c is getting passed down to the worker processes in the diff pool, which did not used to happen. We now add explicit initialization code to the workers so they ignore the signal, since the main server process will handle it and tell the workers what to do.
1 parent 76d254b commit 2d414e3

File tree

2 files changed

+15
-1
lines changed

2 files changed

+15
-1
lines changed

docs/source/release-history.rst

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,8 @@ In Development
77

88
- Fixes an issue where the diffing server could reset the process pool that manages the actual diffs multiple times unnecessarily, leading to wasted memory and CPU. If you are tracking logs and errors, this will also make error messages about the diffing server clearer — you’ll see “BrokenProcessPool” instead of “'NoneType' object does not support item assignment.” (`#38 <https://github.com/edgi-govdata-archiving/web-monitoring-diff/issues/38>`_)
99

10+
- Ensures the server shuts down gracefully when pressing ctrl+c or sending a SIGINT signal. (`#44 <https://github.com/edgi-govdata-archiving/web-monitoring-diff/issues/44>`_)
11+
1012
- Fixes :func:`web_monitoring_diff.html_diff_render` to make sure the spacing of text and tags in the HTML source code of the diff matches the original. This resolves display issues on pages where CSS is used to treat spacing as significant. (`#36 <https://github.com/edgi-govdata-archiving/web-monitoring-diff/issues/36>`_)
1113

1214
- Improve handling of lazy-loaded images in :func:`web_monitoring_diff.html_diff_render`. When images are lazy-loaded via JS, they usually use the ``data-src`` or ``data-srcset`` attributes, and we now check those, too. Additionally, if two images have no detectable URLs, we now treat them as the same, rather than different. (`#37 <https://github.com/edgi-govdata-archiving/web-monitoring-diff/issues/37>`_)

web_monitoring_diff/server/server.py

Lines changed: 13 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -202,6 +202,10 @@ def _get_content_type_headers_from_url(url):
202202
os.environ.get('ACCESS_CONTROL_ALLOW_ORIGIN_HEADER')
203203

204204

205+
def initialize_diff_worker():
206+
signal.signal(signal.SIGINT, signal.SIG_IGN)
207+
208+
205209
class DiffServer(tornado.web.Application):
206210
terminating = False
207211

@@ -486,6 +490,10 @@ async def fetch_diffable_content(self, url, expected_hash, query_params):
486490

487491
return response
488492

493+
# TODO: we should split out all the management of the executor and diffing
494+
# (so this, get_diff_executor, caller, etc.) into a separate object owned
495+
# by the server so we don't need weird bits checking the server's
496+
# `terminating` state and so that all the parts are grouped together.
489497
async def diff(self, func, a, b, params, tries=2):
490498
"""
491499
Actually do a diff between two pieces of content, optionally retrying
@@ -512,6 +520,9 @@ async def diff(self, func, a, b, params, tries=2):
512520
# NOTE: this doesn't do anything async, but if we change it to do so, we
513521
# need to add a lock (either asyncio.Lock or tornado.locks.Lock).
514522
def get_diff_executor(self, reset=False):
523+
if self.application.terminating:
524+
raise RuntimeError('Diff executor is being shut down.')
525+
515526
executor = self.settings.get('diff_executor')
516527
if reset or not executor:
517528
if executor:
@@ -522,7 +533,8 @@ def get_diff_executor(self, reset=False):
522533
except Exception:
523534
pass
524535
executor = concurrent.futures.ProcessPoolExecutor(
525-
DIFFER_PARALLELISM)
536+
DIFFER_PARALLELISM,
537+
initializer=initialize_diff_worker)
526538
self.settings['diff_executor'] = executor
527539

528540
return executor

0 commit comments

Comments
 (0)