Problem
The system is logging the errors shown below. They apparently lead to resource consumption that makes the web server stop responding to any requests after a few such failures. Why?
2025-08-04 16:27:49,857 INFO [patzilla.access.uspto.pdf ][MainThread] PDF US1748277A: Accessing USPTO document server: http://pdfpiw.uspto.gov/fdd/77/482/017/0.pdf
2025-08-04 16:29:57,153 WARNING [patzilla.access.generic.pdf ][MainThread] PDF US1748277A: Not available from USPTO. HTTPConnectionPool(host='pdfpiw.uspto.gov', port=80): Max retries exceeded with url: /fdd/77/482/017/0.pdf (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7bcc4d0>: Failed to establish a new connection: [Errno 110] Connection timed out',))
2025-08-04 16:29:57,154 ERROR [patzilla.access.generic.pdf ][MainThread] Traceback (most recent call last):
  File "/path/to/site-packages/patzilla/access/generic/pdf.py", line 80, in pdf_universal_real
    response.pdf = uspto_fetch_pdf(patent)
  File "/path/to/site-packages/beaker/cache.py", line 599, in cached
    return cache[0].get_value(cache_key, createfunc=go)
  File "/path/to/site-packages/beaker/cache.py", line 322, in get
    return self._get_value(key, **kw).get_value()
  File "/path/to/site-packages/beaker/container.py", line 378, in get_value
    v = self.createfunc()
  File "/path/to/site-packages/beaker/cache.py", line 597, in go
    return func(*args, **kwargs)
  File "/path/to/site-packages/patzilla/access/uspto/pdf.py", line 108, in fetch_pdf
    response = requests.get(url, headers={'User-Agent': regular_user_agent})
  File "/path/to/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/path/to/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/path/to/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/path/to/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/path/to/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='pdfpiw.uspto.gov', port=80):
Max retries exceeded with url: /fdd/77/482/017/0.pdf (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7bcc4d0>:
Failed to establish a new connection: [Errno 110] Connection timed out',))
Thoughts
Should we turn off this subsystem so it no longer causes trouble? Or improve error handling at this spot? Or switch to another data source URL, if there is one?
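One possible shape for the "improve error handling" option, as a rough sketch rather than a tested patch. The log shows roughly two minutes passing between the request (16:27:49) and the failure (16:29:57), and the `requests.get` call in the traceback passes no `timeout`, so each attempt probably blocks until the operating system gives up on the connection. The `CircuitBreaker` and `UpstreamUnavailable` names below are hypothetical helpers, not part of PatZilla or Requests. The idea is to stop calling the USPTO server for a while once it has failed a few times in a row.

```python
import time


class UpstreamUnavailable(Exception):
    """Raised instead of calling an upstream that is currently considered down."""


class CircuitBreaker(object):
    """Skip calls to a failing upstream for `cooldown` seconds after
    `max_failures` consecutive errors. Hypothetical helper for illustration."""

    def __init__(self, max_failures=3, cooldown=300.0, clock=time.time):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock        # injectable, so the behaviour can be tested
        self.failures = 0         # consecutive failures seen so far
        self.opened_at = None     # time at which the breaker tripped, if any

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                # Fail fast: do not touch the network at all.
                raise UpstreamUnavailable("upstream disabled after repeated failures")
            # Cooldown elapsed: allow one trial call; a single failure re-opens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Any success resets the failure count.
        self.failures = 0
        return result
```

Assuming a module-level `breaker = CircuitBreaker()`, the call in `pdf_universal_real` might become `breaker.call(uspto_fetch_pdf, patent)`, with `UpstreamUnavailable` handled the same way as the existing "Not available from USPTO" case. Separately, passing an explicit timeout such as `requests.get(url, headers=..., timeout=(5, 30))` would bound how long a single attempt can block. The values here are guesses and would need tuning.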