Connectivity to pdfpiw.uspto.gov broken? #74

@amotl

Problem

The system is logging the errors below. They apparently lead to resource consumption that makes the web server stop responding to any requests after a few iterations. Why?

2025-08-04 16:27:49,857 INFO     [patzilla.access.uspto.pdf               ][MainThread] PDF US1748277A: Accessing USPTO document server: http://pdfpiw.uspto.gov/fdd/77/482/017/0.pdf
2025-08-04 16:29:57,153 WARNING  [patzilla.access.generic.pdf             ][MainThread] PDF US1748277A: Not available from USPTO. HTTPConnectionPool(host='pdfpiw.uspto.gov', port=80): Max retries exceeded with url: /fdd/77/482/017/0.pdf (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7bcc4d0>: Failed to establish a new connection: [Errno 110] Connection timed out',))
2025-08-04 16:29:57,154 ERROR    [patzilla.access.generic.pdf             ][MainThread] Traceback (most recent call last):
  File "/path/to/site-packages/patzilla/access/generic/pdf.py", line 80, in pdf_universal_real
    response.pdf = uspto_fetch_pdf(patent)
  File "/path/to/site-packages/beaker/cache.py", line 599, in cached
    return cache[0].get_value(cache_key, createfunc=go)
  File "/path/to/site-packages/beaker/cache.py", line 322, in get
    return self._get_value(key, **kw).get_value()
  File "/path/to/site-packages/beaker/container.py", line 378, in get_value
    v = self.createfunc()
  File "/path/to/site-packages/beaker/cache.py", line 597, in go
    return func(*args, **kwargs)
  File "/path/to/site-packages/patzilla/access/uspto/pdf.py", line 108, in fetch_pdf
    response = requests.get(url, headers={'User-Agent': regular_user_agent})
  File "/path/to/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/path/to/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/path/to/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/path/to/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/path/to/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='pdfpiw.uspto.gov', port=80): Max retries exceeded with url: /fdd/77/482/017/0.pdf (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7bcc4d0>: Failed to establish a new connection: [Errno 110] Connection timed out',))

Thoughts

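The log shows the request starting at 16:27:49 and failing only at 16:29:57, roughly two minutes later. The `requests.get` call in `patzilla/access/uspto/pdf.py` (line 108) passes no `timeout`, so each attempt presumably blocks until the operating system gives up on the connection, which would explain why the server stops responding after a few of them.

Do we need to turn off this subsystem so it no longer causes trouble? Or improve error handling at this spot? Or switch to another data source URL, if there is one?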
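
To make the first two options more concrete, here is a minimal sketch of a guard that handles the error and temporarily switches the upstream off after a failure, so later requests fail fast instead of waiting for another connection timeout. `UpstreamGuard`, `cooldown`, and the injected `fetch` callable are hypothetical names for illustration and not part of PatZilla. The timeout values in the wiring comment are arbitrary assumptions.

```python
import time


class UpstreamGuard(object):
    """Skip calls to a failing upstream for `cooldown` seconds after an error.

    Hypothetical helper, not part of PatZilla.
    """

    def __init__(self, fetch, cooldown=300.0, clock=time.time):
        self.fetch = fetch            # callable(url) -> payload, may raise IOError
        self.cooldown = cooldown      # seconds to keep the upstream switched off
        self.clock = clock            # injectable so the logic can be tested
        self.disabled_until = 0.0

    def get(self, url):
        """Return the payload, or None if the upstream failed or is paused."""
        if self.clock() < self.disabled_until:
            # Fail fast: do not touch the network while the upstream is paused.
            return None
        try:
            return self.fetch(url)
        except IOError:
            # The exceptions raised by `requests` derive from IOError, so this
            # covers connection errors and timeouts.
            self.disabled_until = self.clock() + self.cooldown
            return None


# Hypothetical wiring for the call shown in the traceback. The `timeout`
# tuple is (connect, read) in seconds; the values are assumptions:
#
#   guard = UpstreamGuard(lambda url: requests.get(
#       url, headers={'User-Agent': regular_user_agent},
#       timeout=(5, 30)).content)
#   payload = guard.get(url)
```

With an explicit `timeout`, a single attempt would give up after a few seconds rather than minutes, and the guard keeps the remaining requests from paying even that cost while the server is unreachable. A caller receiving `None` could then fall through to another data source, if one exists.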
