The scrapy-proxy-headers package is designed for adding proxy headers to HTTPS requests in Scrapy.
In normal usage, custom headers set in request.headers cannot be read by a proxy when you make an HTTPS request, because the headers are encrypted and passed through the proxy tunnel along with the rest of the request. You can read more about this at Proxy Server Requests over HTTPS.
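To make the distinction concrete, here is a minimal sketch of the one message a proxy can read when a client opens an HTTPS tunnel: the plaintext CONNECT request. It uses only the standard library and sends nothing over the network. The function name `build_connect_request` and the host name are made up for illustration, and this is not a description of how Scrapy or this package builds requests internally.

```python
def build_connect_request(host, port, proxy_headers=None):
    """Return the plaintext CONNECT request a client sends to a proxy to open a tunnel."""
    lines = [
        f"CONNECT {host}:{port} HTTP/1.1",
        f"Host: {host}:{port}",
    ]
    # Headers added here are visible to the proxy, because they are sent
    # before the TLS handshake with the destination server begins.
    for name, value in (proxy_headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


connect_bytes = build_connect_request(
    "example.com", 443, {"X-ProxyMesh-Country": "US"}
)
print(connect_bytes.decode("ascii"))
```

Everything sent after the proxy accepts the tunnel, including the headers in request.headers, travels inside the encrypted TLS stream, so the proxy only relays opaque bytes.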
Because Scrapy does not have a good way to pass custom headers to a proxy when you make HTTPS requests, we at ProxyMesh made this extension to support our customers who use Scrapy and want to use custom headers to control our proxy behavior. The extension is not specific to ProxyMesh, though: it can handle custom headers through any proxy.
To use this extension, do the following:
- Install the package:
pip install scrapy-proxy-headers
- In your Scrapy settings.py, add the following:
DOWNLOAD_HANDLERS = {
"https": "scrapy_proxy_headers.HTTP11ProxyDownloadHandler"
}
When you want to make a request with a custom proxy header, instead of using request.headers, use request.meta["proxy_headers"] like this:
request.meta["proxy_headers"] = {"X-ProxyMesh-Country": "US"}
Any response headers that might come from the proxy will be saved in response.headers, as in response.headers["X-ProxyMesh-IP"].
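The request and response sides can be sketched together as follows. The two helpers are hypothetical (they are not part of scrapy-proxy-headers), the proxy URL is a placeholder, and the sketch assumes header values come back as bytes, which is how Scrapy's Headers class normally returns them. It runs without Scrapy installed; the trailing comments show where each piece would go in a spider.

```python
def proxy_meta(proxy_url, proxy_headers):
    """Build Request.meta entries for a proxied request with custom proxy headers."""
    return {
        # Standard Scrapy meta key that selects the proxy for this request.
        "proxy": proxy_url,
        # Key described above for headers intended for the proxy itself.
        "proxy_headers": dict(proxy_headers),
    }


def header_text(value, default=None):
    """Decode a header value to str, falling back to a default when it is missing."""
    if value is None:
        return default
    if isinstance(value, bytes):
        return value.decode("latin-1")
    return value


meta = proxy_meta("http://proxy.example.com:8080", {"X-ProxyMesh-Country": "US"})

# In a spider (assumed usage):
#     yield scrapy.Request(url, meta=meta, callback=self.parse)
# In the callback:
#     proxy_ip = header_text(response.headers.get("X-ProxyMesh-IP"))
```

Keeping the proxy headers in meta rather than in request.headers means they are never forwarded to the destination site, only to the proxy.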