-
Notifications
You must be signed in to change notification settings - Fork 29
Add HTTP resilience mode #176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
self.session = requests.Session() | ||
|
||
# IA specific thing | ||
self.session.headers.update({"X-Requested-With": "XMLHttpRequest"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if it will have an unintended effect on requests in other jurisdictions. Maybe we should consider pulling it out of the base class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seconded, pretty sure that'll trigger different behavior on a few sites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, yeah that's a mistake in porting the code - thanks for catching!
@@ -66,6 +71,22 @@ def clean_whitespace(obj): | |||
return obj | |||
|
|||
|
|||
def get_random_user_agent(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good approach for now, but in the future it might be annoying to have to update core if want to change this pool. An env var seems like it would be a monster mess. I kinda hate to go adding config files but maybe there's some better way to define in this in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this starting point is not optimal. Worth going as far as pulling a library in (https://pypi.org/project/fake-useragent/) ?
This is a WIP draft of porting the HTTP resilience features that @elseagle introduced (originally in the FL scraper, later I re-used in IA scraper) up into openstates-core. The goal is to turn these features into a toggleable behavior instead of something that needs to be grafted onto scrapers one-by-one.
This is not sufficiently tested yet. I'm running an IA scrape (with the grafted version removed from scraper code), and will want to do the same with FL. And then test with some other scrapers to make sure I haven't regressed something.
cc @showerst and @alexobaseki