Skip to content

Add HTTP resilience mode #176

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

Conversation

jessemortenson
Copy link
Contributor

This is a WIP draft of porting the HTTP resilience features that @elseagle introduced (originally in the FL scraper, later I re-used in IA scraper) up into openstates-core. The goal is to turn these features into a toggleable behavior instead of something that needs to be grafted onto scrapers one-by-one.

This is not sufficiently tested yet. I'm running an IA scrape (with the grafted version removed from scraper code), and will want to do the same with FL. And then test with some other scrapers to make sure I haven't regressed something.

cc @showerst and @alexobaseki

Copy link
Contributor

@alexobaseki alexobaseki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

self.session = requests.Session()

# IA specific thing
self.session.headers.update({"X-Requested-With": "XMLHttpRequest"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it will have an unintended effect on requests in other jurisdictions. Maybe we should consider pulling it out of the base class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seconded, pretty sure that'll trigger different behavior on a few sites.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, yeah that's a mistake in porting the code - thanks for catching!

@@ -66,6 +71,22 @@ def clean_whitespace(obj):
return obj


def get_random_user_agent():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good approach for now, but in the future it might be annoying to have to update core if want to change this pool. An env var seems like it would be a monster mess. I kinda hate to go adding config files but maybe there's some better way to define in this in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this starting point is not optimal. Worth going as far as pulling a library in (https://pypi.org/project/fake-useragent/) ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants