Open
Description
Context
Our file downloaders could use a rework. They seem overly complex and support only a few file types, with various modules calling each other in a specific order that is unclear. There are also several defunct scripts scattered about. I believe a much more straightforward approach is possible, and it would go a long way toward helping people understand how and when to use our util modules. During work on #227, I found that the following will download any file type when given a download URL:
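    import requests

    r = requests.get(url, stream=True)
    r.raise_for_status()  # fail early on HTTP errors
    with open(file_path, 'wb') as fd:
        # iter_content() defaults to 1-byte chunks; use a larger size
        for chunk in r.iter_content(chunk_size=8192):
            fd.write(chunk)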
SEE: downloaders.py, get_files.py, muckrock_scraper.py
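To make the idea concrete, the snippet above could be wrapped in a single helper that any scraper calls. This is only a sketch and not the existing API of `downloaders.py` or `get_files.py`. The function name `download_file` and its `chunk_size`, `timeout`, and `session` parameters are hypothetical, and creating missing parent directories is an assumed convenience.

```python
from pathlib import Path

import requests


def download_file(url, file_path, chunk_size=8192, timeout=30, session=None):
    """Stream the response body at `url` to `file_path` and return the path.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Use the caller's session if one is given; otherwise fall back to the
    # module-level requests API. Both expose a compatible .get().
    http = session or requests

    path = Path(file_path)
    # Assumed convenience: create missing parent directories.
    path.parent.mkdir(parents=True, exist_ok=True)

    # stream=True avoids loading the whole body into memory at once.
    with http.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
        with open(path, "wb") as fd:
            for chunk in response.iter_content(chunk_size=chunk_size):
                if chunk:  # skip empty chunks
                    fd.write(chunk)
    return path
```

A scraper would then call something like `download_file(url, "data/report.pdf")` without caring about the file type. Accepting an optional `session` is one possible design choice: it lets callers reuse connections or custom headers and makes the helper testable without network access.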
Requirements
- Should be simple and easy for people to understand how to consume the module(s) and how they work
- Should be clear what modules, in what order, and when they should be called
- Should not break any existing functionality of scrapers or other util scripts
Docs
- Docs related to the file downloaders and util scripts should be updated where necessary
- New docs should be written to explain how to use and consume the file downloaders
Open questions
- Understanding the current code, deciding what functionality to keep, and untangling it will likely be time consuming
- Perhaps think about keeping an entire pipeline of functionality in one folder for organizational purposes