Releases: jaypyles/Scraperr
v1.0.8 (Optional Registration)
Summary
Introduces functionality to conditionally disable user registration based on the REGISTRATION_ENABLED environment variable. When registration is disabled, a default user is automatically created if credentials are provided.
🔐 Registration Check API
- Added /auth/check endpoint to expose the REGISTRATION_ENABLED value to the frontend.
🛠️ Startup Logic
- On startup, the backend checks for required default user credentials:
  - DEFAULT_USER_EMAIL
  - DEFAULT_USER_PASSWORD
  - DEFAULT_USER_FULL_NAME
- If registration is disabled and any of these are missing, the app will log an error and exit.
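The startup check described above could look roughly like the following. This is a minimal sketch, not Scraperr's actual code: the function names, the accepted truthy strings, and the choice to treat an unset REGISTRATION_ENABLED as enabled are all assumptions. Only the environment variable names come from the release notes.

```python
import logging
import sys
from typing import Mapping

logger = logging.getLogger(__name__)

# Variable names taken from the release notes.
REQUIRED_DEFAULT_USER_VARS = (
    "DEFAULT_USER_EMAIL",
    "DEFAULT_USER_PASSWORD",
    "DEFAULT_USER_FULL_NAME",
)


def registration_enabled(env: Mapping[str, str]) -> bool:
    """Assumption: unset means enabled; 'true', '1', 'yes' count as enabled."""
    raw = env.get("REGISTRATION_ENABLED", "true")
    return raw.strip().lower() in {"true", "1", "yes"}


def missing_default_user_vars(env: Mapping[str, str]) -> list[str]:
    """Return the required default-user variables that are unset or empty."""
    return [name for name in REQUIRED_DEFAULT_USER_VARS if not env.get(name)]


def check_startup_env(env: Mapping[str, str]) -> None:
    """Log an error and exit if registration is disabled without a default user."""
    if registration_enabled(env):
        return
    missing = missing_default_user_vars(env)
    if missing:
        logger.error(
            "Registration is disabled but these variables are not set: %s",
            ", ".join(missing),
        )
        sys.exit(1)
```

A backend would typically call something like `check_startup_env(os.environ)` once, before it starts serving requests.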
✅ Frontend Integration
- Merged AI and auth check responses into a single /api/check endpoint.
- Updated login form UI to reflect the registration status:
  - Hides the "Sign up" button when registration is disabled.
  - Displays a message indicating that registration is not available.
🧹 Minor Fixes
- Improved response format for AI availability check: GET /api/ai/check now returns { ai_enabled: true/false }.
- Moved and renamed API route for better organization and frontend access.
v1.0.7
🐛 Fix "Invalid Date" Issue in AI Job Menu Dropdown
Summary
This PR fixes Issue #59, where the AI job selection dropdown was showing blank entries with the tooltip "Invalid Date" despite jobs completing successfully.
✅ Fix Details
- Corrected the logic that supplies job data to the AI chat interface.
- Ensured the correct job reference is passed when rendering the dropdown options.
- Verified that job metadata, including dates and names, is now displayed as expected.
v1.0.6 (Media Collection)
✨ Add Media Collection to Scraping Pipeline
Summary
This PR introduces the collect_media function, which enhances scraping capabilities by automatically detecting and downloading various types of media assets from a web page using a Selenium-controlled browser session.
🔧 Features
Supported Media Types:
- Images (<img>)
- Videos (<video>)
- Audio files (<audio>)
- PDFs (<a href="*.pdf">)
- Documents (.doc, .docx, .txt, .rtf)
- Presentations (.ppt, .pptx)
- Spreadsheets (.xls, .xlsx, .csv)
Functionality:
- Uses CSS selectors to find elements containing media links.
- Downloads each valid media file (HTTP/HTTPS only).
- Saves all assets to a structured media/ directory, grouped by media type.
- Writes a download_summary.txt with the original URLs and their local file paths.
Error Handling:
- Skips failed downloads and logs the error.
- Generates fallback filenames when none are detected in the URL.
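The URL handling behind these rules can be sketched as below. This is an illustration under assumptions, not the collect_media implementation: the helper names, the group names used as directory labels, and the hash-based fallback filename scheme are all hypothetical. Only the extension lists and the HTTP/HTTPS restriction come from the release notes. Images, videos, and audio are found by tag rather than by extension, so they are left out of the extension table.

```python
import hashlib
import posixpath
from typing import Optional
from urllib.parse import urlparse

# Extension groups from the release notes; the group names are hypothetical.
MEDIA_TYPES = {
    "pdfs": {".pdf"},
    "documents": {".doc", ".docx", ".txt", ".rtf"},
    "presentations": {".ppt", ".pptx"},
    "spreadsheets": {".xls", ".xlsx", ".csv"},
}


def is_downloadable(url: str) -> bool:
    """Only HTTP and HTTPS URLs are treated as valid download targets."""
    return urlparse(url).scheme in {"http", "https"}


def classify_by_extension(url: str) -> Optional[str]:
    """Map a URL to a media group using the extension of its path, if known."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    for media_type, extensions in MEDIA_TYPES.items():
        if ext in extensions:
            return media_type
    return None


def filename_for(url: str) -> str:
    """Use the last path segment, or a hash-based fallback when it is empty."""
    name = posixpath.basename(urlparse(url).path)
    if name:
        return name
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    return f"media_{digest}"
```

Parsing the URL first keeps query strings such as `?dl=1` out of the extension check, and hashing the full URL gives a stable fallback name for repeated runs.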
v1.0.5 (Cron Jobs)
v1.0.4
v1.0.3
Improvements
- Added the ability to create a "Site Map", which is a series of actions performed on a site. Currently, an action is either clicking a button or entering text into a field.
Bug Fixes
- Fixed an issue with scrolling on the job queue
- Fixed an issue with scrolling on the main page
v1.0.2
v1.0.1
v1.0.0
Improvements
- Add scrolling when attempting to scrape a long page, which may trigger data loading on scroll
- Remove login requirement to scrape
- Add "Go to Job" after submitting a job
- Fix issues with commas breaking CSV files on download
- Fix broken logs page
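The release notes do not say how the CSV issue was fixed. As a general illustration of the problem, a value containing a comma splits into extra columns unless the field is quoted, which Python's standard csv module does automatically. The helper name rows_to_csv is hypothetical, and this sketch is not necessarily how Scraperr addresses it.

```python
import csv
import io


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialize rows to CSV text, quoting fields that contain commas."""
    buffer = io.StringIO()
    # The default QUOTE_MINIMAL mode quotes only fields that contain the
    # delimiter, the quote character, or a line break.
    writer = csv.writer(buffer)
    writer.writerows(rows)
    return buffer.getvalue()
```

Reading the result back with `csv.reader` yields the original rows, with the comma-containing value intact in a single column.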