Releases: jaypyles/Scraperr

v1.0.8 (Optional Registration)

11 May 16:12
8703f70

Summary

Introduces functionality to conditionally disable user registration based on the REGISTRATION_ENABLED environment variable. When registration is disabled, a default user is automatically created if credentials are provided.

🔐 Registration Check API

  • Added /auth/check endpoint to expose REGISTRATION_ENABLED value to the frontend.

🛠️ Startup Logic

  • On startup, the backend checks for required default user credentials:
    • DEFAULT_USER_EMAIL
    • DEFAULT_USER_PASSWORD
    • DEFAULT_USER_FULL_NAME
  • If registration is disabled and any of these are missing, the app will log an error and exit.
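The startup check described above can be sketched roughly as follows. This is a minimal illustration, not Scraperr's actual code: the function names are hypothetical, and treating any value other than "false" as "enabled" is an assumption about how REGISTRATION_ENABLED is parsed.

```python
import logging
import os
import sys

logger = logging.getLogger("startup")

# Variable names taken from the release notes.
REQUIRED_DEFAULT_USER_VARS = (
    "DEFAULT_USER_EMAIL",
    "DEFAULT_USER_PASSWORD",
    "DEFAULT_USER_FULL_NAME",
)


def registration_enabled(env):
    # Assumption: registration is on unless the variable is explicitly "false".
    return env.get("REGISTRATION_ENABLED", "true").strip().lower() != "false"


def missing_default_user_vars(env):
    # Return the required variables that are unset or empty, in declaration order.
    return [name for name in REQUIRED_DEFAULT_USER_VARS if not env.get(name)]


def check_startup(env=None):
    """Exit with status 1 if registration is disabled and credentials are incomplete."""
    env = os.environ if env is None else env
    if registration_enabled(env):
        return
    missing = missing_default_user_vars(env)
    if missing:
        logger.error(
            "Registration is disabled but these variables are missing: %s",
            ", ".join(missing),
        )
        sys.exit(1)
```

Calling check_startup() before the server begins accepting requests would make a misconfigured deployment fail fast instead of starting with no usable account.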

✅ Frontend Integration

  • Merged AI and auth check responses into a single /api/check endpoint.
  • Updated login form UI to reflect the registration status:
    • Hides the "Sign up" button when registration is disabled.
    • Displays a message indicating that registration is not available.

🧹 Minor Fixes

  • Improved response format for AI availability check:
    GET /api/ai/check now returns { ai_enabled: true/false }.
  • Moved and renamed API route for better organization and frontend access.

v1.0.7

10 May 23:40
b40d378

🐛 Fix "Invalid Date" Issue in AI Job Menu Dropdown

Summary

This PR fixes Issue #59, where the AI job selection dropdown was showing blank entries with the tooltip "Invalid Date" despite jobs completing successfully.

✅ Fix Details

  • Corrected the logic that supplies job data to the AI chat interface.
  • Ensured the correct job reference is passed when rendering the dropdown options.
  • Verified that job metadata, including dates and names, now displays as expected.

v1.0.6 (Media Collection)

10 May 20:15
8cd3059

✨ Add Media Collection to Scraping Pipeline

Summary

This PR introduces the collect_media function, which enhances scraping capabilities by automatically detecting and downloading various types of media assets from a web page using a Selenium-controlled browser session.

🔧 Features

Supported Media Types:

  • Images (<img>)
  • Videos (<video>)
  • Audio files (<audio>)
  • PDFs (<a href="*.pdf">)
  • Documents (.doc, .docx, .txt, .rtf)
  • Presentations (.ppt, .pptx)
  • Spreadsheets (.xls, .xlsx, .csv)

Functionality:

  • Uses CSS selectors to find elements containing media links.
  • Downloads each valid media file (HTTP/HTTPS only).
  • Saves all assets to a structured media/ directory, grouped by media type.
  • Writes a download_summary.txt with the original URLs and their local file paths.

Error Handling:

  • Skips failed downloads and logs the error.
  • Generates fallback filenames when none are detected in the URL.
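The URL filtering, grouping, and fallback-filename steps above might look something like the sketch below. It is an illustration only, not the real collect_media code: the helper names, the directory names in the mapping, and the "file_<index>" fallback pattern are all assumptions. It covers only the link-based types, since images, videos, and audio are found by tag rather than by extension.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical mapping from file extension to a subdirectory of media/.
MEDIA_DIR_BY_EXTENSION = {
    ".pdf": "pdfs",
    ".doc": "documents",
    ".docx": "documents",
    ".txt": "documents",
    ".rtf": "documents",
    ".ppt": "presentations",
    ".pptx": "presentations",
    ".xls": "spreadsheets",
    ".xlsx": "spreadsheets",
    ".csv": "spreadsheets",
}


def is_downloadable(url):
    # Only HTTP and HTTPS URLs are downloaded; data: and blob: URLs are skipped.
    return urlparse(url).scheme in ("http", "https")


def media_dir_for(url, default="other"):
    # Classify a link by the extension of its URL path (case-insensitive).
    extension = posixpath.splitext(urlparse(url).path)[1].lower()
    return MEDIA_DIR_BY_EXTENSION.get(extension, default)


def local_filename(url, index):
    # Use the last path segment, or a generated name when the URL has none.
    name = posixpath.basename(urlparse(url).path)
    return name or f"file_{index}"
```

For example, a link to https://example.com/files/report.pdf would be saved as report.pdf under the hypothetical pdfs group, while a URL ending in "/" would receive a generated name.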

v1.0.5 (Cron Jobs)

25 Apr 03:14
3475d66

Adds cron jobs, which let users schedule scraping jobs to run on a cron interval.

v1.0.4

22 Nov 00:14

Improvements

  • Drop MongoDB and replace it with SQLite

Reasoning

Remove the need for another dependency; SQLite is more than enough to act as a queue.
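To illustrate why a single SQLite table can stand in for a queue, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the status values, and the function names are assumptions made for the example and are not taken from Scraperr's schema. It also assumes a single worker process.

```python
import sqlite3


def init_queue(conn):
    # One table holds every job; the status column tracks progress.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS jobs (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            url TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'Queued'
        )
        """
    )
    conn.commit()


def enqueue(conn, url):
    # Insert a new job and return its id.
    cursor = conn.execute("INSERT INTO jobs (url) VALUES (?)", (url,))
    conn.commit()
    return cursor.lastrowid


def claim_next(conn):
    # Take the oldest queued job and mark it in progress within one transaction.
    with conn:
        row = conn.execute(
            "SELECT id, url FROM jobs WHERE status = 'Queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE jobs SET status = 'Scraping' WHERE id = ?", (row[0],))
        return row
```

Because the queue lives in a local file, no separate database service has to be deployed alongside the application.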

v1.0.3

17 Nov 03:03
7d80ff5

Improvements

  • Added the ability to create a "Site Map", which is a series of actions performed on a site. Currently, an action is either clicking a button or entering text into a field.

Bug Fixes

  • Fixed an issue with scrolling on the job queue
  • Fixed an issue with scrolling on the main page

v1.0.2

13 Nov 04:40

Improvements

  • Restructure to route API calls through the Next.js API first
  • Remove dependency on reverse proxy

v1.0.1

10 Nov 03:29
1cdffd9

Summary

  • Added the ability to pass in a comma-separated list of proxies to use when submitting a scraping job.
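Parsing such a proxy list could look like the sketch below. The function name is hypothetical, and dropping empty entries and surrounding whitespace is an assumption about how the input is cleaned.

```python
def parse_proxies(raw):
    # Split on commas, trim whitespace, and drop empty entries.
    return [entry.strip() for entry in raw.split(",") if entry.strip()]
```

With this approach, an input such as "127.0.0.1:8080, 10.0.0.2:3128" yields two proxy entries, and a stray trailing comma is ignored.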

v1.0.0

07 Nov 01:25

Improvements

  • Add scrolling when scraping a long page, which may trigger data loading on scroll
  • Remove login requirement to scrape
  • Add "Go to Job" after submitting a job
  • Fix issues with commas breaking CSV files on download
  • Fix broken logs page
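On the CSV fix listed above: commas inside a value break a file when rows are built by joining fields with ",". The release notes do not say how the fix was implemented; one common approach, sketched here with Python's standard csv module and a hypothetical helper name, is to let the writer quote such fields.

```python
import csv
import io


def rows_to_csv(rows):
    # csv.writer quotes any field containing a comma, quote, or newline.
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()
```

A value such as "hello, world" is then written as a single quoted field, so it reads back as one column rather than two.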