A full-stack application that scrapes the latest stories from Hacker News every 5 minutes using a cron job, stores them in a MySQL database, and delivers real-time updates to clients via WebSockets.
- Scrapes Hacker News stories using Axios and Cheerio.
- Stores data in a MySQL database.
- Provides a REST API for fetching stories.
- Sends real-time updates to connected clients using WebSockets.
- Sends the number of stories published in the last 5 minutes upon WebSocket connection.
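The last feature above, counting stories published in the last 5 minutes, can be sketched as a small pure helper. This is an illustrative sketch, not the project's actual code; the function and field names are assumptions based on the `stories` table:

```javascript
// Count stories whose timestamp falls within the last 5 minutes.
// `stories` is assumed to be an array of { id, title, url, timestamp }
// objects mirroring the `stories` table.
function countRecentStories(stories, now = Date.now(), windowMs = 5 * 60 * 1000) {
  return stories.filter((s) => now - new Date(s.timestamp).getTime() <= windowMs).length;
}

// On a new WebSocket connection, the server could send a message like:
function welcomeMessage(stories) {
  return JSON.stringify({ type: "recentCount", count: countRecentStories(stories) });
}
```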
- Install Node.js.
- Install MySQL.
- Create a `.env` file in the root directory and configure the following variables:

  ```env
  DB_HOST=your_database_host
  DB_USER=your_database_user
  DB_PASSWORD=your_database_password
  DB_NAME=your_database_name
  PORT=3000
  WS_PORT=3001
  ```
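These variables are typically loaded into `process.env` with a package such as `dotenv`. As an illustration of what that loading amounts to (a sketch, not the project's code), a minimal parser for simple `KEY=value` lines might look like this:

```javascript
// Minimal .env-style parser for simple KEY=value lines (no quoting,
// no escapes). In practice a package like `dotenv` handles this.
function parseEnv(text) {
  const vars = {};
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith("#")) continue; // skip blanks and comments
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue;
    vars[trimmed.slice(0, eq)] = trimmed.slice(eq + 1);
  }
  return vars;
}
```

The result could then be merged into the environment with `Object.assign(process.env, parseEnv(fileContents))`.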
- Clone the repository:

  ```bash
  git clone <repository_url>
  cd scraper
  ```
- Install dependencies:

  ```bash
  npm install
  ```
- Set up the database:

  ```sql
  CREATE DATABASE mydb;
  USE mydb;
  CREATE TABLE stories (
    id INT PRIMARY KEY,
    title VARCHAR(255),
    url VARCHAR(255),
    timestamp DATETIME
  );
  ```
- Start the backend:

  ```bash
  npm run dev
  ```

  The API will run on `http://localhost:3000`, and the WebSocket server will run on `ws://localhost:3001`.
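Because `id` is the table's primary key, inserting a story the scraper has already seen would fail with a duplicate-key error. A common way to make the inserts idempotent (an assumption about the design, not something this project confirms) is MySQL's upsert syntax:

```sql
-- Insert a story, or refresh its title/url if the id already exists.
INSERT INTO stories (id, title, url, timestamp)
VALUES (123, 'Example title', 'https://example.com', NOW())
ON DUPLICATE KEY UPDATE title = VALUES(title), url = VALUES(url);
```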
- Navigate to your frontend directory or create a new React Vite project:

  ```bash
  cd frontend
  ```
- Install dependencies:

  ```bash
  npm install
  ```
- Start the React development server:

  ```bash
  npm run dev
  ```

  Your frontend will run on `http://localhost:5173` by default.
- Start the backend server.
- Start the React frontend server.
- Open the frontend in your browser to see the latest stories and live updates.
- The backend automatically scrapes new stories every 5 minutes using `node-cron`.
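With `node-cron`, an every-5-minutes job is registered with the cron expression `*/5 * * * *`. As a plain-JavaScript sketch of what that minute pattern means (an illustration, not node-cron's internals):

```javascript
// With node-cron the job would be registered roughly as:
//   cron.schedule("*/5 * * * *", scrapeStories);
// The "*/5" minute field matches whenever the minute is divisible by 5.
function matchesEveryFiveMinutes(date) {
  return date.getMinutes() % 5 === 0;
}
```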
- `GET /api/stories`: Fetch all stories from the database.
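The handler presumably just serializes rows from the `stories` table. A hedged sketch of that response shaping as a pure function (in the real app an Express route would send this with `res.json()`; the names here are illustrative):

```javascript
// Shape DB rows from the `stories` table into the JSON body that
// GET /api/stories might return, keeping only the documented columns.
function storiesResponseBody(rows) {
  return JSON.stringify(
    rows.map(({ id, title, url, timestamp }) => ({ id, title, url, timestamp }))
  );
}
```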
- Backend: Node.js, Express.js, MySQL, WebSocket
- Frontend: React (Vite), WebSocket API
- Scraping: Axios, Cheerio
- Scheduling: node-cron
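As a rough illustration of the Axios + Cheerio scraping step: Cheerio would select Hacker News title links with something like `$(".titleline > a")` after Axios fetches the page. The sketch below uses a regex on a static snippet instead of the real libraries so it stays self-contained; HN's actual markup may differ, so treat the structure shown as an assumption:

```javascript
// Hacker News story links appear in markup roughly like:
//   <span class="titleline"><a href="URL">TITLE</a></span>
// Cheerio would parse this robustly; the regex below is only an
// illustration that works on the static snippet shown in the test.
function extractStories(html) {
  const stories = [];
  const re = /<span class="titleline"><a href="([^"]+)">([^<]+)<\/a>/g;
  let m;
  while ((m = re.exec(html)) !== null) {
    stories.push({ url: m[1], title: m[2] });
  }
  return stories;
}
```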