
Minimalist web crawler for building website graphs and extracting link information. It features a Svelte frontend, a Node.js backend, and supports parallel web scraping


MiraZzle/site-seeker


Tech stack: Svelte, TypeScript, SQLite, Docker, Express.js

About

SiteSeeker is a minimalist web crawler designed for constructing detailed website graphs and extracting link information. The app supports parallel web scraping to ensure efficient and scalable data extraction.
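
To make the idea of parallel crawling into a website graph concrete, here is a minimal sketch. It is not taken from the SiteSeeker codebase: `crawl`, `LinkFetcher`, `SiteGraph`, and the `concurrency` and `maxPages` parameters are hypothetical names chosen for illustration, and the link extraction itself is left to a caller-supplied function.

```typescript
// Hypothetical types and names; not part of the SiteSeeker codebase.
// A LinkFetcher returns the outgoing links found on one page.
type LinkFetcher = (url: string) => Promise<string[]>;
// The graph maps each crawled URL to the URLs it links to.
type SiteGraph = Map<string, string[]>;

async function crawl(
  startUrl: string,
  fetchLinks: LinkFetcher,
  concurrency = 4,
  maxPages = 100,
): Promise<SiteGraph> {
  const graph: SiteGraph = new Map();
  const seen = new Set<string>([startUrl]);
  const queue: string[] = [startUrl];

  // Process the frontier in batches of at most `concurrency` pages.
  while (queue.length > 0 && graph.size < maxPages) {
    const batchSize = Math.min(concurrency, maxPages - graph.size);
    const batch = queue.splice(0, batchSize);

    // Fetch every page in the batch in parallel.
    const results = await Promise.all(
      batch.map(async (url) => {
        try {
          return await fetchLinks(url);
        } catch {
          return []; // treat an unreachable page as having no outgoing links
        }
      }),
    );

    // Record edges and enqueue pages that have not been seen yet.
    batch.forEach((url, i) => {
      const links = results[i] ?? [];
      graph.set(url, links);
      for (const link of links) {
        if (!seen.has(link)) {
          seen.add(link);
          queue.push(link);
        }
      }
    });
  }
  return graph;
}
```

Batching keeps the sketch short; a worker-pool design would keep all slots busy instead of waiting for the slowest page in each batch. Passing the fetcher in as a parameter also makes the traversal easy to exercise against an in-memory map of pages.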


Prerequisites

Warning: To run the application, you will need Docker installed.

You can specify custom environment variables in the .env file located at the root of the project. By default, the following ports are used for the backend and frontend:

BACKEND_PORT=3000
FRONTEND_PORT=8080

The .env file in the frontend folder must set VITE_API_URL to localhost with the port specified as BACKEND_PORT in the root .env file:

VITE_API_URL=http://localhost:3000

You can modify these values to suit your needs.
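
Because the frontend's API URL and the backend port live in two separate files, they can drift apart. The following is a small sketch of how the defaults and that consistency rule could be checked. The helper names `resolvePorts` and `apiUrlMatchesBackend` are hypothetical and not part of the project; only the variable names and default values come from the settings above.

```typescript
// Hypothetical helpers for illustration; not part of the SiteSeeker codebase.
interface PortConfig {
  backendPort: number;
  frontendPort: number;
}

// Read BACKEND_PORT and FRONTEND_PORT, falling back to the documented defaults
// when a value is missing or is not a valid TCP port.
function resolvePorts(env: Record<string, string | undefined>): PortConfig {
  const parse = (value: string | undefined, fallback: number): number => {
    const port = Number(value);
    const valid =
      value !== undefined && Number.isInteger(port) && port > 0 && port < 65536;
    return valid ? port : fallback;
  };
  return {
    backendPort: parse(env.BACKEND_PORT, 3000),
    frontendPort: parse(env.FRONTEND_PORT, 8080),
  };
}

// Check that the port in VITE_API_URL is the port the backend listens on.
function apiUrlMatchesBackend(apiUrl: string, backendPort: number): boolean {
  try {
    const url = new URL(apiUrl);
    // An omitted port means the scheme default.
    const schemeDefault = url.protocol === "https:" ? 443 : 80;
    const port = url.port === "" ? schemeDefault : Number(url.port);
    return port === backendPort;
  } catch {
    return false; // not a parseable URL
  }
}
```

In a Node.js script, `resolvePorts(process.env)` would pick up the values once the .env file has been loaded into the environment; with the defaults above, `apiUrlMatchesBackend("http://localhost:3000", 3000)` holds, while a URL pointing at port 8080 does not.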

Usage

To clone and run the project using Docker, follow these steps:

git clone https://github.com/MiraZzle/site-seeker.git
cd site-seeker
docker compose up --build

The --build flag builds the images before starting the containers. On later runs, you can start them without rebuilding:

docker compose up

This will start both the backend and frontend of the application.

Development

If you need to run the project in development mode, follow the steps below:

git clone https://github.com/MiraZzle/site-seeker.git
cd site-seeker

Backend

  1. Navigate to the backend folder:
cd backend
  2. Install dependencies:
npm install
  3. Start the backend server:
node src/server.mjs

Frontend

  1. Navigate to the frontend folder:
cd frontend
  2. Install dependencies:
npm install
  3. Start the development server:
npm run dev

Once the application is running, visit http://localhost followed by the relevant port number to access each service (e.g., http://localhost:3000 for the backend, as set in the .env file, or http://localhost:5173 for the frontend development server, which is Vite's default port).
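
As a quick way to confirm that both services are answering, a small probe can be sketched as follows. This is an assumption-laden illustration: `probeServices` and the `Fetcher` type are hypothetical names, and since no health endpoint is documented here, the sketch only checks that something responds over HTTP at each URL.

```typescript
// Hypothetical helper; not part of the SiteSeeker codebase.
// Any function that resolves with a status on response and rejects on
// connection failure can be used, including the global fetch in Node.js 18+.
type Fetcher = (url: string) => Promise<{ status: number }>;

async function probeServices(
  urls: string[],
  fetcher: Fetcher,
): Promise<Record<string, boolean>> {
  // Probe all URLs in parallel and remember whether each one answered.
  const outcomes = await Promise.all(
    urls.map(async (url) => {
      try {
        await fetcher(url);
        return true; // any HTTP response means the port is being served
      } catch {
        return false; // connection refused, DNS failure, timeout, and so on
      }
    }),
  );

  const reachable: Record<string, boolean> = {};
  urls.forEach((url, i) => {
    reachable[url] = outcomes[i] ?? false;
  });
  return reachable;
}
```

With Node.js 18 or later, calling `probeServices(["http://localhost:3000", "http://localhost:5173"], fetch)` would report which of the two example addresses is reachable; adjust the ports to match your .env values.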

Documentation and Wiki

For detailed information about the project's architecture, features, and usage examples, please visit our GitHub Wiki.
