Skip to content

Shristirajpoot/WebClonePro

Repository files navigation

🌐 WebClonePro — Dynamic Webpage Scraper & Recreator

GitHub Repo stars GitHub last commit Built with Python FastAPI Playwright License


🚀 Overview

WebClonePro is an advanced tool that allows users to input any public website URL and generate a clone-like HTML preview using headless browser scraping.

This project showcases dynamic content extraction using Playwright, fast API development with FastAPI, and a sleek user interface powered by Next.js & TypeScript.


🎥 Demo Video

📺 Watch the walkthrough here:
Watch the demo

🔗 Click the image or [watch on YouTube]((https://youtu.be/jfMwgjjgFoE)


🛠️ Features

  • 🔗 Clone any public website URL
  • 🎭 Powerful dynamic scraping using Playwright
  • ⚡ Fast & asynchronous backend with FastAPI
  • 📄 Metadata, styles, and HTML extraction
  • 🖥️ Frontend preview of cloned content
  • 🧠 Intelligent error handling and feedback
  • 📦 Optional support for downloadable static sites (future)
  • ⚙️ Planned caching and performance boosts

📂 Project Structure

WebClonePro/
├── backend/               # FastAPI backend for scraping logic
│   ├── hello.py           # Main API app
│   ├── requirements.txt   # Python dependencies
│   └── ...                # Utilities, models
│
├── frontend/              # Next.js frontend for UI
│   ├── pages/             # Main input and result pages
│   ├── components/        # Reusable React components
│   └── ...                # Static assets, config
│
├── README.md              # Project documentation
└── LICENSE                # MIT License

🖼️ Screenshots 📸 Real views of WebClonePro in action:

URL Input Page Preview of Cloned Site
Screenshot1 Screenshot2
Metadata Extracted Navigation Sample
------------------------------------------------------ ------------------------------------------------------
Screenshot3 Screenshot4
Example Cloned Responsive View
------------------------------------------------------ ------------------------------------------------------
Screenshot5 Screenshot6

📋 Getting Started

  • 🔧 Prerequisites

  • 🐍 Python 3.9+

  • 🔧 Node.js 16+

  • 🎭 Install Playwright Browsers:

python -m playwright install

📥 Installation & Setup

1️⃣ Clone Repository

git clone https://github.com/Shristirajpoot/WebClonePro.git
cd WebClonePro

2️⃣ Backend Setup

cd backend
python -m venv venv

# Activate (Windows)
.\venv\Scripts\activate

# Activate (macOS/Linux)
source venv/bin/activate

pip install --upgrade pip
pip install -r requirements.txt
python -m playwright install

uvicorn hello:app --reload

Server will run at http://localhost:8000

3️⃣ Frontend Setup

cd ../frontend
npm install
npm run dev

Frontend available at http://localhost:3000

🔍 Usage

  • Open your browser and go to http://localhost:3000

  • Enter any public website URL

  • Submit the form and wait for the scraping

  • View the cloned preview and metadata

🚧 Challenges & Solutions

  • ⚙️ Managed Playwright async subprocesses on Windows

  • 🕸️ Solved dynamic loading via network idle wait strategy

  • 🛡️ Implemented robust error handling and fallback messages

  • 🚀 Used async def for high-performance scraping

🧭 Future Enhancements

  • 📄 Clone multi-page sites with link traversal

  • 📦 Enable download of static HTML/CSS packages

  • 🎨 Add history and clone logs on UI

  • ⚡ Caching for repeated URLs

📞 Contact

Shristi Rajpoot

📄License

MIT License. See LICENSE for details..

🌟 If you found this useful, please ⭐ star the repo and share it!

About

Website Cloning App – Enter any public URL to preview an AI-generated HTML clone.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published