WebClonePro is an advanced tool that allows users to input any public website URL and generate a clone-like HTML preview using headless browser scraping.
This project showcases dynamic content extraction using Playwright, fast API development with FastAPI, and a sleek user interface powered by Next.js & TypeScript.
🔗 Click the image or [watch on YouTube](https://youtu.be/jfMwgjjgFoE)
- 🔗 Clone any public website URL
- 🎭 Powerful dynamic scraping using Playwright
- ⚡ Fast & asynchronous backend with FastAPI
- 📄 Metadata, styles, and HTML extraction
- 🖥️ Frontend preview of cloned content
- 🧠 Intelligent error handling and feedback
- 📦 Optional support for downloadable static sites (future)
- ⚙️ Planned caching and performance boosts
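
As a rough illustration of the metadata and style extraction step, the sketch below pulls the page title, `<meta>` tags, and stylesheet links out of an HTML string using only the Python standard library. The names `MetadataParser` and `extract_metadata` are hypothetical and are not taken from the WebClonePro backend, which may extract these fields differently (for example, directly through Playwright).

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects the <title> text, <meta> name/property pairs, and stylesheet links."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta = {}          # e.g. {"description": "...", "og:title": "..."}
        self.stylesheets = []   # href values of <link rel="stylesheet">
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Standard tags use name=..., Open Graph tags use property=...
            key = attr.get("name") or attr.get("property")
            if key and attr.get("content") is not None:
                self.meta[key] = attr["content"]
        elif tag == "link":
            rel_tokens = (attr.get("rel") or "").lower().split()
            if "stylesheet" in rel_tokens and attr.get("href"):
                self.stylesheets.append(attr["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html: str) -> dict:
    """Return the title, meta tags, and stylesheet hrefs found in an HTML document."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "meta": parser.meta,
        "stylesheets": parser.stylesheets,
    }
```

In a setup like this one, the HTML string would typically be the rendered page content returned by the headless browser, so that tags injected by JavaScript are included.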
WebClonePro/
├── backend/ # FastAPI backend for scraping logic
│ ├── hello.py # Main API app
│ ├── requirements.txt # Python dependencies
│ └── ... # Utilities, models
│
├── frontend/ # Next.js frontend for UI
│ ├── pages/ # Main input and result pages
│ ├── components/ # Reusable React components
│ └── ... # Static assets, config
│
├── README.md # Project documentation
└── LICENSE # MIT License
🖼️ Screenshots
📸 Real views of WebClonePro in action:
🔧 Prerequisites:
- 🐍 Python 3.9+
- 🔧 Node.js 16+
- 🎭 Install Playwright browsers:
python -m playwright install
1️⃣ Clone Repository
git clone https://github.com/Shristirajpoot/WebClonePro.git
cd WebClonePro
2️⃣ Backend Setup
cd backend
python -m venv venv
# Activate (Windows)
.\venv\Scripts\activate
# Activate (macOS/Linux)
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
python -m playwright install
uvicorn hello:app --reload
Server will run at http://localhost:8000
3️⃣ Frontend Setup
cd ../frontend
npm install
npm run dev
Frontend available at http://localhost:3000
- Open your browser and go to http://localhost:3000
- Enter any public website URL
- Submit the form and wait for the scrape to finish
- View the cloned preview and metadata
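
Because the tool accepts "any public website URL", it can help to reject obviously non-public input before launching a browser. The following is a minimal sketch of such a check, assuming the backend wants only `http`/`https` URLs that do not point at localhost or at private IP ranges. The function name `is_public_http_url` is hypothetical and does not come from the project.

```python
import ipaddress
from urllib.parse import urlsplit


def is_public_http_url(url: str) -> bool:
    """Best-effort check that a URL is http(s) and not aimed at a local address."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit raises on some malformed inputs, e.g. a broken IPv6 literal.
        return False
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False

    host = parts.hostname  # already lower-cased by urlsplit
    if host == "localhost" or host.endswith(".localhost"):
        return False

    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so treat it as a DNS name. It is NOT resolved here.
        return True
    # Rejects loopback, private, link-local, and other non-global ranges.
    return address.is_global
```

This check is deliberately shallow: a DNS name that resolves to a private address would still pass, so a production service would likely also need to validate the resolved address.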
- ⚙️ Managed Playwright async subprocesses on Windows
- 🕸️ Solved dynamic loading via network idle wait strategy
- 🛡️ Implemented robust error handling and fallback messages
- 🚀 Used async def for high-performance scraping
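
The README does not say exactly how the Windows subprocess issue was handled, so the sketch below shows one commonly used approach rather than the project's actual code. Playwright starts its browser driver as a subprocess, which on Windows requires the Proactor event loop. The second function shows a network-idle wait using Playwright's `wait_until="networkidle"` option on `page.goto`. The function names and the 30-second timeout are assumptions made for illustration.

```python
import asyncio
import sys


def use_proactor_loop_on_windows() -> bool:
    """Select the Proactor event loop policy on Windows; return True if applied."""
    if sys.platform != "win32":
        return False
    policy = getattr(asyncio, "WindowsProactorEventLoopPolicy", None)
    if policy is None:
        # Not available on this Python build; nothing to do.
        return False
    asyncio.set_event_loop_policy(policy())
    return True


async def fetch_rendered_html(url: str, timeout_ms: int = 30_000) -> str:
    """Load a page in headless Chromium and return its HTML once the network is idle."""
    # Imported lazily so this module still loads where Playwright is not installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page()
            # "networkidle" waits until there have been no network connections
            # for a short period, which gives client-side rendering time to finish.
            await page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return await page.content()
        finally:
            await browser.close()
```

If used, `use_proactor_loop_on_windows()` would need to run before the event loop is created, for example at the top of the module that defines the FastAPI app. Pages that keep connections open indefinitely may never reach network idle, so the timeout matters.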
- 📄 Clone multi-page sites with link traversal
- 📦 Enable download of static HTML/CSS packages
- 🎨 Add history and clone logs on UI
- ⚡ Caching for repeated URLs
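
Caching for repeated URLs is listed as planned work, so no implementation exists in the project yet. As one possible shape for it, here is a minimal in-memory cache with a time-to-live, keyed by URL. The `TTLCache` class and its injectable `clock` parameter are hypothetical; a real deployment might prefer an external store.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache mapping a URL to scraped HTML, with per-entry expiry."""

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> Optional[str]:
        """Return cached HTML for the URL, or None if missing or expired."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        stored_at, html = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[url]  # drop the stale entry
            return None
        return html

    def put(self, url: str, html: str) -> None:
        """Store HTML for the URL, stamped with the current time."""
        self._entries[url] = (self._clock(), html)
```

A request handler could call `get` first and fall back to scraping and `put` only on a miss. This sketch never evicts entries that are not requested again, so a size limit would be worth adding for long-running servers.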
- 🔗 LinkedIn: www.linkedin.com/in/shristi-rajpoot-36774b281
- 📧 Email: shristirajpoot369@gmail.com
- 🔗 GitHub: @Shristirajpoot
MIT License. See LICENSE for details.