Welcome to WebWhisper – an AI-powered web scraping and content parsing solution with a sleek, modern interface. This project uses Streamlit for a responsive UI, Selenium for robust web scraping, and advanced AI models for intelligent content parsing.
- 🚀 Introduction
- ✨ Features
- ⚙️ Installation
- 💻 Usage
- 🛠️ Troubleshooting
- 🌐 Setting Up ChromeDriver
- 📄 License
WebWhisper provides an intuitive way to scrape and process web content using AI. Developed with a focus on user experience, it leverages OpenRouter’s DeepSeek API to transform raw web data into structured insights.
✅ Web Scraping: Efficiently scrape websites using Selenium.
✅ HTML Parsing: Clean and structure content with BeautifulSoup.
✅ AI-Powered Parsing: Extract key information using DeepSeek’s NLP capabilities.
✅ Modern UI: Interactive and responsive design built with Streamlit.
✅ Custom Parsing Instructions: Tailor outputs to your data needs.
💡 Get started in minutes with these steps:
-
Clone the Repository:
git clone https://github.com/imrobintomar/webwhisper cd webwhisper
-
Set Up Virtual Environment:
python -m venv venv # Activate: # Windows: .\venv\Scripts\activate # macOS/Linux: source venv/bin/activate
-
Install Dependencies:
pip install -r requirements.txt
-
Add Environment Variables: Create a
.env
file with your API key:OPENROUTER_API_KEY=your_openrouter_api_key
-
Launch the App:
streamlit run main.py
-
Input & Scrape:
- Enter the target website URL.
- Click Scrape Website to extract content.
-
View & Parse:
- Explore the DOM content in the View DOM Content section.
- Enter instructions and click Parse with DeepSeek to extract insights.
🔑 API Key Errors: Verify .env
setup.
🌐 Network Issues: Check connection & DeepSeek service status.
📏 Content Limits: Ensure content meets DeepSeek’s size constraints.
📄 Logs: Review Streamlit logs for detailed error messages.
- Download: Visit ChromeDriver Downloads and select the version matching your Chrome browser.
- Add to PATH:
- Windows:
setx PATH "%PATH%;C:\path\to\chromedriver"
- macOS/Linux:
export PATH=$PATH:/path/to/chromedriver
- Windows:
This project is licensed under the MIT License. See the LICENSE file for details.
🚀 Start transforming web data into valuable insights with WebWhisper!