AI-Scraper-crawl4AI is a web scraping application built with Streamlit and Crawl4AI. It allows users to input a URL, scrape the page's content, and download it as a Markdown file.
- Custom styling for a visually appealing interface
- URL input for specifying the webpage to scrape
- Button to trigger the web scraping process
- Error handling for request errors and general exceptions (see the sketch after this list)
- Download link for the scraped Markdown content
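The features above map onto a small scraping core. The sketch below is hypothetical rather than the project's actual `main.py`: the function names and the Requests pre-flight check are assumptions; only the Crawl4AI `AsyncWebCrawler` usage follows the library's documented pattern.

```python
# Hypothetical sketch of the scraping core; the names and the Requests
# pre-flight check are assumptions, not the project's actual code.
import asyncio

import requests
from crawl4ai import AsyncWebCrawler


async def scrape_to_markdown(url: str) -> str:
    """Crawl the page and return its content rendered as Markdown."""
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
        return str(result.markdown)


def run_scraper(url: str):
    """Run the crawl with the error handling listed in the features."""
    try:
        # Assumed reachability check so request errors surface early.
        requests.get(url, timeout=10).raise_for_status()
        return asyncio.run(scrape_to_markdown(url))
    except requests.exceptions.RequestException as exc:
        print(f"Request error: {exc}")
    except Exception as exc:  # any other failure from the crawler
        print(f"Unexpected error: {exc}")
    return None
```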
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/AI-Scraper-crawl4AI.git
  cd AI-Scraper-crawl4AI
  ```

- Create a virtual environment and activate it:

  ```bash
  python -m venv myenv
  source myenv/bin/activate  # On Windows, use myenv\Scripts\activate
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the Streamlit app:

  ```bash
  streamlit run main.py
  ```
- Open your web browser and go to http://localhost:8501.
- Enter the URL of the webpage you want to scrape in the input field.
- Click the "Run Web Scraper" button to start the scraping process.
- Once the content is scraped, you can download it as a Markdown file by clicking the download button.
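As a rough idea of how these steps map to Streamlit widgets, here is another hypothetical sketch: the widget labels and output filename are illustrative, and `run_scraper` is the helper from the sketch shown after the feature list.

```python
# Hypothetical Streamlit wiring for the steps above; labels and the
# output filename are illustrative, not taken from the project.
import streamlit as st

st.title("AI Scraper")

url = st.text_input("Enter the URL of the webpage to scrape")

if st.button("Run Web Scraper") and url:
    with st.spinner("Scraping..."):
        markdown = run_scraper(url)  # helper from the earlier sketch
    if markdown:
        st.download_button(
            label="Download Markdown",
            data=markdown,
            file_name="scraped_content.md",
            mime="text/markdown",
        )
```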
- `main.py`: The main script that runs the Streamlit app and handles the web scraping logic.
- `requirements.txt`: The list of dependencies required for the project.
- `myenv/`: The virtual environment directory (not included in the repository).
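In tree form, the layout above looks like this once the virtual environment has been created:

```text
AI-Scraper-crawl4AI/
├── main.py
├── requirements.txt
└── myenv/            # created locally, not committed
```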
- Streamlit
- Requests
- Crawl4AI
- asyncio
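These dependencies would typically be captured in `requirements.txt` along the following lines; this is only a guess at the file's contents, with versions left unpinned, and `asyncio` ships with the Python standard library, so it usually needs no entry of its own.

```text
streamlit
requests
crawl4ai
```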
This project is for personal use only.
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.