
manjushree08/AI-Scraper-crawl4AI-Streamlit


AI-Scraper-crawl4AI

AI-Scraper-crawl4AI is a web scraping application built with Streamlit and Crawl4AI. It allows users to input a URL, scrape the content of the webpage, and download the scraped content as a Markdown file.

Application Screenshot

Features

  • Custom styling for a visually appealing interface
  • URL input for specifying the webpage to scrape
  • Button to trigger the web scraping process
  • Error handling for request errors and general exceptions
  • Download link for the scraped Markdown content
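The download feature can be illustrated without any Streamlit widgets. One common approach, assumed here rather than taken from main.py, is to base64-encode the Markdown into a data URI and render it as an HTML link. `markdown_filename` and `markdown_download_link` are hypothetical helper names, and deriving the filename from the URL's host and path is only an illustrative choice.

```python
import base64
import re
from urllib.parse import urlparse


def markdown_filename(url: str) -> str:
    """Derive a safe .md filename from the URL's host and path (hypothetical scheme)."""
    parsed = urlparse(url)
    stem = f"{parsed.netloc}{parsed.path}".strip("/") or "scraped"
    # Replace every run of characters other than letters, digits, '.', '-', '_'.
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", stem)
    return f"{stem}.md"


def markdown_download_link(markdown: str, filename: str) -> str:
    """Return an HTML anchor that downloads `markdown` through a base64 data URI."""
    b64 = base64.b64encode(markdown.encode("utf-8")).decode("ascii")
    return (
        f'<a href="data:text/markdown;base64,{b64}" '
        f'download="{filename}">Download Markdown</a>'
    )
```

In Streamlit the link could be rendered with `st.markdown(markdown_download_link(md, markdown_filename(url)), unsafe_allow_html=True)`. The built-in `st.download_button("Download Markdown", data=md, file_name=..., mime="text/markdown")` is an alternative that avoids raw HTML.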

Installation

  1. Clone the repository:

    git clone https://github.com/manjushree08/AI-Scraper-crawl4AI-Streamlit.git
    cd AI-Scraper-crawl4AI-Streamlit
  2. Create a virtual environment and activate it:

    python -m venv myenv
    source myenv/bin/activate  # On Windows, use `myenv\Scripts\activate`
  3. Install the required dependencies:

    pip install -r requirements.txt
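After installing, a quick sanity check is to confirm that each listed package can be found from the activated environment. The script below is a hypothetical `check_env.py`, not part of the repository. It relies on `importlib.util.find_spec`, which returns `None` for a top-level module that is not installed, so it checks importability only, not versions.

```python
import importlib.util

# Import names for the packages under Dependencies (asyncio ships with Python).
REQUIRED_MODULES = ("streamlit", "requests", "crawl4ai", "asyncio")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

Run it with `python check_env.py` while `myenv` is active. An empty result means `pip install -r requirements.txt` succeeded for every listed package.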

Usage

  1. Run the Streamlit app:

    streamlit run main.py
  2. Open your web browser and go to http://localhost:8501.

  3. Enter the URL of the webpage you want to scrape in the input field.

  4. Click the "Run Web Scraper" button to start the scraping process.

  5. Once the content is scraped, you can download it as a Markdown file by clicking the download button.
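The steps above map onto a handful of Streamlit widgets. The sketch below is an assumption about how main.py might wire them, not its actual code. `render_app`, the `scrape` callback returning `(markdown, error_message)`, the CSS rule, and the `scraped_content.md` filename are all hypothetical. Streamlit is imported inside the function, or passed in as `st`, so the module can be loaded and exercised without a running app.

```python
from typing import Callable, Optional, Tuple

# Hypothetical contract: scrape(url) -> (markdown, error_message); one of them is None.
ScrapeFn = Callable[[str], Tuple[Optional[str], Optional[str]]]

# Illustrative custom styling; `stButton` is a Streamlit-internal class name
# and may differ between Streamlit versions.
CUSTOM_CSS = """
<style>
.stButton > button { background-color: #4CAF50; color: white; }
</style>
"""


def render_app(scrape: ScrapeFn, st=None) -> None:
    """Draw the URL field, the "Run Web Scraper" button, and the download button."""
    if st is None:
        import streamlit as st  # only needed when launched via `streamlit run`

    st.markdown(CUSTOM_CSS, unsafe_allow_html=True)
    st.title("AI-Scraper-crawl4AI")
    url = st.text_input("Enter the URL of the webpage to scrape")

    if st.button("Run Web Scraper"):
        markdown, error = scrape(url)
        if error:
            st.error(error)
        else:
            st.success("Scraping complete.")
            st.download_button(
                "Download Markdown",
                data=markdown,
                file_name="scraped_content.md",
                mime="text/markdown",
            )
```

Every widget interaction reruns the script. By default, clicking the download button therefore hides it again once the download starts. A fuller version might keep the Markdown in `st.session_state` so it survives reruns.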

Project Structure

  • main.py: The main script that runs the Streamlit app and handles the web scraping logic.
  • requirements.txt: The list of dependencies required for the project.
  • myenv: The virtual environment directory (not included in the repository).

Dependencies

  • Streamlit
  • Requests
  • Crawl4AI
  • asyncio (part of the Python standard library, so it needs no separate installation)
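Crawl4AI's crawler is asynchronous, while a Streamlit script runs top to bottom synchronously. `asyncio.run` is a natural bridge between the two, which is presumably why asyncio is listed. The sketch below shows one way the scraping step and the error handling described under Features could fit together. It assumes a Crawl4AI release where `AsyncWebCrawler.arun(url=...)` returns a result with `success`, `error_message`, and a `markdown` attribute convertible to a string. These attributes have shifted between releases, so check the version installed from requirements.txt. `run_web_scraper` and `crawl4ai_markdown` are hypothetical names. Request errors are caught as `OSError`, which also covers `requests.exceptions.RequestException` because that class derives from `IOError` (an alias of `OSError`).

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple
from urllib.parse import urlparse

# A crawler is any async function mapping a URL to Markdown text.
Crawler = Callable[[str], Awaitable[str]]


async def crawl4ai_markdown(url: str) -> str:
    """Default crawler: fetch `url` with Crawl4AI and return the page as Markdown."""
    from crawl4ai import AsyncWebCrawler  # lazy import; requires crawl4ai installed

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
        if not result.success:
            raise RuntimeError(result.error_message or "crawl failed")
        return str(result.markdown)


def run_web_scraper(
    url: str, crawl: Crawler = crawl4ai_markdown
) -> Tuple[Optional[str], Optional[str]]:
    """Validate `url`, run the async crawler, and return (markdown, error_message)."""
    url = url.strip()
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return None, "Please enter a full http(s) URL."
    try:
        return asyncio.run(crawl(url)), None
    except OSError as exc:
        # Connection and timeout failures, including requests' RequestException.
        return None, f"Request error: {exc}"
    except Exception as exc:  # anything else raised while crawling
        return None, f"An error occurred: {exc}"
```

Calling `markdown, error = run_web_scraper("https://example.com")` would perform a real crawl. Keeping the crawler as a parameter lets the validation and error paths be exercised with a fake async function instead of a live browser.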

License

This project is for personal use only.

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.

Acknowledgements
