WaterCrawl Plugin

Author: watercrawl
Version: 0.5.0
Type: Tool

Description

WaterCrawl is a powerful web crawling tool designed for developers. This plugin allows you to easily crawl websites, extract data, and search the web. It is compatible with WaterCrawl v0.9.* and offers a range of features for efficient data extraction and discovery.

Features

Scrape Tool:

Scrapes a single URL and outputs the content in markdown format with options for including HTML, links, and screenshots.

Crawl Tool:

Initiates a web crawl to extract data from specified URLs with configurable options for including/excluding URL patterns and generating alt text for images.
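
To illustrate how include and exclude URL patterns might narrow a crawl, here is a minimal Python sketch. It assumes glob-style patterns matched against the URL path using the standard-library fnmatch module. The function name url_allowed and the matching rules are hypothetical and are not taken from WaterCrawl, whose actual pattern syntax may differ.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def url_allowed(url, include=(), exclude=()):
    """Return True if the URL path passes the include/exclude filters.

    Hypothetical rules for illustration only: exclusion wins over
    inclusion, and an empty include list means "include everything".
    """
    path = urlparse(url).path or "/"
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    if not include:
        return True
    return any(fnmatchcase(path, pattern) for pattern in include)


urls = [
    "https://example.com/blog/post-1",
    "https://example.com/blog/drafts/post-2",
    "https://example.com/about",
]
# Keep blog pages but skip anything under /blog/drafts/.
kept = [
    u for u in urls
    if url_allowed(u, include=["/blog/*"], exclude=["/blog/drafts/*"])
]
```

Letting exclusion take priority is one reasonable design choice: a broad include pattern can then be combined with a narrow exclude pattern without the two conflicting.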

Crawl Job Tool:

Manage crawl jobs using a crawl request UUID. You can retrieve the job status, get all results, fetch a single result by its UUID, or cancel an ongoing task.
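
A workflow that starts a crawl and later retrieves its results typically needs to wait for the job to finish. The sketch below shows one way to poll a job by its UUID. The fetch_status callable, the status strings "running" and "finished", and the UUID value are all assumptions made for illustration; they do not describe WaterCrawl's actual API or status values.

```python
import time


def wait_for_job(fetch_status, job_uuid, interval=2.0, timeout=60.0, sleep=time.sleep):
    """Poll fetch_status(job_uuid) until the job leaves the 'running' state.

    fetch_status is a hypothetical callable supplied by the caller; the
    status strings used here are assumptions, not WaterCrawl's values.
    """
    waited = 0.0
    while True:
        status = fetch_status(job_uuid)
        if status != "running":
            return status
        if waited >= timeout:
            raise TimeoutError(f"job {job_uuid} still running after {timeout}s")
        sleep(interval)
        waited += interval


# Usage with a fake status source standing in for a real status request.
_fake_statuses = iter(["running", "running", "finished"])
result = wait_for_job(
    lambda _uuid: next(_fake_statuses),
    "hypothetical-uuid",
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
```

Passing the sleep function as a parameter keeps the loop easy to exercise without real delays.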

Search Tool:

Search the web for information using WaterCrawl's search API with configurable options for language, country, time range, and search depth.

Search Job Tool:

Retrieve search results or manage search jobs based on a search request UUID. You can get the status and results of a search, or cancel it.

Sitemap Tool:

Generates a sitemap directly from a URL. You can configure it to include subdomains, ignore existing sitemap.xml files, and more.
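
The effect of an "include subdomains" option can be pictured as a host comparison. The following sketch is only an illustration of that idea; the function in_scope and its parameters are hypothetical and do not reflect how WaterCrawl decides which URLs belong in a sitemap.

```python
from urllib.parse import urlparse


def in_scope(url, base_url, include_subdomains=False):
    """Return True if url belongs to the base host or, optionally, a subdomain.

    Hypothetical helper for illustration; not part of WaterCrawl.
    """
    host = (urlparse(url).hostname or "").lower()
    base = (urlparse(base_url).hostname or "").lower()
    if not host or not base:
        return False
    if host == base:
        return True
    # Require a leading dot so "notexample.com" does not match "example.com".
    return include_subdomains and host.endswith("." + base)


same_host = in_scope("https://example.com/a", "https://example.com")
sub_off = in_scope("https://docs.example.com/a", "https://example.com")
sub_on = in_scope("https://docs.example.com/a", "https://example.com", include_subdomains=True)
```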

Installation

To install the WaterCrawl plugin, download the latest .difypkg file from the GitHub Releases page, or install it from the Dify marketplace.

Authentication

Log in to your WaterCrawl account or to your self-hosted WaterCrawl instance. In the dashboard, go to API Keys and create a new key or use an existing one.

Then, in the Dify plugin management page, go to Plugins, click the + Install button, and enter the API key in the plugin configuration.

Contributing / Development

To contribute to the WaterCrawl plugin, follow these steps:

  1. Clone the repository: git clone https://github.com/watercrawl/watercrawl-dify-plugin.git
  2. Navigate to the project directory: cd watercrawl-dify-plugin
  3. Create a virtual environment: python -m venv env
  4. Activate the virtual environment: source env/bin/activate
  5. Install dependencies: pip install -r requirements.txt
  6. Copy the .env.example file to .env and fill in the necessary values.
  7. Run the plugin: python -m main
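
Before running the plugin, it can help to confirm that every key listed in .env.example has a value in .env. The sketch below is one way to do that check. The key names HYPOTHETICAL_KEY and HYPOTHETICAL_URL are placeholders, since the actual keys are defined by the repository's .env.example file, and the parser handles only simple KEY=VALUE lines.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(example_text, env_text):
    """Return keys from the example file that are absent or empty in .env."""
    example = parse_env(example_text)
    env = parse_env(env_text)
    return sorted(key for key in example if not env.get(key))


# Placeholder contents; real key names come from .env.example.
example = "HYPOTHETICAL_KEY=\nHYPOTHETICAL_URL=\n"
env = "HYPOTHETICAL_KEY=abc123\n"
print(missing_keys(example, env))
```

To check real files, read them with pathlib.Path(".env.example").read_text() and pathlib.Path(".env").read_text() and pass the results to missing_keys.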

Support

For support, please contact us at support@watercrawl.dev.

License

This project is licensed under the MIT License - see the LICENSE file for details.
