From 40bc77daca7fe83415a3c11ae906caca69c5c98c Mon Sep 17 00:00:00 2001
From: seyf97 <111386377+seyf97@users.noreply.github.com>
Date: Sun, 2 Jun 2024 16:49:27 +0300
Subject: [PATCH 1/6] Update requirements.txt

Remove duplicate requirement "langchain-anthropic"
---
 requirements.txt | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/requirements.txt b/requirements.txt
index 97a1c1bb..254f9f1a 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -16,6 +16,5 @@ free-proxy==1.1.1
 langchain-groq==0.1.3
 playwright==1.43.0
 langchain-aws==0.1.2
-langchain-anthropic==0.1.11
 yahoo-search-py==0.3
-undetected-playwright==0.3.0
\ No newline at end of file
+undetected-playwright==0.3.0

From 08499c2cfb1782d257fbff7b0876f094f083852e Mon Sep 17 00:00:00 2001
From: Marco Vinciguerra
Date: Mon, 3 Jun 2024 15:30:15 +0200
Subject: [PATCH 2/6] Update README.md

---
 README.md | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index e440133c..807617b3 100644
--- a/README.md
+++ b/README.md
@@ -164,6 +164,16 @@ print(result)
 
 The output will be an audio file with the summary of the projects on the page.
 
+## Sponsors
+
+ + SerpAPI + + + Stats + +
+
 ## 🤝 Contributing
 
 Feel free to contribute and join our Discord server to discuss with us improvements and give us suggestions!
@@ -182,16 +192,6 @@ Wanna visualize the roadmap in a more interactive way? Check out the [markmap](h
 ## ❤️ Contributors
 [![Contributors](https://contrib.rocks/image?repo=VinciGit00/Scrapegraph-ai)](https://github.com/VinciGit00/Scrapegraph-ai/graphs/contributors)
 
-## Sponsors
-
- - SerpAPI - - - Stats - -
-
 ## 🎓 Citations
 If you have used our library for research purposes please quote us with the following reference:
 ```text

From 8a52e138ece4c13760cf99d0b10f834fbd345bee Mon Sep 17 00:00:00 2001
From: Jiangyuan Li <37933431+jiangyuan-li@users.noreply.github.com>
Date: Mon, 3 Jun 2024 17:19:47 -0700
Subject: [PATCH 3/6] Update README.md

Fix typos in translating "Chinese"
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 807617b3..dbdcc948 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # 🕷️ ScrapeGraphAI: You Only Scrape Once
 
-[English](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/README.md) | [中国人](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/chinese.md)
+[English](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/README.md) | [中文](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/chinese.md)
 
 [![Downloads](https://static.pepy.tech/badge/scrapegraphai)](https://pepy.tech/project/scrapegraphai)
 [![linting: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen)](https://github.com/pylint-dev/pylint)

From c4bf3257283f1795dd47d175d850c55c1327c836 Mon Sep 17 00:00:00 2001
From: SchneeHertz <39257008+SchneeHertz@users.noreply.github.com>
Date: Tue, 4 Jun 2024 14:36:17 +0800
Subject: [PATCH 4/6] Improve the Chinese Readme to synchronize with the English Readme.

---
 README.md       |   2 +-
 docs/chinese.md | 107 ++++++++++++++++++++++++++----------------------
 2 files changed, 60 insertions(+), 49 deletions(-)

diff --git a/README.md b/README.md
index 807617b3..dbdcc948 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # 🕷️ ScrapeGraphAI: You Only Scrape Once
 
-[English](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/README.md) | [中国人](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/chinese.md)
+[English](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/README.md) | [中文](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/chinese.md)
 
 [![Downloads](https://static.pepy.tech/badge/scrapegraphai)](https://pepy.tech/project/scrapegraphai)
 [![linting: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen)](https://github.com/pylint-dev/pylint)
diff --git a/docs/chinese.md b/docs/chinese.md
index f4b64701..5d5b6cd5 100644
--- a/docs/chinese.md
+++ b/docs/chinese.md
@@ -1,5 +1,5 @@
 # 🕷️ ScrapeGraphAI: 只需抓取一次
-[![下载量](https://static.pepy.tech/badge/scrapegraphai)](https://pepy.tech/project/scrapegraphai)
+[![下载](https://static.pepy.tech/badge/scrapegraphai)](https://pepy.tech/project/scrapegraphai)
 [![代码检查: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen)](https://github.com/pylint-dev/pylint)
 [![Pylint](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/pylint.yml/badge.svg)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/pylint.yml)
 [![CodeQL](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/codeql.yml/badge.svg)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/codeql.yml)
@@ -21,34 +21,36 @@ Scrapegraph-ai 的参考页面可以在 PyPI 的官方网站上找到: [pypi](ht
 ```bash
 pip install scrapegraphai
 ```
-注意: 建议在虚拟环境中安装该库,以避免与其他库发生冲突 🐱
+**注意**: 建议在虚拟环境中安装该库,以避免与其他库发生冲突 🐱
-🔍 演示
+## 🔍 演示
 官方 Streamlit 演示:
-
+[![My Skills](https://skillicons.dev/icons?i=react)](https://scrapegraph-ai-web-dashboard.streamlit.app)
 在 Google Colab 上直接尝试:
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1sEZBonBMGP44CtO6GQTwAlL0BGJXjtfd?usp=sharing)
+
 ## 📖 文档
-ScrapeGraphAI 的文档可以在这里找到。
+ScrapeGraphAI 的文档可以在[这里](https://scrapegraph-ai.readthedocs.io/en/latest/)找到。
-还可以查看 Docusaurus 这里。
+还可以查看 Docusaurus 的[版本](https://scrapegraph-doc.onrender.com/)。
 ## 💻 用法
 有三种主要的爬取管道可用于从网站(或本地文件)提取信息:
-SmartScraperGraph: 单页爬虫,只需用户提示和输入源;
-SearchGraph: 多页爬虫,从搜索引擎的前 n 个搜索结果中提取信息;
-SpeechGraph: 单页爬虫,从网站提取信息并生成音频文件。
-SmartScraperMultiGraph: 多页爬虫,给定一个提示
-可以通过 API 使用不同的 LLM,如 OpenAI,Groq,Azure 和 Gemini,或者使用 Ollama 的本地模型。
+- `SmartScraperGraph`: 单页爬虫,只需用户提示和输入源;
+- `SearchGraph`: 多页爬虫,从搜索引擎的前 n 个搜索结果中提取信息;
+- `SpeechGraph`: 单页爬虫,从网站提取信息并生成音频文件。
+- `SmartScraperMultiGraph`: 多页爬虫,给定一个提示
+可以通过 API 使用不同的 LLM,如 **OpenAI**,**Groq**,**Azure** 和 **Gemini**,或者使用 **Ollama** 的本地模型。
-案例 1: 使用本地模型的 SmartScraper
-请确保已安装 Ollama 并使用 ollama pull 命令下载模型。
+### 案例 1: 使用本地模型的 SmartScraper
+请确保已安装 [Ollama](https://ollama.com/) 并使用 `ollama pull` 命令下载模型。
 ``` python
 from scrapegraphai.graphs import SmartScraperGraph
@@ -68,7 +70,7 @@ graph_config = {
 }
 
 smart_scraper_graph = SmartScraperGraph(
-    prompt="列出所有项目及其描述",
+    prompt="List me all the projects with their descriptions",
     # 也接受已下载的 HTML 代码的字符串
     source="https://perinim.github.io/projects",
     config=graph_config
@@ -76,15 +78,16 @@ smart_scraper_graph = SmartScraperGraph(
 result = smart_scraper_graph.run()
 print(result)
-```
+```
 输出将是一个包含项目及其描述的列表,如下所示:
-python
-Copia codice
-{'projects': [{'title': 'Rotary Pendulum RL', 'description': '开源项目,旨在使用 RL 算法控制现实中的旋转摆'}, {'title': 'DQN Implementation from scratch', 'description': '开发了一个深度 Q 网络算法来训练简单和双摆'}, ...]}
-案例 2: 使用混合模型的 SearchGraph
-我们使用 Groq 作为 LLM,使用 Ollama 作为嵌入模型。
+```python
+{'projects': [{'title': 'Rotary Pendulum RL', 'description': 'Open Source project aimed at controlling a real life rotary pendulum using RL algorithms'}, {'title': 'DQN Implementation from scratch', 'description': 'Developed a Deep Q-Network algorithm to train a simple and double pendulum'}, ...]}
+```
+
+### 案例 2: 使用混合模型的 SearchGraph
+我们使用 **Groq** 作为 LLM,使用 **Ollama** 作为嵌入模型。
 ```python
 from scrapegraphai.graphs import SearchGraph
@@ -105,7 +108,7 @@ graph_config = {
 
 # 创建 SearchGraph 实例
 search_graph = SearchGraph(
-    prompt="列出所有来自基奥贾的传统食谱",
+    prompt="List me all the traditional recipes from Chioggia",
     config=graph_config
 )
 
@@ -118,9 +121,12 @@ print(result)
 
 ```python
 {'recipes': [{'name': 'Sarde in Saòre'}, {'name': 'Bigoli in salsa'}, {'name': 'Seppie in umido'}, {'name': 'Moleche frite'}, {'name': 'Risotto alla pescatora'}, {'name': 'Broeto'}, {'name': 'Bibarasse in Cassopipa'}, {'name': 'Risi e bisi'}, {'name': 'Smegiassa Ciosota'}]}
-案例 3: 使用 OpenAI 的 SpeechGraph
-您只需传递 OpenAI API 密钥和模型名称。
 ```
+
+### 案例 3: 使用 OpenAI 的 SpeechGraph
+
+您只需传递 OpenAI API 密钥和模型名称。
+
 ```python
 from scrapegraphai.graphs import SpeechGraph
 
@@ -142,7 +148,7 @@ graph_config = {
 # ************************************************
 
 speech_graph = SpeechGraph(
-    prompt="详细总结这些项目并生成音频。",
+    prompt="Make a detailed audio summary of the projects.",
     source="https://perinim.github.io/projects/",
     config=graph_config,
 )
@@ -152,36 +158,38 @@ print(result)
 ```
 输出将是一个包含页面上项目摘要的音频文件。
-## 🤝 贡献
+## 赞助商
-欢迎贡献并加入我们的 Discord 服务器与我们讨论改进和提出建议!
+
+ + SerpAPI + + + Stats + +
+
-请参阅贡献指南。
+## 🤝 贡献
+欢迎贡献并加入我们的 Discord 服务器与我们讨论改进和提出建议!
+请参阅[贡献指南](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/CONTRIBUTING.md)。
+[![My Skills](https://skillicons.dev/icons?i=discord)](https://discord.gg/uJN7TYcpNa)
+[![My Skills](https://skillicons.dev/icons?i=linkedin)](https://www.linkedin.com/company/scrapegraphai/)
+[![My Skills](https://skillicons.dev/icons?i=twitter)](https://twitter.com/scrapegraphai)
-📈 路线图
+## 📈 路线图
-查看项目路线图这里! 🚀
+在[这里](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/README.md)!查看项目路线图! 🚀
-想要以更互动的方式可视化路线图?请查看 markmap 通过将 markdown 内容复制粘贴到编辑器中进行可视化!
+想要以更互动的方式可视化路线图?请查看 [markmap](https://markmap.js.org/repl) 通过将 markdown 内容复制粘贴到编辑器中进行可视化!
 ## ❤️ 贡献者
+[![Contributors](https://contrib.rocks/image?repo=VinciGit00/Scrapegraph-ai)](https://github.com/VinciGit00/Scrapegraph-ai/graphs/contributors)
-赞助商
-
-
- - SerpAPI - - - Stats - -
-
 ## 🎓 引用
 如果您将我们的库用于研究目的,请引用以下参考文献:
@@ -199,16 +207,19 @@ print(result)

Authors_logos

+
 ## 联系方式
+| | Contact Info |
+|--------------------|----------------------|
+| Marco Vinciguerra | [![Linkedin Badge](https://img.shields.io/badge/-Linkedin-blue?style=flat&logo=Linkedin&logoColor=white)](https://www.linkedin.com/in/marco-vinciguerra-7ba365242/) |
+| Marco Perini | [![Linkedin Badge](https://img.shields.io/badge/-Linkedin-blue?style=flat&logo=Linkedin&logoColor=white)](https://www.linkedin.com/in/perinim/) |
+| Lorenzo Padoan | [![Linkedin Badge](https://img.shields.io/badge/-Linkedin-blue?style=flat&logo=Linkedin&logoColor=white)](https://www.linkedin.com/in/lorenzo-padoan-4521a2154/) |
-Marco Vinciguerra
-Marco Perini
-Lorenzo Padoan
 ## 📜 许可证
-ScrapeGraphAI 采用 MIT 许可证。更多信息请查看 LICENSE 文件。
+ScrapeGraphAI 采用 MIT 许可证。更多信息请查看 [LICENSE](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/LICENSE) 文件。
-鸣谢
+## 鸣谢
-我们要感谢所有项目贡献者和开源社区的支持。
-ScrapeGraphAI 仅用于数据探索和研究目的。我们不对任何滥用该库的行为负责。
\ No newline at end of file
+- 我们要感谢所有项目贡献者和开源社区的支持。
+- ScrapeGraphAI 仅用于数据探索和研究目的。我们不对任何滥用该库的行为负责。
\ No newline at end of file

From 89f40f12bc839fe9acaf12dcac81d5c4ff2d5981 Mon Sep 17 00:00:00 2001
From: SchneeHertz <39257008+SchneeHertz@users.noreply.github.com>
Date: Tue, 4 Jun 2024 14:38:33 +0800
Subject: [PATCH 5/6] Update chinese.md

---
 docs/chinese.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/chinese.md b/docs/chinese.md
index 5d5b6cd5..96805855 100644
--- a/docs/chinese.md
+++ b/docs/chinese.md
@@ -1,5 +1,5 @@
 # 🕷️ ScrapeGraphAI: 只需抓取一次
-[![下载](https://static.pepy.tech/badge/scrapegraphai)](https://pepy.tech/project/scrapegraphai)
+[![下载量](https://static.pepy.tech/badge/scrapegraphai)](https://pepy.tech/project/scrapegraphai)
 [![代码检查: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen)](https://github.com/pylint-dev/pylint)
 [![Pylint](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/pylint.yml/badge.svg)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/pylint.yml)
 [![CodeQL](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/codeql.yml/badge.svg)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/codeql.yml)

From 12ecc99a6c75b39ecd0f7e147b72a45e880f554d Mon Sep 17 00:00:00 2001
From: SchneeHertz <39257008+SchneeHertz@users.noreply.github.com>
Date: Tue, 4 Jun 2024 14:46:22 +0800
Subject: [PATCH 6/6] Update chinese.md

---
 docs/chinese.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/chinese.md b/docs/chinese.md
index 96805855..e998c8bf 100644
--- a/docs/chinese.md
+++ b/docs/chinese.md
@@ -182,7 +182,7 @@ print(result)
 
 ## 📈 路线图
 
-在[这里](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/README.md)!查看项目路线图! 🚀
+在[这里](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/README.md)查看项目路线图! 🚀
 
 想要以更互动的方式可视化路线图?请查看 [markmap](https://markmap.js.org/repl) 通过将 markdown 内容复制粘贴到编辑器中进行可视化!
 
@@ -222,4 +222,4 @@ ScrapeGraphAI 采用 MIT 许可证。更多信息请查看 [LICENSE](https://git
 ## 鸣谢
 
 - 我们要感谢所有项目贡献者和开源社区的支持。
-- ScrapeGraphAI 仅用于数据探索和研究目的。我们不对任何滥用该库的行为负责。
\ No newline at end of file
+- ScrapeGraphAI 仅用于数据探索和研究目的。我们不对任何滥用该库的行为负责。