Pre/Beta update #353

Merged
merged 134 commits into from
Jun 7, 2024
134 commits
be16fec
WIP
skrawcz May 10, 2024
d94195f
WIP
skrawcz May 10, 2024
82afa0e
Working smart scraper graph
skrawcz May 10, 2024
0bcb0fb
Merge pull request #210 from skrawcz/burr
VinciGit00 May 10, 2024
f2bb1cc
Fixes LC document deserialization
skrawcz May 11, 2024
20604bd
Merge pull request #218 from skrawcz/burr
VinciGit00 May 11, 2024
e53766b
feat: add logger integration
VinciGit00 May 14, 2024
0589083
refactoring of loggers
VinciGit00 May 15, 2024
a4700bf
add robot node
VinciGit00 May 15, 2024
0b5cdd4
Merge pull request #246 from VinciGit00/main
VinciGit00 May 15, 2024
29d284e
Merge branch 'main' into logger-integration
VinciGit00 May 15, 2024
40260d8
remove asdt
VinciGit00 May 15, 2024
4fe58d9
fix logger
VinciGit00 May 15, 2024
befa48c
update lock
VinciGit00 May 15, 2024
6cbd84f
feat(burr-bridge): BurrBridge class to integrate inside BaseGraph
PeriniM May 21, 2024
d96840f
Updates Burr bridge to use class-based API
elijahbenizzy May 21, 2024
cfaf7ee
Merge pull request #284 from DAGWorks-Inc/burr_integration
PeriniM May 21, 2024
654a042
feat(burr-node): working burr bridge
PeriniM May 21, 2024
ac10128
feat(burr): added burr integration in graphs and optional burr instal…
PeriniM May 22, 2024
b377467
add info
VinciGit00 May 23, 2024
d00cde6
fix(pdf_scraper): fix the pdf scraper gaph
VinciGit00 May 23, 2024
5fd7633
Update pdf_scraper_graph.py
VinciGit00 May 23, 2024
d139480
fix(logging): source code citation
DiTo97 May 23, 2024
0790ecd
fix(web-loader): use sublogger
DiTo97 May 23, 2024
c807695
feat(verbose): centralized graph logging on debug or warning dependin…
DiTo97 May 23, 2024
4348d4f
fix(logger): set up centralized root logger in base node
DiTo97 May 23, 2024
c251cc4
fix(node-logging): use centralized logger in each node for logging
DiTo97 May 23, 2024
3d0f671
Merge pull request #294 from DiTo97/logger-integration
VinciGit00 May 24, 2024
b913b51
Merge branch 'logger-integration' into pre/beta
VinciGit00 May 24, 2024
e1006f3
ci(release): 1.5.0-beta.1 [skip ci]
semantic-release-bot May 24, 2024
b6f1766
add OneAPI integration
VinciGit00 May 24, 2024
819f071
docs(burr): added dependecies and switched to furo
PeriniM May 24, 2024
8d5eb0b
fix(local_file): fixed textual input pdf, csv, json and xml graph
PeriniM May 24, 2024
a4ee757
Merge branch 'pre/beta' into pdf_scraper_refactoring
PeriniM May 24, 2024
8b032a9
Merge pull request #293 from VinciGit00/pdf_scraper_refactoring
PeriniM May 24, 2024
edf221d
ci(release): 1.5.0-beta.2 [skip ci]
semantic-release-bot May 24, 2024
5684578
fix(kg): removed unused nodes and utils
PeriniM May 24, 2024
90d5691
ci(release): 1.5.0-beta.3 [skip ci]
semantic-release-bot May 24, 2024
d27cad5
docs(graph): added new graphs and schema
PeriniM May 24, 2024
e65faca
Merge branch 'pre/beta' of https://github.com/VinciGit00/Scrapegraph-…
PeriniM May 24, 2024
19b27bb
feat(burr): first burr integration and docs
PeriniM May 24, 2024
f9f6b08
Merge branch 'pre/beta' into burr_integration
PeriniM May 25, 2024
7848060
Merge pull request #299 from VinciGit00/burr_integration
PeriniM May 25, 2024
15b7682
ci(release): 1.5.0-beta.4 [skip ci]
semantic-release-bot May 25, 2024
545374c
docs(faq): added faq section and refined installation
PeriniM May 25, 2024
e43b801
docs: updated requirements
PeriniM May 25, 2024
5fb9115
feat(version): python 3.12 is now supported 🚀
PeriniM May 26, 2024
1f51147
ci(release): 1.5.0-beta.5 [skip ci]
semantic-release-bot May 26, 2024
2526831
Merge pull request #302 from VinciGit00/pre/beta
PeriniM May 26, 2024
8296236
ci(release): 1.5.0 [skip ci]
semantic-release-bot May 26, 2024
8d76c4b
fix(schema): added schema
PeriniM May 26, 2024
a22be47
add example
VinciGit00 May 26, 2024
40a99fa
Update pdf_scraper_ollama.py
VinciGit00 May 26, 2024
ecd98b2
add sche,a example
VinciGit00 May 26, 2024
1d958be
Merge pull request #303 from VinciGit00/295-scrapegraph-ai接入oneapi模型q…
VinciGit00 May 26, 2024
fb74a52
update one_api example with schema
VinciGit00 May 26, 2024
a796169
fix(pdf-example): added pdf example and coauthor
arsaboo May 26, 2024
3c7dedf
Merge pull request #305 from VinciGit00/pdf_fix
VinciGit00 May 26, 2024
7f24dd4
ci(release): 1.5.1 [skip ci]
semantic-release-bot May 26, 2024
8f2c8d5
Fix: Update __init__.py
VinciGit00 May 26, 2024
54e8216
fix: fixed typo
PeriniM May 26, 2024
7f4a6a6
ci(release): 1.5.2 [skip ci]
semantic-release-bot May 26, 2024
f4a253b
removed unused file
VinciGit00 May 27, 2024
004d03a
add examples
VinciGit00 May 27, 2024
ac3fa45
Update README.md
Yuan-ManX May 28, 2024
eb841c8
Merge pull request #310 from Yuan-ManX/README
VinciGit00 May 28, 2024
58dfe9b
add examples of usage
VinciGit00 May 28, 2024
9f73d7a
Merge branch 'main' of https://github.com/VinciGit00/Scrapegraph-ai
VinciGit00 May 28, 2024
3b90ebd
add new examples
VinciGit00 May 29, 2024
287e17a
Update README.md
VinciGit00 May 29, 2024
b553602
Merge pull request #314 from VinciGit00/main
VinciGit00 May 29, 2024
4fcb990
fix: oneapi model
VinciGit00 May 29, 2024
6ea1d2c
ci(release): 1.5.3-beta.1 [skip ci]
semantic-release-bot May 29, 2024
1aa8c86
removed unused file
VinciGit00 May 29, 2024
4639f0c
fix: typo in prompt
May 30, 2024
e734830
Merge pull request #319 from jmfk/pre/beta
PeriniM May 30, 2024
b57bcef
ci(release): 1.5.3-beta.2 [skip ci]
semantic-release-bot May 30, 2024
cdba5ef
Create chinese.md
VinciGit00 May 30, 2024
6d1d91a
Merge branch 'pre/beta' of https://github.com/VinciGit00/Scrapegraph-…
VinciGit00 May 30, 2024
1adcab4
add chinese file
VinciGit00 May 30, 2024
c4ce361
fix: typo in generate_screper_node
VinciGit00 May 30, 2024
5619bca
ci(release): 1.5.3 [skip ci]
semantic-release-bot May 30, 2024
930f673
feat: removed rag node
VinciGit00 May 31, 2024
8be27ba
fix(3.9): python 3.9 logging fix
PeriniM May 31, 2024
29b79cb
ci(release): 1.5.4 [skip ci]
semantic-release-bot May 31, 2024
25352a5
Merge branch 'pre/beta' into temp
VinciGit00 May 31, 2024
25de33e
Merge pull request #320 from VinciGit00/temp
VinciGit00 May 31, 2024
38d138e
ci(release): 1.5.5-beta.1 [skip ci]
semantic-release-bot May 31, 2024
f5cbd80
feat: add pdf scraper multi graph
VinciGit00 Jun 1, 2024
4d42d7b
add example
VinciGit00 Jun 1, 2024
5bda918
feat: add json multiscraper
VinciGit00 Jun 1, 2024
fff1232
add rag node
VinciGit00 Jun 1, 2024
1fe4975
add openai and oneapi examples
VinciGit00 Jun 1, 2024
5cfc101
feat: add forcing format as json
VinciGit00 Jun 2, 2024
1d217e4
ci(release): 1.6.0-beta.1 [skip ci]
semantic-release-bot Jun 2, 2024
fa9722d
add examples
VinciGit00 Jun 2, 2024
40bc77d
Update requirements.txt
seyf97 Jun 2, 2024
9992f8c
Merge pull request #325 from seyf97/patch-2
VinciGit00 Jun 2, 2024
b408655
feat: add csv scraper and xml scraper multi
VinciGit00 Jun 2, 2024
743dfe1
add all possible examples
VinciGit00 Jun 3, 2024
79ace11
Merge pull request #323 from VinciGit00/refactoring-pdf_scraper
PeriniM Jun 3, 2024
ed1dc0b
ci(release): 1.6.0-beta.2 [skip ci]
semantic-release-bot Jun 3, 2024
08499c2
Update README.md
VinciGit00 Jun 3, 2024
1dde43c
add new examples
VinciGit00 Jun 3, 2024
8de720d
feat: removed a bug
VinciGit00 Jun 3, 2024
b70cb37
ci(release): 1.6.0-beta.3 [skip ci]
semantic-release-bot Jun 3, 2024
c8d556d
feat: fix an if
VinciGit00 Jun 3, 2024
f36dd8b
Merge branch 'pre/beta' of https://github.com/VinciGit00/Scrapegraph-…
VinciGit00 Jun 3, 2024
08a14ef
ci(release): 1.6.0-beta.4 [skip ci]
semantic-release-bot Jun 3, 2024
8a52e13
Update README.md
jiangyuan-li Jun 4, 2024
c4bf325
Improve the Chinese Readme to synchronize with the English Readme.
SchneeHertz Jun 4, 2024
89f40f1
Update chinese.md
SchneeHertz Jun 4, 2024
12ecc99
Update chinese.md
SchneeHertz Jun 4, 2024
3141ac8
Merge pull request #336 from SchneeHertz/main
VinciGit00 Jun 4, 2024
28d874e
Merge pull request #335 from jiangyuan-li/main
VinciGit00 Jun 4, 2024
55b4865
Merge pull request #338 from VinciGit00/main
VinciGit00 Jun 4, 2024
244aada
feat: refactoring of an in if
VinciGit00 Jun 4, 2024
dde0c7e
ci(release): 1.6.0-beta.5 [skip ci]
semantic-release-bot Jun 4, 2024
acece72
Update cleanup_html.py
seyf97 Jun 4, 2024
4c0d0e9
Merge pull request #339 from seyf97/seyf97-link_extraction_patch
VinciGit00 Jun 4, 2024
f81442b
removed unused if
VinciGit00 Jun 4, 2024
58cd523
Merge branch 'pre/beta' of https://github.com/VinciGit00/Scrapegraph-…
VinciGit00 Jun 4, 2024
fff89f4
feat: refactoring of abstract graph
VinciGit00 Jun 4, 2024
ac8e7c1
ci(release): 1.6.0-beta.6 [skip ci]
semantic-release-bot Jun 4, 2024
376f758
feat(pydantic): added pydantic output schema
PeriniM Jun 4, 2024
f8b08e0
feat(append_node): append node to existing graph
PeriniM Jun 4, 2024
74fd530
Merge branch 'pre/beta' into 332-pydantic-schema-validation
VinciGit00 Jun 5, 2024
a7443a7
Merge pull request #341 from VinciGit00/332-pydantic-schema-validation
VinciGit00 Jun 5, 2024
cab5f68
ci(release): 1.6.0-beta.7 [skip ci]
semantic-release-bot Jun 5, 2024
5d20186
feat: add json as output
VinciGit00 Jun 5, 2024
7a6f016
ci(release): 1.6.0-beta.8 [skip ci]
semantic-release-bot Jun 5, 2024
450fde6
add get functions on the dictionary
VinciGit00 Jun 5, 2024
4f53b09
add examples for schema
VinciGit00 Jun 5, 2024
dd2b3a8
add examples
VinciGit00 Jun 5, 2024
2 changes: 2 additions & 0 deletions .gitignore
@@ -21,13 +21,15 @@ docs/source/_templates/
docs/source/_static/
.env
venv/
.venv/
.vscode/

# exclude pdf, mp3
*.pdf
*.mp3
*.sqlite
*.google-cookie
*.python-version
examples/graph_examples/ScrapeGraphAI_generated_graph
examples/**/result.csv
examples/**/result.json
240 changes: 240 additions & 0 deletions CHANGELOG.md

Large diffs are not rendered by default.

27 changes: 15 additions & 12 deletions README.md
@@ -1,5 +1,7 @@

# 🕷️ ScrapeGraphAI: You Only Scrape Once
[English](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/README.md) | [中文](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/chinese.md)

[![Downloads](https://static.pepy.tech/badge/scrapegraphai)](https://pepy.tech/project/scrapegraphai)
[![linting: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen)](https://github.com/pylint-dev/pylint)
[![Pylint](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/pylint.yml/badge.svg)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/pylint.yml)
@@ -17,7 +19,7 @@ Just say which information you want to extract and the library will do it for yo

## 🚀 Quick install

-The reference page for Scrapegraph-ai is available on the official page of pypy: [pypi](https://pypi.org/project/scrapegraphai/).
+The reference page for Scrapegraph-ai is available on the official page of PyPI: [pypi](https://pypi.org/project/scrapegraphai/).

```bash
pip install scrapegraphai
@@ -28,7 +30,7 @@ pip install scrapegraphai
## 🔍 Demo
Official streamlit demo:

-[![My Skills](https://skillicons.dev/icons?i=react)](https://scrapegraph-ai-demo.streamlit.app/)
+[![My Skills](https://skillicons.dev/icons?i=react)](https://scrapegraph-ai-web-dashboard.streamlit.app)

Try it directly on the web using Google Colab:

@@ -162,13 +164,23 @@ print(result)

The output will be an audio file with the summary of the projects on the page.

+## Sponsors
+<div style="text-align: center;">
+<a href="https://serpapi.com?utm_source=scrapegraphai">
+<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/serp_api_logo.png" alt="SerpAPI" style="width: 10%;">
+</a>
+<a href="https://dashboard.statproxies.com/?refferal=scrapegraph">
+<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/transparent_stat.png" alt="Stats" style="width: 15%;">
+</a>
+</div>
+
## 🤝 Contributing

Feel free to contribute and join our Discord server to discuss with us improvements and give us suggestions!

Please see the [contributing guidelines](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/CONTRIBUTING.md).

-[![My Skills](https://skillicons.dev/icons?i=discord)](https://discord.gg/gkxQDAjfeX)
+[![My Skills](https://skillicons.dev/icons?i=discord)](https://discord.gg/uJN7TYcpNa)
[![My Skills](https://skillicons.dev/icons?i=linkedin)](https://www.linkedin.com/company/scrapegraphai/)
[![My Skills](https://skillicons.dev/icons?i=twitter)](https://twitter.com/scrapegraphai)

@@ -179,15 +191,6 @@ Wanna visualize the roadmap in a more interactive way? Check out the [markmap](h

## ❤️ Contributors
[![Contributors](https://contrib.rocks/image?repo=VinciGit00/Scrapegraph-ai)](https://github.com/VinciGit00/Scrapegraph-ai/graphs/contributors)
-## Sponsors
-<div style="text-align: center;">
-<a href="https://serpapi.com?utm_source=scrapegraphai">
-<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/serp_api_logo.png" alt="SerpAPI" style="width: 10%;">
-</a>
-<a href="https://dashboard.statproxies.com/?refferal=scrapegraph">
-<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/transparent_stat.png" alt="Stats" style="width: 10%;">
-</a>
-</div>

## 🎓 Citations
If you have used our library for research purposes please quote us with the following reference:
225 changes: 225 additions & 0 deletions docs/chinese.md
@@ -0,0 +1,225 @@
# 🕷️ ScrapeGraphAI: 只需抓取一次
[![下载量](https://static.pepy.tech/badge/scrapegraphai)](https://pepy.tech/project/scrapegraphai)
[![代码检查: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen)](https://github.com/pylint-dev/pylint)
[![Pylint](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/pylint.yml/badge.svg)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/pylint.yml)
[![CodeQL](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/codeql.yml/badge.svg)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/codeql.yml)
[![许可证: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![](https://dcbadge.vercel.app/api/server/gkxQDAjfeX)](https://discord.gg/gkxQDAjfeX)

ScrapeGraphAI 是一个*网络爬虫* Python 库,使用大型语言模型和直接图逻辑为网站和本地文档(XML,HTML,JSON 等)创建爬取管道。

只需告诉库您想提取哪些信息,它将为您完成!

<p align="center">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/scrapegraphai_logo.png" alt="Scrapegraph-ai Logo" style="width: 50%;">
</p>

## 🚀 快速安装

Scrapegraph-ai 的参考页面可以在 PyPI 的官方网站上找到: [pypi](https://pypi.org/project/scrapegraphai/)。

```bash
pip install scrapegraphai
```
**注意**: 建议在虚拟环境中安装该库,以避免与其他库发生冲突 🐱

## 🔍 演示

官方 Streamlit 演示:

[![My Skills](https://skillicons.dev/icons?i=react)](https://scrapegraph-ai-web-dashboard.streamlit.app)

在 Google Colab 上直接尝试:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1sEZBonBMGP44CtO6GQTwAlL0BGJXjtfd?usp=sharing)

## 📖 文档

ScrapeGraphAI 的文档可以在[这里](https://scrapegraph-ai.readthedocs.io/en/latest/)找到。

还可以查看 Docusaurus 的[版本](https://scrapegraph-doc.onrender.com/)。

## 💻 用法

有三种主要的爬取管道可用于从网站(或本地文件)提取信息:

- `SmartScraperGraph`: 单页爬虫,只需用户提示和输入源;
- `SearchGraph`: 多页爬虫,从搜索引擎的前 n 个搜索结果中提取信息;
- `SpeechGraph`: 单页爬虫,从网站提取信息并生成音频文件。
- `SmartScraperMultiGraph`: 多页爬虫,给定一个提示
可以通过 API 使用不同的 LLM,如 **OpenAI**,**Groq**,**Azure** 和 **Gemini**,或者使用 **Ollama** 的本地模型。

### 案例 1: 使用本地模型的 SmartScraper
请确保已安装 [Ollama](https://ollama.com/) 并使用 `ollama pull` 命令下载模型。

``` python
from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
"llm": {
"model": "ollama/mistral",
"temperature": 0,
"format": "json", # Ollama 需要显式指定格式
"base_url": "http://localhost:11434", # 设置 Ollama URL
},
"embeddings": {
"model": "ollama/nomic-embed-text",
"base_url": "http://localhost:11434", # 设置 Ollama URL
},
"verbose": True,
}

smart_scraper_graph = SmartScraperGraph(
prompt="List me all the projects with their descriptions",
# 也接受已下载的 HTML 代码的字符串
source="https://perinim.github.io/projects",
config=graph_config
)

result = smart_scraper_graph.run()
print(result)
```

输出将是一个包含项目及其描述的列表,如下所示:

```python
{'projects': [{'title': 'Rotary Pendulum RL', 'description': 'Open Source project aimed at controlling a real life rotary pendulum using RL algorithms'}, {'title': 'DQN Implementation from scratch', 'description': 'Developed a Deep Q-Network algorithm to train a simple and double pendulum'}, ...]}
```

### 案例 2: 使用混合模型的 SearchGraph
我们使用 **Groq** 作为 LLM,使用 **Ollama** 作为嵌入模型。

```python
from scrapegraphai.graphs import SearchGraph

# 定义图的配置
graph_config = {
"llm": {
"model": "groq/gemma-7b-it",
"api_key": "GROQ_API_KEY",
"temperature": 0
},
"embeddings": {
"model": "ollama/nomic-embed-text",
"base_url": "http://localhost:11434", # 任意设置 Ollama URL
},
"max_results": 5,
}

# 创建 SearchGraph 实例
search_graph = SearchGraph(
prompt="List me all the traditional recipes from Chioggia",
config=graph_config
)

# 运行图
result = search_graph.run()
print(result)
```

输出将是一个食谱列表,如下所示:

```python
{'recipes': [{'name': 'Sarde in Saòre'}, {'name': 'Bigoli in salsa'}, {'name': 'Seppie in umido'}, {'name': 'Moleche frite'}, {'name': 'Risotto alla pescatora'}, {'name': 'Broeto'}, {'name': 'Bibarasse in Cassopipa'}, {'name': 'Risi e bisi'}, {'name': 'Smegiassa Ciosota'}]}
```

### 案例 3: 使用 OpenAI 的 SpeechGraph

您只需传递 OpenAI API 密钥和模型名称。

```python
from scrapegraphai.graphs import SpeechGraph

graph_config = {
"llm": {
"api_key": "OPENAI_API_KEY",
"model": "gpt-3.5-turbo",
},
"tts_model": {
"api_key": "OPENAI_API_KEY",
"model": "tts-1",
"voice": "alloy"
},
"output_path": "audio_summary.mp3",
}

# ************************************************
# 创建 SpeechGraph 实例并运行
# ************************************************

speech_graph = SpeechGraph(
prompt="Make a detailed audio summary of the projects.",
source="https://perinim.github.io/projects/",
config=graph_config,
)

result = speech_graph.run()
print(result)
```
输出将是一个包含页面上项目摘要的音频文件。

## 赞助商

<div style="text-align: center;">
<a href="https://serpapi.com?utm_source=scrapegraphai">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/serp_api_logo.png" alt="SerpAPI" style="width: 10%;">
</a>
<a href="https://dashboard.statproxies.com/?refferal=scrapegraph">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/transparent_stat.png" alt="Stats" style="width: 15%;">
</a>
</div>

## 🤝 贡献

欢迎贡献并加入我们的 Discord 服务器与我们讨论改进和提出建议!

请参阅[贡献指南](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/CONTRIBUTING.md)。

[![My Skills](https://skillicons.dev/icons?i=discord)](https://discord.gg/uJN7TYcpNa)
[![My Skills](https://skillicons.dev/icons?i=linkedin)](https://www.linkedin.com/company/scrapegraphai/)
[![My Skills](https://skillicons.dev/icons?i=twitter)](https://twitter.com/scrapegraphai)


## 📈 路线图

在[这里](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/README.md)查看项目路线图! 🚀

想要以更互动的方式可视化路线图?请查看 [markmap](https://markmap.js.org/repl) 通过将 markdown 内容复制粘贴到编辑器中进行可视化!

## ❤️ 贡献者
[![Contributors](https://contrib.rocks/image?repo=VinciGit00/Scrapegraph-ai)](https://github.com/VinciGit00/Scrapegraph-ai/graphs/contributors)


## 🎓 引用

如果您将我们的库用于研究目的,请引用以下参考文献:
```text
@misc{scrapegraph-ai,
author = {Marco Perini, Lorenzo Padoan, Marco Vinciguerra},
title = {Scrapegraph-ai},
year = {2024},
url = {https://github.com/VinciGit00/Scrapegraph-ai},
note = {一个利用大型语言模型进行爬取的 Python 库}
}
```
## 作者

<p align="center">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/logo_authors.png" alt="Authors_logos">
</p>

## 联系方式
| | Contact Info |
|--------------------|----------------------|
| Marco Vinciguerra | [![Linkedin Badge](https://img.shields.io/badge/-Linkedin-blue?style=flat&logo=Linkedin&logoColor=white)](https://www.linkedin.com/in/marco-vinciguerra-7ba365242/) |
| Marco Perini | [![Linkedin Badge](https://img.shields.io/badge/-Linkedin-blue?style=flat&logo=Linkedin&logoColor=white)](https://www.linkedin.com/in/perinim/) |
| Lorenzo Padoan | [![Linkedin Badge](https://img.shields.io/badge/-Linkedin-blue?style=flat&logo=Linkedin&logoColor=white)](https://www.linkedin.com/in/lorenzo-padoan-4521a2154/) |

## 📜 许可证

ScrapeGraphAI 采用 MIT 许可证。更多信息请查看 [LICENSE](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/LICENSE) 文件。

## 鸣谢

- 我们要感谢所有项目贡献者和开源社区的支持。
- ScrapeGraphAI 仅用于数据探索和研究目的。我们不对任何滥用该库的行为负责。
24 changes: 7 additions & 17 deletions docs/source/conf.py
@@ -23,27 +23,17 @@
# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

-extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon','sphinx_wagtail_theme']
+extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon']

templates_path = ['_templates']
exclude_patterns = []

# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

-# html_theme = 'sphinx_rtd_theme'
-html_theme = 'sphinx_wagtail_theme'
-
-html_theme_options = dict(
-project_name = "ScrapeGraphAI",
-logo = "scrapegraphai_logo.png",
-logo_alt = "ScrapeGraphAI",
-logo_height = 59,
-logo_url = "https://scrapegraph-ai.readthedocs.io/en/latest/",
-logo_width = 45,
-github_url = "https://github.com/VinciGit00/Scrapegraph-ai/tree/main/docs/source/",
-footer_links = ",".join(
-["Landing Page|https://scrapegraphai.com/",
-"Docusaurus|https://scrapegraph-doc.onrender.com/docs/intro"]
-),
-)
+html_theme = 'furo'
+html_theme_options = {
+"source_repository": "https://github.com/VinciGit00/Scrapegraph-ai/",
+"source_branch": "main",
+"source_directory": "docs/source/",
+}
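
Note on the three new `html_theme_options` keys: together they describe where the documentation sources live, which is enough to build a per-page link back to the repository. The sketch below is only an illustration of how those values could combine, assuming GitHub's usual `/edit/<branch>/<path>` URL layout. `edit_url` is a hypothetical helper, not a function from Sphinx or furo, and the theme's own logic may differ.

```python
# Hypothetical helper for illustration only; it is not part of Sphinx or furo.
# The option values are copied from the conf.py hunk above.
html_theme_options = {
    "source_repository": "https://github.com/VinciGit00/Scrapegraph-ai/",
    "source_branch": "main",
    "source_directory": "docs/source/",
}


def edit_url(options: dict, page_path: str) -> str:
    """Build a GitHub-style edit link for a file relative to the docs root."""
    repo = options["source_repository"].rstrip("/")
    branch = options["source_branch"]
    directory = options["source_directory"].strip("/")
    # Assumes the GitHub layout <repo>/edit/<branch>/<path-to-file>.
    return f"{repo}/edit/{branch}/{directory}/{page_path.lstrip('/')}"


print(edit_url(html_theme_options, "getting_started/installation.rst"))
```

Stripping the slashes before joining means the result is the same whether or not the configured values end in `/`, as they do in this diff.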
11 changes: 9 additions & 2 deletions docs/source/getting_started/installation.rst
@@ -25,11 +25,18 @@ The library is available on PyPI, so it can be installed using the following com

It is higly recommended to install the library in a virtual environment (conda, venv, etc.)

-If your clone the repository, you can install the library using `poetry <https://python-poetry.org/docs/>`_:
+If your clone the repository, it is recommended to use a package manager like `rye <https://rye.astral.sh/>`_.
+To install the library using rye, you can run the following command:

.. code-block:: bash

-poetry install
+rye pin 3.10
+rye sync
+rye build
+
+.. caution::
+
+**Rye** must be installed first by following the instructions on the `official website <https://rye.astral.sh/>`_.

Additionally on Windows when using WSL
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
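
Note on the caution added in this hunk: the three `rye` commands only work once Rye itself is installed. A minimal sketch of that precondition check in Python, assuming a hypothetical helper `run_rye_setup` (not part of this repository or of Rye) that looks for `rye` on `PATH` and then runs the commands from the diff in order:

```python
import shutil
import subprocess

# The three commands from the installation hunk, in the order they appear.
RYE_STEPS = [["rye", "pin", "3.10"], ["rye", "sync"], ["rye", "build"]]


def run_rye_setup(steps=RYE_STEPS, which=shutil.which, run=subprocess.run):
    """Run each step, failing early with a clear message if rye is missing.

    `which` and `run` are injectable so the function can be exercised
    without rye being installed (hypothetical design, for illustration).
    """
    if which("rye") is None:
        raise RuntimeError("rye not found on PATH; install it first")
    for cmd in steps:
        # check=True makes subprocess.run raise CalledProcessError on failure.
        run(cmd, check=True)
    return len(steps)
```

Calling `run_rye_setup()` from the repository root would run the real commands. Passing stub callables for `which` and `run` lets the same logic be checked in isolation.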
9 changes: 9 additions & 0 deletions docs/source/index.rst
@@ -32,6 +32,15 @@

modules/modules

+.. toctree::
+:hidden:
+:caption: EXTERNAL RESOURCES
+
+GitHub <https://github.com/VinciGit00/Scrapegraph-ai>
+Discord <https://discord.gg/uJN7TYcpNa>
+Linkedin <https://www.linkedin.com/company/scrapegraphai/>
+Twitter <https://twitter.com/scrapegraphai>
+
Indices and tables
==================
