alignment #363


Closed
wants to merge 29 commits into from

Changes from all commits (29 commits)
4559ab6
docs: add Japanese README
eltociear Jun 5, 2024
871e398
docs: update README.md
eltociear Jun 5, 2024
f0042a8
docs: update japanese.md
eltociear Jun 5, 2024
04b0352
Merge pull request #345 from eltociear/add-japanese-readme
VinciGit00 Jun 5, 2024
1d38ed1
fix: bug on generate_answer_node
VinciGit00 Jun 5, 2024
3629215
ci(release): 1.5.5 [skip ci]
semantic-release-bot Jun 5, 2024
4e16c9a
support ernie
duke147 Jun 5, 2024
67d83cf
fix: getter
VinciGit00 Jun 5, 2024
49cdadf
ci(release): 1.5.6 [skip ci]
semantic-release-bot Jun 5, 2024
2ef6d67
Merge pull request #346 from duke147/ernie
VinciGit00 Jun 5, 2024
2b2b910
support ernie
duke147 Jun 5, 2024
1a404e3
Merge remote-tracking branch 'upstream/main' into ernie
duke147 Jun 5, 2024
9572578
add ernie example
VinciGit00 Jun 5, 2024
9ef73d7
Merge pull request #347 from duke147/ernie
VinciGit00 Jun 5, 2024
d772453
Refactor model_name attribute access in llm_model in robots_node.py
tindo1234 Jun 5, 2024
e7af5ea
Merge pull request #348 from tindo2003/fix_robots_node
VinciGit00 Jun 5, 2024
10672d6
fix: update openai tts class
VinciGit00 Jun 6, 2024
c17daca
ci(release): 1.5.7 [skip ci]
semantic-release-bot Jun 6, 2024
d845a1b
test: Enhance JSON scraping pipeline test
tejhande Jun 7, 2024
261c4fc
Merge pull request #352 from tejhande/patch-1
VinciGit00 Jun 7, 2024
320f13f
Enhance tests for FetchNode with mocking
tejhande Jun 7, 2024
ff9df81
Test ScriptCreatorGraph and print execution info
tejhande Jun 7, 2024
c78aa43
beautify readmes
VinciGit00 Jun 8, 2024
5dc6165
add example
VinciGit00 Jun 9, 2024
14d1011
Merge pull request #354 from tejhande/patch-2
VinciGit00 Jun 9, 2024
dedfa2e
feat: Add tests for RobotsNode and update test setup
tejhande Jun 9, 2024
2781c3c
Merge pull request #355 from tejhande/patch-3
VinciGit00 Jun 9, 2024
e688480
Merge pull request #362 from tejhande/patch-4
VinciGit00 Jun 9, 2024
58086ee
ci(release): 1.6.0 [skip ci]
semantic-release-bot Jun 9, 2024
40 changes: 40 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,43 @@
## [1.6.0](https://github.com/VinciGit00/Scrapegraph-ai/compare/v1.5.7...v1.6.0) (2024-06-09)


### Features

* Add tests for RobotsNode and update test setup ([dedfa2e](https://github.com/VinciGit00/Scrapegraph-ai/commit/dedfa2eaf02b7e9b68a116515053c1daae6e4a31))


### Test

* Enhance JSON scraping pipeline test ([d845a1b](https://github.com/VinciGit00/Scrapegraph-ai/commit/d845a1ba7d6e7f7574b92b51b6d5326bbfb3d1c6))

## [1.5.7](https://github.com/VinciGit00/Scrapegraph-ai/compare/v1.5.6...v1.5.7) (2024-06-06)


### Bug Fixes

* update openai tts class ([10672d6](https://github.com/VinciGit00/Scrapegraph-ai/commit/10672d6ebb06d950bbf8b66cc9a2d420c183013d))

## [1.5.6](https://github.com/VinciGit00/Scrapegraph-ai/compare/v1.5.5...v1.5.6) (2024-06-05)


### Bug Fixes

* getter ([67d83cf](https://github.com/VinciGit00/Scrapegraph-ai/commit/67d83cff46d8ea606b8972c364ab4c56e6fa4fe4))

## [1.5.5](https://github.com/VinciGit00/Scrapegraph-ai/compare/v1.5.4...v1.5.5) (2024-06-05)


### Bug Fixes

* bug on generate_answer_node ([1d38ed1](https://github.com/VinciGit00/Scrapegraph-ai/commit/1d38ed146afae95dae1f35ac51180a1882bf8a29))


### Docs

* add Japanese README ([4559ab6](https://github.com/VinciGit00/Scrapegraph-ai/commit/4559ab6db845a0d94371a09d0ed1e1623eed9ee2))
* update japanese.md ([f0042a8](https://github.com/VinciGit00/Scrapegraph-ai/commit/f0042a8e33f8fb8b113681ee0a9995d329bb0faa))
* update README.md ([871e398](https://github.com/VinciGit00/Scrapegraph-ai/commit/871e398a26786d264dbd1b2743864ed2cc12b3da))

## [1.5.4](https://github.com/VinciGit00/Scrapegraph-ai/compare/v1.5.3...v1.5.4) (2024-05-31)


2 changes: 1 addition & 1 deletion README.md
@@ -1,6 +1,6 @@

# 🕷️ ScrapeGraphAI: You Only Scrape Once
[English](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/README.md) | [中文](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/chinese.md)
[English](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/README.md) | [中文](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/chinese.md) | [日本語](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/japanese.md)

[![Downloads](https://static.pepy.tech/badge/scrapegraphai)](https://pepy.tech/project/scrapegraphai)
[![linting: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen)](https://github.com/pylint-dev/pylint)
10 changes: 5 additions & 5 deletions docs/chinese.md
@@ -1,9 +1,9 @@
# 🕷️ ScrapeGraphAI: You Only Scrape Once
[![下载量](https://static.pepy.tech/badge/scrapegraphai)](https://pepy.tech/project/scrapegraphai)
[![代码检查: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen)](https://github.com/pylint-dev/pylint)
[![Pylint](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/pylint.yml/badge.svg)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/pylint.yml)
[![CodeQL](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/codeql.yml/badge.svg)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/codeql.yml)
[![许可证: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://img.shields.io/pepy/dt/scrapegraphai?style=for-the-badge)](https://pepy.tech/project/scrapegraphai)
[![linting: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen?style=for-the-badge)](https://github.com/pylint-dev/pylint)
[![Pylint](https://img.shields.io/github/actions/workflow/status/VinciGit00/Scrapegraph-ai/pylint.yml?style=for-the-badge)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/pylint.yml)
[![CodeQL](https://img.shields.io/github/actions/workflow/status/VinciGit00/Scrapegraph-ai/codeql.yml?style=for-the-badge)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/codeql.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge)](https://opensource.org/licenses/MIT)
[![](https://dcbadge.vercel.app/api/server/gkxQDAjfeX)](https://discord.gg/gkxQDAjfeX)

ScrapeGraphAI is a *web scraping* Python library that uses large language models and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, etc.).
225 changes: 225 additions & 0 deletions docs/japanese.md
@@ -0,0 +1,225 @@
# 🕷️ ScrapeGraphAI: You Only Scrape Once
[![Downloads](https://img.shields.io/pepy/dt/scrapegraphai?style=for-the-badge)](https://pepy.tech/project/scrapegraphai)
[![linting: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen?style=for-the-badge)](https://github.com/pylint-dev/pylint)
[![Pylint](https://img.shields.io/github/actions/workflow/status/VinciGit00/Scrapegraph-ai/pylint.yml?style=for-the-badge)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/pylint.yml)
[![CodeQL](https://img.shields.io/github/actions/workflow/status/VinciGit00/Scrapegraph-ai/codeql.yml?style=for-the-badge)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/codeql.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge)](https://opensource.org/licenses/MIT)
[![](https://dcbadge.vercel.app/api/server/gkxQDAjfeX)](https://discord.gg/gkxQDAjfeX)

ScrapeGraphAI is a *web scraping* Python library that uses large language models and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, etc.).

Just tell the library which information you want to extract and it will do the rest for you!

<p align="center">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/scrapegraphai_logo.png" alt="Scrapegraph-ai Logo" style="width: 50%;">
</p>

## 🚀 Quick Install

The reference page for Scrapegraph-ai is available on the official page of PyPI: [pypi](https://pypi.org/project/scrapegraphai/).

```bash
pip install scrapegraphai
```
**Note**: it is recommended to install the library in a virtual environment to avoid conflicts with other libraries 🐱

## 🔍 Demo

Official Streamlit demo:

[![My Skills](https://skillicons.dev/icons?i=react)](https://scrapegraph-ai-web-dashboard.streamlit.app)

Try it directly on the web using Google Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1sEZBonBMGP44CtO6GQTwAlL0BGJXjtfd?usp=sharing)

## 📖 Documentation

The documentation for ScrapeGraphAI can be found [here](https://scrapegraph-ai.readthedocs.io/en/latest/).

Check out also the Docusaurus [version](https://scrapegraph-doc.onrender.com/).

## 💻 Usage

There are four main scraping pipelines that can be used to extract information from a website (or local file):

- `SmartScraperGraph`: single-page scraper that only needs a user prompt and an input source.
- `SearchGraph`: multi-page scraper that extracts information from the top n search results of a search engine.
- `SpeechGraph`: single-page scraper that extracts information from a website and generates an audio file.
- `SmartScraperMultiGraph`: multi-page scraper that, given a single prompt and a list of sources, extracts information from several pages (see the sketch right after this list).

It is possible to use different LLMs through APIs, such as **OpenAI**, **Groq**, **Azure**, and **Gemini**, or local models using **Ollama**.
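
The multi-page pipeline has no dedicated example below, so here is a minimal sketch of how it could be invoked. It assumes `SmartScraperMultiGraph` accepts the same configuration format as `SmartScraperGraph` and a list of URLs as its `source`; the exact signature may differ between versions, so treat this as illustrative rather than authoritative.

```python
from scrapegraphai.graphs import SmartScraperMultiGraph

# Same configuration format as the single-page examples below
# (an OpenAI model is used here purely for brevity).
graph_config = {
    "llm": {
        "api_key": "OPENAI_API_KEY",
        "model": "gpt-3.5-turbo",
    },
}

# One prompt, several sources: each page is scraped and the answers are merged.
multi_graph = SmartScraperMultiGraph(
    prompt="List me all the projects with their descriptions",
    source=[
        "https://perinim.github.io/projects",
        "https://perinim.github.io/",
    ],
    config=graph_config,
)

result = multi_graph.run()
print(result)
```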

### Example 1: SmartScraper using local models
Remember to have [Ollama](https://ollama.com/) installed and download the models using the `ollama pull` command.

``` python
from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "model": "ollama/mistral",
        "temperature": 0,
        "format": "json",  # Ollama needs the format to be specified explicitly
        "base_url": "http://localhost:11434",  # set the Ollama URL
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434",  # set the Ollama URL
    },
    "verbose": True,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the projects with their descriptions",
    # also accepts a string with the already downloaded HTML code
    source="https://perinim.github.io/projects",
    config=graph_config
)

result = smart_scraper_graph.run()
print(result)
```

The output will be a list of projects with their descriptions, like the following:

```python
{'projects': [{'title': 'Rotary Pendulum RL', 'description': 'Open Source project aimed at controlling a real life rotary pendulum using RL algorithms'}, {'title': 'DQN Implementation from scratch', 'description': 'Developed a Deep Q-Network algorithm to train a simple and double pendulum'}, ...]}
```

### Example 2: SearchGraph using mixed models
We use **Groq** for the LLM and **Ollama** for the embeddings.

```python
from scrapegraphai.graphs import SearchGraph

# Define the configuration for the graph
graph_config = {
    "llm": {
        "model": "groq/gemma-7b-it",
        "api_key": "GROQ_API_KEY",
        "temperature": 0
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434",  # set the Ollama URL arbitrarily
    },
    "max_results": 5,
}

# Create the SearchGraph instance
search_graph = SearchGraph(
    prompt="List me all the traditional recipes from Chioggia",
    config=graph_config
)

# Run the graph
result = search_graph.run()
print(result)
```

The output will be a list of recipes, like the following:

```python
{'recipes': [{'name': 'Sarde in Saòre'}, {'name': 'Bigoli in salsa'}, {'name': 'Seppie in umido'}, {'name': 'Moleche frite'}, {'name': 'Risotto alla pescatora'}, {'name': 'Broeto'}, {'name': 'Bibarasse in Cassopipa'}, {'name': 'Risi e bisi'}, {'name': 'Smegiassa Ciosota'}]}
```

### Example 3: SpeechGraph using OpenAI

You just need to pass the OpenAI API key and the model names.

```python
from scrapegraphai.graphs import SpeechGraph

graph_config = {
    "llm": {
        "api_key": "OPENAI_API_KEY",
        "model": "gpt-3.5-turbo",
    },
    "tts_model": {
        "api_key": "OPENAI_API_KEY",
        "model": "tts-1",
        "voice": "alloy"
    },
    "output_path": "audio_summary.mp3",
}

# ************************************************
# Create the SpeechGraph instance and run it
# ************************************************

speech_graph = SpeechGraph(
    prompt="Make a detailed audio summary of the projects.",
    source="https://perinim.github.io/projects/",
    config=graph_config,
)

result = speech_graph.run()
print(result)
```
The output will be an audio file with a summary of the projects on the page.

## Sponsors

<div style="text-align: center;">
<a href="https://serpapi.com?utm_source=scrapegraphai">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/serp_api_logo.png" alt="SerpAPI" style="width: 10%;">
</a>
<a href="https://dashboard.statproxies.com/?refferal=scrapegraph">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/transparent_stat.png" alt="Stats" style="width: 15%;">
</a>
</div>

## 🤝 Contributing

Feel free to contribute and join our Discord server to discuss improvements and give suggestions!

Please see the [contributing guidelines](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/CONTRIBUTING.md).

[![My Skills](https://skillicons.dev/icons?i=discord)](https://discord.gg/uJN7TYcpNa)
[![My Skills](https://skillicons.dev/icons?i=linkedin)](https://www.linkedin.com/company/scrapegraphai/)
[![My Skills](https://skillicons.dev/icons?i=twitter)](https://twitter.com/scrapegraphai)


## 📈 Roadmap

Check out the project roadmap [here](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/README.md)! 🚀

Would you like to visualize the roadmap in a more interactive way? Copy the markdown content into the [markmap](https://markmap.js.org/repl) editor to visualize it!

## ❤️ Contributors
[![Contributors](https://contrib.rocks/image?repo=VinciGit00/Scrapegraph-ai)](https://github.com/VinciGit00/Scrapegraph-ai/graphs/contributors)


## 🎓 Citations

If you have used our library for research purposes, please cite us with the following reference:
```text
@misc{scrapegraph-ai,
author = {Marco Perini and Lorenzo Padoan and Marco Vinciguerra},
title = {Scrapegraph-ai},
year = {2024},
url = {https://github.com/VinciGit00/Scrapegraph-ai},
note = {A Python library for scraping leveraging large language models}
}
```
## Authors

<p align="center">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/logo_authors.png" alt="Authors_logos">
</p>

## Contacts
|                    | Contact              |
|--------------------|----------------------|
| Marco Vinciguerra | [![Linkedin Badge](https://img.shields.io/badge/-Linkedin-blue?style=flat&logo=Linkedin&logoColor=white)](https://www.linkedin.com/in/marco-vinciguerra-7ba365242/) |
| Marco Perini | [![Linkedin Badge](https://img.shields.io/badge/-Linkedin-blue?style=flat&logo=Linkedin&logoColor=white)](https://www.linkedin.com/in/perinim/) |
| Lorenzo Padoan | [![Linkedin Badge](https://img.shields.io/badge/-Linkedin-blue?style=flat&logo=Linkedin&logoColor=white)](https://www.linkedin.com/in/lorenzo-padoan-4521a2154/) |

## 📜 License

ScrapeGraphAI is licensed under the MIT License. See the [LICENSE](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/LICENSE) file for more information.

## Acknowledgements

- We would like to thank all the contributors to the project and the open-source community for their support.
- ScrapeGraphAI is meant to be used for data exploration and research purposes only. We are not responsible for any misuse of the library.
18 changes: 0 additions & 18 deletions examples/anthropic/pdf_scraper_graph_haiku.py
@@ -28,28 +28,10 @@
the Beatrice of his earlier poetry, through the celestial spheres of Paradise.
"""

schema = """
{
"type": "object",
"properties": {
"summary": {
"type": "string"
},
"topics": {
"type": "array",
"items": {
"type": "string"
}
}
}
}
"""

pdf_scraper_graph = PDFScraperGraph(
    prompt="Summarize the text and find the main topics",
    source=source,
    config=graph_config,
    schema=schema,
)
result = pdf_scraper_graph.run()

1 change: 0 additions & 1 deletion examples/anthropic/smart_scraper_haiku.py
@@ -9,7 +9,6 @@


# required environment variables in .env
# HUGGINGFACEHUB_API_TOKEN
# ANTHROPIC_API_KEY
load_dotenv()

61 changes: 61 additions & 0 deletions examples/ernie/csv_scraper_ernie.py
@@ -0,0 +1,61 @@
"""
Basic example of scraping pipeline using CSVScraperGraph from CSV documents
"""

import os
from dotenv import load_dotenv
import pandas as pd
from scrapegraphai.graphs import CSVScraperGraph
from scrapegraphai.utils import convert_to_csv, convert_to_json, prettify_exec_info
load_dotenv()

# ************************************************
# Read the CSV file
# ************************************************

FILE_NAME = "inputs/username.csv"
curr_dir = os.path.dirname(os.path.realpath(__file__))
file_path = os.path.join(curr_dir, FILE_NAME)

text = pd.read_csv(file_path)

# ************************************************
# Define the configuration for the graph
# ************************************************

graph_config = {
    "llm": {
        "model": "ernie-bot-turbo",
        "ernie_client_id": "<ernie_client_id>",
        "ernie_client_secret": "<ernie_client_secret>",
        "temperature": 0.1
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "temperature": 0,
        "base_url": "http://localhost:11434",
    }
}

# ************************************************
# Create the CSVScraperGraph instance and run it
# ************************************************

csv_scraper_graph = CSVScraperGraph(
    prompt="List me all the last names",
    source=str(text),  # Pass the content of the file, not the file object
    config=graph_config
)

result = csv_scraper_graph.run()
print(result)

# ************************************************
# Get graph execution info
# ************************************************

graph_exec_info = csv_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))

# Save to json or csv
convert_to_csv(result, "result")
convert_to_json(result, "result")