Pre/beta #653

Merged Sep 10, 2024

156 commits
49ae56c
Added screenshot preparation script for screenshot scraping
Santabot123 Aug 23, 2024
e11f0cd
Added text_detection.py and updated screenshot_preparation.py
Santabot123 Aug 24, 2024
ee8f8b3
fix: add claude3.5 sonnet
VinciGit00 Aug 24, 2024
88e76ce
ci(release): 1.14.1 [skip ci]
semantic-release-bot Aug 24, 2024
86fe5fc
fix: update abstract graph
VinciGit00 Aug 24, 2024
132ee5b
ci(release): 1.15.0-beta.3 [skip ci]
semantic-release-bot Aug 24, 2024
9df4b14
refacttoring of the anthropic example
VinciGit00 Aug 25, 2024
4dea972
Merge pull request #583 from ScrapeGraphAI/anthropic-refactoring
VinciGit00 Aug 25, 2024
e3fbf01
Revert "Anthropic refactoring"
VinciGit00 Aug 25, 2024
869bbd7
Merge pull request #584 from ScrapeGraphAI/revert-583-anthropic-refac…
VinciGit00 Aug 25, 2024
37a4a8a
Merge branch 'main' into anthropic-refactoring
VinciGit00 Aug 25, 2024
b4f8ea4
add __init__.py and docstrings
Santabot123 Aug 26, 2024
0cf7c44
correct the typo and updated select_area_with_ipywidget()
Santabot123 Aug 26, 2024
7e23c3d
correct the typo
Santabot123 Aug 26, 2024
35b994a
fix model_tokens not being used for ollama
jamie-beck Aug 26, 2024
2a602a1
Merge branch 'pre/beta' into jamie-beck-patch-1
VinciGit00 Aug 26, 2024
849ae42
Merge pull request #591 from jamie-beck/jamie-beck-patch-1
VinciGit00 Aug 26, 2024
c1ce9c6
ci(release): 1.15.0-beta.4 [skip ci]
semantic-release-bot Aug 26, 2024
04128e7
fix: abstract graph local model
VinciGit00 Aug 26, 2024
22ab45f
ci(release): 1.15.0-beta.5 [skip ci]
semantic-release-bot Aug 26, 2024
b1d3804
Merge pull request #585 from ScrapeGraphAI/anthropic-refactoring
VinciGit00 Aug 26, 2024
fef5eb0
ci(release): 1.15.0 [skip ci]
semantic-release-bot Aug 26, 2024
f73343f
fix(AbstractGraph): correct and simplify instancing logic
f-aguzzi Aug 27, 2024
f6df9b7
chore(examples): update model names
f-aguzzi Aug 27, 2024
229d74d
test(AbstractGraph): add AbstractGraph tests
f-aguzzi Aug 27, 2024
5c16ee9
fix(docloaders): BrowserBase dynamic import
f-aguzzi Aug 27, 2024
83e71df
fix: set up dynamic imports correctly
f-aguzzi Aug 27, 2024
7789663
fix(BurrBrige): dynamic imports
f-aguzzi Aug 27, 2024
d33b347
Merge pull request #596 from ScrapeGraphAI/594-broserbase-dynamic-import
VinciGit00 Aug 27, 2024
d45064b
Merge pull request #597 from ScrapeGraphAI/593-abstract-graph-fix-rou…
VinciGit00 Aug 27, 2024
050fa3f
ci(release): 1.15.0-beta.6 [skip ci]
semantic-release-bot Aug 27, 2024
cf73883
fix: bug for abstract graph
VinciGit00 Aug 27, 2024
be3f1ec
ci(release): 1.15.0-beta.7 [skip ci]
semantic-release-bot Aug 27, 2024
df70b4f
Update abstract_graph.py
VinciGit00 Aug 27, 2024
4eccc76
Merge branch 'pre/beta' of https://github.com/ScrapeGraphAI/Scrapegra…
VinciGit00 Aug 27, 2024
c0a0e69
remove some comments and image
Santabot123 Aug 27, 2024
bda30a9
swapped failing imports (local Gemini and non-imported Ernie) for lan…
Aug 28, 2024
92bec28
Updated requirements.txt
Santabot123 Aug 28, 2024
aa9e85f
remove some comments and image
Santabot123 Aug 26, 2024
90d7549
updated requirements.txt
Santabot123 Aug 28, 2024
6e9911c
Merge branch 'pre/beta' of https://github.com/Santabot123/Scrapegraph…
Santabot123 Aug 28, 2024
55a7727
Update requirements.txt
Santabot123 Aug 28, 2024
edfe45e
Merge pull request #600 from alexljenkins/bugfix/graph_builder-import…
VinciGit00 Aug 28, 2024
4f120e2
fix(AbstractGraph): model selection bug
f-aguzzi Aug 28, 2024
f7a85c2
fix(models): better DeepSeek and OneApi integration
f-aguzzi Aug 28, 2024
08fa257
Merge pull request #602 from ScrapeGraphAI/593-abstract-graph-fix-rou…
VinciGit00 Aug 28, 2024
dbec550
ci(release): 1.15.0-beta.8 [skip ci]
semantic-release-bot Aug 28, 2024
5f562b8
chore: update README.md
f-aguzzi Aug 28, 2024
8f615ad
feat: add togheterai
VinciGit00 Aug 28, 2024
d29f747
Update script_generator_openai.py
VinciGit00 Aug 28, 2024
bbdd58c
merge: main into pre/beta
f-aguzzi Aug 28, 2024
8f38a6b
ci(release): 1.15.1-beta.1 [skip ci]
semantic-release-bot Aug 28, 2024
6d6b414
Merge pull request #607 from ScrapeGraphAI/pre/beta
f-aguzzi Aug 28, 2024
ceb522f
ci(release): 1.15.1 [skip ci]
semantic-release-bot Aug 28, 2024
34942de
chore(examples): create Together AI examples
f-aguzzi Aug 28, 2024
5f604d1
Merge pull request #605 from ScrapeGraphAI/togheter_ai_integration
f-aguzzi Aug 28, 2024
d7f6036
ci(release): 1.16.0-beta.1 [skip ci]
semantic-release-bot Aug 28, 2024
25d8fd2
Fixes node expression validator error message to make it easier to de…
elijahbenizzy Aug 29, 2024
8f056dd
Merge pull request #611 from DAGWorks-Inc/issue-580
VinciGit00 Aug 29, 2024
a96617d
changed pydantic with langchain_pydantic
VinciGit00 Aug 29, 2024
c348f67
fix: update generate answernode
VinciGit00 Aug 30, 2024
735120d
Merge branch 'screenshot-scraper-fix' into pre/beta
VinciGit00 Aug 30, 2024
405f28e
Merge pull request #606 from Santabot123/pre/beta
VinciGit00 Aug 30, 2024
a0d2113
refactoring of folders
VinciGit00 Aug 30, 2024
388630c
fix: screenshot scraper
VinciGit00 Aug 30, 2024
4f4d091
feat:add deepcopy tool
goasleep Aug 31, 2024
cd07418
fix: deepcopy fail for coping model_instance config
goasleep Aug 31, 2024
36818b1
feat:adjust uncopiable obj raise error and remove memo
goasleep Aug 31, 2024
71b22d4
feat: add deepcopy error
goasleep Aug 31, 2024
a73573d
update version
VinciGit00 Aug 31, 2024
9fd6509
Update pyproject.toml
VinciGit00 Aug 31, 2024
9c2aefa
Merge pull request #612 from ScrapeGraphAI/598-1140+-pydantic-validat…
f-aguzzi Aug 31, 2024
1c37d5d
ci(release): 1.16.0-beta.2 [skip ci]
semantic-release-bot Aug 31, 2024
553527a
fix: fix pydantic object copy
goasleep Sep 1, 2024
360ce1c
fix: pyproject.toml
VinciGit00 Sep 1, 2024
d88730c
ci(release): 1.15.2 [skip ci]
semantic-release-bot Sep 1, 2024
0e0b280
Merge branch 'pre/beta' into temp
VinciGit00 Sep 1, 2024
86f9442
Merge pull request #615 from ScrapeGraphAI/temp
VinciGit00 Sep 1, 2024
afdf524
Merge pull request #613 from goasleep/feature/add_copy_tool
VinciGit00 Sep 1, 2024
886c987
ci(release): 1.16.0-beta.3 [skip ci]
semantic-release-bot Sep 1, 2024
fccf034
ci(release): 1.16.0 [skip ci]
semantic-release-bot Sep 1, 2024
f51b155
add example
VinciGit00 Sep 1, 2024
8422463
feat:expose the search engine params to user
goasleep Sep 2, 2024
a8b0e4a
updated token calculation on parsenode
tm-robinson Sep 2, 2024
3d265a8
change GenerateScraperNode to only use first chunk
tm-robinson Sep 2, 2024
1bcc0bf
Merge pull request #620 from goasleep/feature/export_search_engine
VinciGit00 Sep 2, 2024
ba5c7ad
ci(release): 1.16.0-beta.4 [skip ci]
semantic-release-bot Sep 2, 2024
e741602
Merge branch 'pre/beta' into 543-ScriptCreatorGraph-only-use-first-chunk
VinciGit00 Sep 2, 2024
fd0a902
Merge pull request #619 from tm-robinson/543-ScriptCreatorGraph-only-…
VinciGit00 Sep 2, 2024
13efd4e
ci(release): 1.17.0-beta.1 [skip ci]
semantic-release-bot Sep 2, 2024
ef2db0c
Update pyproject.toml
VinciGit00 Sep 2, 2024
74dfc69
fix(DeepSeek): proper model initialization
f-aguzzi Sep 2, 2024
398b2c5
fix(Ollama): instance model from correct package
f-aguzzi Sep 2, 2024
1e466cd
Merge branch 'pre/beta' into screenshot-scraper-fix
VinciGit00 Sep 2, 2024
3ff69cb
Merge pull request #614 from ScrapeGraphAI/screenshot-scraper-fix
VinciGit00 Sep 2, 2024
89b1f10
Merge pull request #621 from ScrapeGraphAI/609-fix-deepseek-instancing
VinciGit00 Sep 2, 2024
08afc92
ci(release): 1.17.0-beta.2 [skip ci]
semantic-release-bot Sep 2, 2024
66a3b6d
fix: Parse Node scraping link and img urls allowing OmniScraper to work
LorenzoPaleari Sep 2, 2024
57337a0
fix: Removed link_urls and img_ulrs from FetchNode output
LorenzoPaleari Sep 2, 2024
b8ef937
fix(ScreenshotScraper): impose dynamic imports
f-aguzzi Sep 2, 2024
5242166
fix(SmartScraper): pass llm_model to ParseNode
f-aguzzi Sep 2, 2024
aed5452
Merge pull request #624 from ScrapeGraphAI/fix-import-errors
VinciGit00 Sep 2, 2024
fc55418
ci(release): 1.17.0-beta.3 [skip ci]
semantic-release-bot Sep 2, 2024
81af62d
Merge pull request #622 from LorenzoPaleari/pre/beta
VinciGit00 Sep 2, 2024
5e99071
ci(release): 1.17.0-beta.4 [skip ci]
semantic-release-bot Sep 2, 2024
8e74ac5
fix: correctly parsing output when using structured_output
LorenzoPaleari Sep 2, 2024
8442700
Merge pull request #626 from LorenzoPaleari/598-fix-pydantic-errors
VinciGit00 Sep 2, 2024
16ab1bf
ci(release): 1.17.0-beta.5 [skip ci]
semantic-release-bot Sep 2, 2024
52fe441
fix(ScreenShotScraper): static import of optional dependencies
f-aguzzi Sep 4, 2024
e477a44
Merge pull request #631 from ScrapeGraphAI/627-PIL-import-error
VinciGit00 Sep 4, 2024
50c9c6b
ci(release): 1.17.0-beta.6 [skip ci]
semantic-release-bot Sep 4, 2024
bd4b26d
feat: ConcatNode.py added for heavy merge operations
ekinsenler Sep 4, 2024
f83c3d1
add example for gemini
ekinsenler Sep 4, 2024
c0339d9
fix file name
ekinsenler Sep 4, 2024
63a5d18
fix(AbstractGraph): Bedrock init issues
f-aguzzi Sep 5, 2024
31aff6b
Merge pull request #636 from ScrapeGraphAI/633-bedrock-support-fix
VinciGit00 Sep 5, 2024
4347afb
ci(release): 1.17.0-beta.7 [skip ci]
semantic-release-bot Sep 5, 2024
2859fb7
feat(AbstractGraph): add adjustable rate limit
f-aguzzi Sep 5, 2024
c382b9d
Merge pull request #630 from ScrapeGraphAI/595-rate-limit-error
VinciGit00 Sep 6, 2024
85c374e
ci(release): 1.17.0-beta.8 [skip ci]
semantic-release-bot Sep 6, 2024
8b02cb4
Merge pull request #632 from ekinsenler/concat_node
VinciGit00 Sep 6, 2024
77d0fd3
ci(release): 1.17.0-beta.9 [skip ci]
semantic-release-bot Sep 6, 2024
9e9c775
add examples multi concat
VinciGit00 Sep 6, 2024
94e69a0
feat: add scrape_do_integration
VinciGit00 Sep 6, 2024
f5e7a8b
fix of the bug for fetching the code
VinciGit00 Sep 6, 2024
8883bce
asdd proxy integratrion
VinciGit00 Sep 6, 2024
167f970
feat: fetch_node improved
VinciGit00 Sep 7, 2024
cb05f82
Update abstract_graph.py
zuanzuanshao Sep 7, 2024
afb6eb7
feat: return urls in searchgraph
VinciGit00 Sep 7, 2024
429af8e
Merge pull request #640 from zuanzuanshao/main
VinciGit00 Sep 7, 2024
ef7a589
fix: screenshot_scraper
VinciGit00 Sep 7, 2024
af28885
ci(release): 1.17.0-beta.10 [skip ci]
semantic-release-bot Sep 7, 2024
9016bb5
Merge pull request #639 from ScrapeGraphAI/scrape_do_integration
f-aguzzi Sep 7, 2024
a73fec5
ci(release): 1.17.0-beta.11 [skip ci]
semantic-release-bot Sep 7, 2024
a540139
docs(sponsor): 🅱️ Browserbase sponsor 🅱️
PeriniM Sep 7, 2024
57fd01f
feat(docloaders): Enhance browser_base_fetch function flexibility
tuhinmallick Sep 7, 2024
7d39019
Merge pull request #642 from tuhinmallick/patch-1
VinciGit00 Sep 8, 2024
cd4ffd7
ci(release): 1.17.0 [skip ci]
semantic-release-bot Sep 8, 2024
d56253d
feat(browser_base_fetch): add async_mode to support both synchronous …
tuhinmallick Sep 8, 2024
02eed1a
Merge branch 'main' into main
VinciGit00 Sep 8, 2024
e9a74e1
Merge pull request #644 from tuhinmallick/main
VinciGit00 Sep 8, 2024
29ef63d
ci(release): 1.18.0 [skip ci]
semantic-release-bot Sep 8, 2024
007ff08
fix(browser_base_fetch): correct function signature and async_mode ha…
tuhinmallick Sep 8, 2024
5f09b1f
Merge pull request #645 from tuhinmallick/main
VinciGit00 Sep 8, 2024
c5ffdef
ci(release): 1.18.1 [skip ci]
semantic-release-bot Sep 8, 2024
fc738ca
Update parse_node.py
VinciGit00 Sep 8, 2024
14c5e6b
Merge branch 'pre/beta' into temp
VinciGit00 Sep 8, 2024
9f52602
Merge pull request #646 from ScrapeGraphAI/temp
VinciGit00 Sep 8, 2024
eddcb79
ci(release): 1.19.0-beta.1 [skip ci]
semantic-release-bot Sep 8, 2024
f2bb22d
fix: temporary fix for parse_node
VinciGit00 Sep 9, 2024
8a0d46b
Merge pull request #641 from ScrapeGraphAI/urls_search_graph
f-aguzzi Sep 9, 2024
32a102a
Merge pull request #648 from ScrapeGraphAI/637-it-can´t-scrape-urls-f…
f-aguzzi Sep 9, 2024
23a260c
ci(release): 1.19.0-beta.2 [skip ci]
semantic-release-bot Sep 9, 2024
947ebd2
fix: parse node
VinciGit00 Sep 10, 2024
4c14fd7
Merge pull request #650 from ScrapeGraphAI/637-it-can´t-scrape-urls-f…
f-aguzzi Sep 10, 2024
38cba96
ci(release): 1.19.0-beta.3 [skip ci]
semantic-release-bot Sep 10, 2024
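Two recurring changes across these commits are provider-prefixed model names (for example anthropic/claude-3-haiku-20240307 or openai/gpt-4o-mini) and the adjustable request rate limit introduced in 2859fb7. A minimal sketch of a configuration using both, modeled on the new examples/anthropic/rate_limit_haiku.py added further down; the prompt and source URL are placeholders:

```python
import os

from scrapegraphai.graphs import SmartScraperGraph

# Illustrative config: provider-prefixed model name plus the new
# per-graph rate limit. Values are examples, not recommendations.
graph_config = {
    "llm": {
        "api_key": os.getenv("ANTHROPIC_API_KEY"),
        "model": "anthropic/claude-3-haiku-20240307",
        "rate_limit": {
            "requests_per_second": 1
        },
    },
    "verbose": True,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="List the titles of the articles on the page",  # placeholder prompt
    source="https://example.com",                           # placeholder URL
    config=graph_config,
)

print(smart_scraper_graph.run())
```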
361 changes: 355 additions & 6 deletions CHANGELOG.md

Large diffs are not rendered by default.

39 changes: 27 additions & 12 deletions README.md
@@ -32,26 +32,38 @@ playwright install

**Note**: it is recommended to install the library in a virtual environment to avoid conflicts with other libraries 🐱
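A minimal setup sketch following that recommendation, assuming a POSIX shell (the environment name is arbitrary):

```bash
python -m venv .venv           # create an isolated environment
source .venv/bin/activate      # on Windows: .venv\Scripts\activate
pip install scrapegraphai      # install the library
playwright install             # download the browsers Playwright needs
```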

By the way if you to use not mandatory modules it is necessary to install by yourself with the following command:
<details>
<summary><b>Optional Dependencies</b></summary>
Additional dependencies can be added while installing the library:

### Installing "Other Language Models"
- <b>More Language Models</b>: additional language models are installed, such as Fireworks, Groq, Anthropic, Hugging Face, and Nvidia AI Endpoints.

This group allows you to use additional language models like Fireworks, Groq, Anthropic, Hugging Face, and Nvidia AI Endpoints.

This group allows you to use additional language models like Fireworks, Groq, Anthropic, Together AI, Hugging Face, and Nvidia AI Endpoints.
```bash
pip install scrapegraphai[other-language-models]

```
### Installing "More Semantic Options"
- <b>Semantic Options</b>: this group includes tools for advanced semantic processing, such as Graphviz.

```bash
pip install scrapegraphai[more-semantic-options]
```

- <b>Browsers Options</b>: this group includes additional browser management tools/services, such as Browserbase.

```bash
pip install scrapegraphai[more-browser-options]
```

</details>



This group includes tools for advanced semantic processing, such as Graphviz.
```bash
pip install scrapegraphai[more-semantic-options]
```
### Installing "More Browser Options"

This group includes additional browser management options, such as BrowserBase.
This group includes an ocr scraper for websites
```bash
pip install scrapegraphai[more-browser-options]
pip install scrapegraphai[screenshot_scraper]
```
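The extras are independent, so they can also be combined in a single install; a small sketch assuming the group names listed above (quotes keep the brackets intact in shells such as zsh):

```bash
pip install "scrapegraphai[other-language-models,more-semantic-options,more-browser-options,screenshot_scraper]"
```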

## 💻 Usage
@@ -68,7 +80,7 @@ from scrapegraphai.graphs import SmartScraperGraph
graph_config = {
"llm": {
"api_key": "YOUR_OPENAI_APIKEY",
"model": "gpt-4o-mini",
"model": "openai/gpt-4o-mini",
},
"verbose": True,
"headless": False,
@@ -128,6 +140,9 @@ Check out also the Docusaurus [here](https://scrapegraph-doc.onrender.com/).

## 🏆 Sponsors
<div style="text-align: center;">
<a href="https://2ly.link/1zaXG">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/browserbase_logo.png" alt="Browserbase" style="width: 10%;">
</a>
<a href="https://2ly.link/1zNiz">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/serp_api_logo.png" alt="SerpAPI" style="width: 10%;">
</a>
Binary file added docs/assets/browserbase_logo.png
5 changes: 5 additions & 0 deletions docs/source/introduction/overview.rst
@@ -82,6 +82,11 @@ FAQ
Sponsors
========

.. image:: ../../assets/browserbase_logo.png
:width: 10%
:alt: Browserbase
:target: https://www.browserbase.com/

.. image:: ../../assets/serp_api_logo.png
:width: 10%
:alt: Serp API
4 changes: 2 additions & 2 deletions examples/anthropic/csv_scraper_graph_multi_haiku.py
@@ -26,8 +26,8 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"max_tokens": 4000},
"model": "anthropic/claude-3-haiku-20240307",
},
}

# ************************************************
5 changes: 2 additions & 3 deletions examples/anthropic/csv_scraper_haiku.py
@@ -32,9 +32,8 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"max_tokens": 4000
},
"model": "anthropic/claude-3-haiku-20240307",
},
}

# ************************************************
26 changes: 6 additions & 20 deletions examples/anthropic/custom_graph_haiku.py
@@ -5,10 +5,9 @@
import os
from dotenv import load_dotenv

from langchain_openai import OpenAIEmbeddings
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from scrapegraphai.graphs import BaseGraph
from scrapegraphai.nodes import FetchNode, ParseNode, RAGNode, GenerateAnswerNode, RobotsNode
from scrapegraphai.nodes import FetchNode, ParseNode, GenerateAnswerNode, RobotsNode
load_dotenv()

# ************************************************
@@ -19,16 +18,14 @@
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"max_tokens": 4000
},
},
}

# ************************************************
# Define the graph nodes
# ************************************************

llm_model = OpenAI(graph_config["llm"])
embedder = OpenAIEmbeddings(api_key=llm_model.openai_api_key)
llm_model = ChatAnthropic(graph_config["llm"])

# define the nodes for the graph
robot_node = RobotsNode(
@@ -43,7 +40,7 @@

fetch_node = FetchNode(
input="url | local_dir",
output=["doc", "link_urls", "img_urls"],
output=["doc"],
node_config={
"verbose": True,
"headless": True,
@@ -57,15 +54,6 @@
"verbose": True,
}
)
rag_node = RAGNode(
input="user_prompt & (parsed_doc | doc)",
output=["relevant_chunks"],
node_config={
"llm_model": llm_model,
"embedder_model": embedder,
"verbose": True,
}
)
generate_answer_node = GenerateAnswerNode(
input="user_prompt & (relevant_chunks | parsed_doc | doc)",
output=["answer"],
@@ -84,14 +72,12 @@
robot_node,
fetch_node,
parse_node,
rag_node,
generate_answer_node,
],
edges=[
(robot_node, fetch_node),
(fetch_node, parse_node),
(parse_node, rag_node),
(rag_node, generate_answer_node)
(parse_node, generate_answer_node)
],
entry_point=robot_node
)
5 changes: 2 additions & 3 deletions examples/anthropic/json_scraper_haiku.py
@@ -26,9 +26,8 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"max_tokens": 4000
},
"model": "anthropic/claude-3-haiku-20240307",
},
}

# ************************************************
5 changes: 2 additions & 3 deletions examples/anthropic/json_scraper_multi_haiku.py
@@ -11,9 +11,8 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"max_tokens": 4000
},
"model": "anthropic/claude-3-haiku-20240307",
},
}

FILE_NAME = "inputs/example.json"
5 changes: 2 additions & 3 deletions examples/anthropic/pdf_scraper_graph_haiku.py
@@ -14,9 +14,8 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"max_tokens": 4000
},
"model": "anthropic/claude-3-haiku-20240307",
},
}

source = """
5 changes: 2 additions & 3 deletions examples/anthropic/pdf_scraper_multi_haiku.py
@@ -11,9 +11,8 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"max_tokens": 4000
},
"model": "anthropic/claude-3-haiku-20240307",
},
}

# ***************
48 changes: 48 additions & 0 deletions examples/anthropic/rate_limit_haiku.py
@@ -0,0 +1,48 @@
"""
Basic example of scraping pipeline using SmartScraper while setting an API rate limit.
"""

import os
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info


# required environment variables in .env
# ANTHROPIC_API_KEY
load_dotenv()

# ************************************************
# Create the SmartScraperGraph instance and run it
# ************************************************

graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "anthropic/claude-3-haiku-20240307",
"rate_limit": {
"requests_per_second": 1
}
},
}

smart_scraper_graph = SmartScraperGraph(
prompt="""Don't say anything else. Output JSON only. List me all the events, with the following fields: company_name, event_name, event_start_date, event_start_time,
event_end_date, event_end_time, location, event_mode, event_category,
third_party_redirect, no_of_days,
time_in_hours, hosted_or_attending, refreshments_type,
registration_available, registration_link""",
# also accepts a string with the already downloaded HTML code
source="https://www.hmhco.com/event",
config=graph_config
)

result = smart_scraper_graph.run()
print(result)

# ************************************************
# Get graph execution info
# ************************************************

graph_exec_info = smart_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))
5 changes: 2 additions & 3 deletions examples/anthropic/scrape_plain_text_haiku.py
@@ -28,9 +28,8 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"max_tokens": 4000
},
"model": "anthropic/claude-3-haiku-20240307",
},
}

# ************************************************
5 changes: 2 additions & 3 deletions examples/anthropic/script_generator_haiku.py
@@ -16,9 +16,8 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"max_tokens": 4000
},
"model": "anthropic/claude-3-haiku-20240307",
},
}

# ************************************************
7 changes: 3 additions & 4 deletions examples/anthropic/script_multi_generator_haiku.py
@@ -16,10 +16,9 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"max_tokens": 4000
},
"library": "beautifulsoup"
"model": "anthropic/claude-3-haiku-20240307",
},
"library": "beautifulsoup"
}

# ************************************************
5 changes: 2 additions & 3 deletions examples/anthropic/search_graph_haiku.py
@@ -15,9 +15,8 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"max_tokens": 4000
},
"model": "anthropic/claude-3-haiku-20240307",
},
}

# ************************************************
6 changes: 3 additions & 3 deletions examples/anthropic/search_graph_schema_haiku.py
@@ -5,7 +5,7 @@
import os
from typing import List
from dotenv import load_dotenv
from pydantic import BaseModel, Field
from langchain_core.pydantic_v1 import BaseModel, Field
from scrapegraphai.graphs import SearchGraph

load_dotenv()
@@ -27,8 +27,8 @@ class Dishes(BaseModel):
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"max_tokens": 4000},
"model": "anthropic/claude-3-haiku-20240307",
},
}

# ************************************************
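For context, a self-contained sketch of how a structured-output search like this example is typically assembled; the Dishes field definitions and the schema keyword are assumptions made here for illustration, not lines from the file:

```python
import os
from typing import List

from dotenv import load_dotenv
from langchain_core.pydantic_v1 import BaseModel, Field
from scrapegraphai.graphs import SearchGraph

load_dotenv()

# Hypothetical schema: the real example defines its own fields.
class Dish(BaseModel):
    name: str = Field(description="Name of the dish")
    description: str = Field(description="Short description of the dish")

class Dishes(BaseModel):
    dishes: List[Dish]

graph_config = {
    "llm": {
        "api_key": os.getenv("ANTHROPIC_API_KEY"),
        "model": "anthropic/claude-3-haiku-20240307",
    },
}

search_graph = SearchGraph(
    prompt="List some typical dishes from Belgium",  # placeholder prompt
    config=graph_config,
    schema=Dishes,  # assumed keyword for structured output
)

print(search_graph.run())
```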
20 changes: 4 additions & 16 deletions examples/anthropic/search_link_graph_haiku.py
@@ -14,23 +14,11 @@

load_dotenv()

llm_model_instance = AzureChatOpenAI(
openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
azure_deployment=os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT_NAME"]
)

embedder_model_instance = AzureOpenAIEmbeddings(
azure_deployment=os.environ["AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME"],
openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)

# ************************************************
# Create the SmartScraperGraph instance and run it
# ************************************************

graph_config = {
"llm": {"model_instance": llm_model_instance},
"embeddings": {"model_instance": embedder_model_instance}
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "anthropic/claude-3-haiku-20240307",
},
}

# ************************************************
5 changes: 2 additions & 3 deletions examples/anthropic/smart_scraper_haiku.py
@@ -19,9 +19,8 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"max_tokens": 4000
},
"model": "anthropic/claude-3-haiku-20240307",
},
}

smart_scraper_graph = SmartScraperGraph(