Anthropic refactoring #585

Merged
merged 28 commits into from
Aug 26, 2024
0fbb4e3
docs:add other llm models used
Aug 21, 2024
4a9d849
docs:adjust llm doc explanation
Aug 21, 2024
dd406df
style(models_tokens): enforce formatting
f-aguzzi Aug 21, 2024
b05ec16
fix(models_tokens): add llama2 and llama3 sizes explicitly
f-aguzzi Aug 21, 2024
1b48871
ci(release): 1.14.1-beta.1 [skip ci]
semantic-release-bot Aug 21, 2024
cb3f191
test:add model instance example
goasleep Aug 21, 2024
a92b9c6
Fix: Azure OpenAI issue
aziz-ullah-khan Aug 21, 2024
a8ca1fc
test:add moonshot example
goasleep Aug 21, 2024
117fca0
Merge pull request #571 from aziz-ullah-khan/pre/beta
VinciGit00 Aug 21, 2024
437e48f
Merge pull request #567 from goasleep/feature/add_model_instance_info
VinciGit00 Aug 21, 2024
0d2b7b3
update dependenceis
VinciGit00 Aug 21, 2024
77a6c92
Update pyproject.toml
VinciGit00 Aug 22, 2024
986c8a1
Update README.md
VinciGit00 Aug 22, 2024
f7ba1f3
refactoring of the code
VinciGit00 Aug 23, 2024
26de5dd
Merge branch 'pre/beta' into ligthweigthing_library
VinciGit00 Aug 23, 2024
e0a5e73
Merge pull request #573 from ScrapeGraphAI/ligthweigthing_library
VinciGit00 Aug 23, 2024
62f32e9
feat: ligthweigthing the library
VinciGit00 Aug 23, 2024
06dc640
ci(release): 1.15.0-beta.1 [skip ci]
semantic-release-bot Aug 23, 2024
3f0b5f7
Update README.md
VinciGit00 Aug 23, 2024
214de44
Merge branch 'pre/beta' of https://github.com/ScrapeGraphAI/Scrapegra…
VinciGit00 Aug 23, 2024
b6fb4c0
Merge branch 'pre/beta' into temp
VinciGit00 Aug 23, 2024
20410c9
Merge pull request #577 from ScrapeGraphAI/temp
VinciGit00 Aug 23, 2024
cf1fada
fix: abstract graph
VinciGit00 Aug 23, 2024
ab21576
ci(release): 1.15.0-beta.2 [skip ci]
semantic-release-bot Aug 23, 2024
86fe5fc
fix: update abstract graph
VinciGit00 Aug 24, 2024
132ee5b
ci(release): 1.15.0-beta.3 [skip ci]
semantic-release-bot Aug 24, 2024
9df4b14
refacttoring of the anthropic example
VinciGit00 Aug 25, 2024
37a4a8a
Merge branch 'main' into anthropic-refactoring
VinciGit00 Aug 25, 2024
33 changes: 31 additions & 2 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,15 +1,44 @@
## [1.14.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.14.0...v1.14.1) (2024-08-24)
## [1.15.0-beta.3](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.15.0-beta.2...v1.15.0-beta.3) (2024-08-24)



### Bug Fixes

* update abstract graph ([86fe5fc](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/86fe5fcaf1a6ba28786678874378f07fba1db40f))

## [1.15.0-beta.2](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.15.0-beta.1...v1.15.0-beta.2) (2024-08-23)


### Bug Fixes

* add claude3.5 sonnet ([ee8f8b3](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/ee8f8b31ecfe4ffd311528d2f48cb055e4609d99))
* abstract graph ([cf1fada](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/cf1fada36a6716cb0e24bbc5da7509446a964145))



### Docs

* added sponsors ([b3a2d0d](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/b3a2d0d65a41f6e645fac3fc84f702fdf64b951c))

## [1.15.0-beta.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.14.1-beta.1...v1.15.0-beta.1) (2024-08-23)


### Features

* ligthweigthing the library ([62f32e9](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/62f32e994bcb748dfef4f7e1b2e5213a989c33cc))


### Bug Fixes

* Azure OpenAI issue ([a92b9c6](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/a92b9c6970049a4ba9dbdf8eff3eeb7f98c6c639))

## [1.14.1-beta.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.14.0...v1.14.1-beta.1) (2024-08-21)


### Bug Fixes

* **models_tokens:** add llama2 and llama3 sizes explicitly ([b05ec16](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/b05ec16b252d00c9c9ee7c6d4605b420851c7754))


## [1.14.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.13.3...v1.14.0) (2024-08-20)


Expand Down
22 changes: 22 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,28 @@ playwright install

**Note**: it is recommended to install the library in a virtual environment to avoid conflicts with other libraries 🐱

Optional modules are not installed by default. If you want to use them, install them yourself with the following commands:

### Installing "Other Language Models"

This group allows you to use additional language models like Fireworks, Groq, Anthropic, Hugging Face, and Nvidia AI Endpoints.
```bash
pip install scrapegraphai[other-language-models]
```
### Installing "More Semantic Options"

This group includes tools for advanced semantic processing, such as Graphviz.
```bash
pip install scrapegraphai[more-semantic-options]
```
### Installing "More Browser Options"

This group includes additional browser management options, such as BrowserBase.
```bash
pip install scrapegraphai[more-browser-options]
```
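
After installing one of these groups, it can help to confirm that the optional packages are actually importable. The following is a minimal sketch. The module names in it (`langchain_anthropic`, `graphviz`) are placeholders chosen for illustration, since the exact packages behind each extras group are not listed here, and `missing_modules` is a hypothetical helper, not part of the library.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be imported."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical module names; replace them with the ones you depend on.
    wanted = ["langchain_anthropic", "graphviz"]
    missing = missing_modules(wanted)
    if missing:
        print("Missing optional modules:", ", ".join(missing))
    else:
        print("All optional modules are importable.")
```

Checking with `find_spec` avoids importing the packages, so the check stays cheap and has no side effects.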

## 💻 Usage
There are multiple standard scraping pipelines that can be used to extract information from a website (or local file).

Expand Down
11 changes: 0 additions & 11 deletions docs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,12 +9,6 @@ markmap:

## **Short-Term Goals**

- Integration with more llm APIs

- Test proxy rotation implementation

- Add more search engines inside the SearchInternetNode

- Improve the documentation (ReadTheDocs)
- [Issue #102](https://github.com/VinciGit00/Scrapegraph-ai/issues/102)

Expand All @@ -23,9 +17,6 @@ markmap:
## **Medium-Term Goals**

- Node for handling API requests

- Improve SearchGraph to look into the first 5 results of the search engine

- Make scraping more deterministic
- Create DOM tree of the website
- HTML tag text embeddings with tags metadata
Expand Down Expand Up @@ -70,5 +61,3 @@ markmap:
- Automatic generation of scraping pipelines from a given prompt

- Create API for the library

- Finetune a LLM for html content
32 changes: 32 additions & 0 deletions docs/source/scrapers/llm.rst
Original file line number Diff line number Diff line change
Expand Up @@ -194,3 +194,35 @@ We can also pass a model instance for the chat model and the embedding model. Fo
"model_instance": embedder_model_instance
}
}

Other LLM models
^^^^^^^^^^^^^^^^

We can also pass a model instance for the chat model and the embedding model through the **model_instance** parameter.
This lets you use any Langchain model instance.
You can find the model you need in the following lists:

- `chat model list <https://python.langchain.com/v0.2/docs/integrations/chat/#all-chat-models>`_
- `embedding model list <https://python.langchain.com/v0.2/docs/integrations/text_embedding/#all-embedding-models>`_

For instance, consider the Moonshot **chat model**. We can integrate it as follows:

.. code-block:: python

from langchain_community.chat_models.moonshot import MoonshotChat

# The configuration parameters depend on the model you select
llm_instance_config = {
"model": "moonshot-v1-8k",
"base_url": "https://api.moonshot.cn/v1",
"moonshot_api_key": "MOONSHOT_API_KEY",
}

llm_model_instance = MoonshotChat(**llm_instance_config)
graph_config = {
"llm": {
"model_instance": llm_model_instance,
"model_tokens": 5000
},
}
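
The snippet above uses a literal placeholder string for the API key. A minimal sketch of reading the key from the environment instead, assuming the variable is named `MOONSHOT_API_KEY` (the name is an assumption, and `moonshot_config_from_env` is a hypothetical helper, not part of ScrapeGraphAI or Langchain):

```python
import os
from typing import Dict


def moonshot_config_from_env(env_var: str = "MOONSHOT_API_KEY") -> Dict[str, str]:
    """Build the keyword arguments for the chat model from the environment."""
    api_key = os.getenv(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return {
        "model": "moonshot-v1-8k",
        "base_url": "https://api.moonshot.cn/v1",
        "moonshot_api_key": api_key,
    }
```

The returned dictionary can then be unpacked in the same way as `llm_instance_config`, for example `MoonshotChat(**moonshot_config_from_env())`.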

Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"model": "anthropic/claude-3-haiku-20240307",
"max_tokens": 4000
},
}
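
The hunk above switches the example to a provider-prefixed model name (`anthropic/claude-3-haiku-20240307`). As a rough illustration of that naming convention only, and not of how the library resolves models internally, a `provider/model` string can be split as sketched here (`split_model_id` is a hypothetical helper):

```python
from typing import Optional, Tuple


def split_model_id(model: str) -> Tuple[Optional[str], str]:
    """Split "provider/model" into (provider, model).

    Returns (None, model) when the string has no provider prefix.
    """
    # Split on the first slash only, so model names that contain
    # further slashes keep them in the second element.
    provider, sep, name = model.partition("/")
    if not sep:
        return None, model
    return provider, name
```

For example, `split_model_id("anthropic/claude-3-haiku-20240307")` gives the provider `"anthropic"` and the model `"claude-3-haiku-20240307"`, while an unprefixed name comes back with `None` as its provider.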
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"model": "anthropic/claude-3-haiku-20240307",
"max_tokens": 4000},
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"model": "anthropic/claude-3-haiku-20240307",
"max_tokens": 4000
},
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"model": "anthropic/claude-3-haiku-20240307",
"max_tokens": 4000
},
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"model": "anthropic/claude-3-haiku-20240307",
"max_tokens": 4000
},
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"model": "anthropic/claude-3-haiku-20240307",
"max_tokens": 4000
},
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"model": "anthropic/claude-3-haiku-20240307",
"max_tokens": 4000
},
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"model": "anthropic/claude-3-haiku-20240307",
"max_tokens": 4000
},
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"model": "anthropic/claude-3-haiku-20240307",
"max_tokens": 4000
},
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"model": "anthropic/claude-3-haiku-20240307",
"max_tokens": 4000
},
"library": "beautifulsoup"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"model": "anthropic/claude-3-haiku-20240307",
"max_tokens": 4000
},
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -27,8 +27,9 @@ class Dishes(BaseModel):
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"max_tokens": 4000},
"model": "anthropic/claude-3-haiku-20240307",
"max_tokens": 4000
},
}

# ************************************************
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -29,8 +29,11 @@
# ************************************************

graph_config = {
"llm": {"model_instance": llm_model_instance},
"embeddings": {"model_instance": embedder_model_instance}
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "anthropic/claude-3-haiku-20240307",
"max_tokens": 4000
},
}

# ************************************************
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"model": "anthropic/claude-3-haiku-20240307",
"max_tokens": 4000
},
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"model": "anthropic/claude-3-haiku-20240307",
"max_tokens": 4000
},
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ class Projects(BaseModel):
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"model": "anthropic/claude-3-haiku-20240307",
"max_tokens": 4000},
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"model": "anthropic/claude-3-haiku-20240307",
"max_tokens": 4000
},
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@
graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"model": "anthropic/claude-3-haiku-20240307",
"max_tokens": 4000},
}

Expand Down
1 change: 1 addition & 0 deletions examples/model_instance/.env.example
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
MOONLIGHT_API_KEY="YOUR MOONLIGHT API KEY"
53 changes: 53 additions & 0 deletions examples/model_instance/smart_scraper_with_model_instace.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
"""
Basic example of a scraping pipeline using SmartScraperGraph and a model_instance
"""

import os
import json
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info
from langchain_community.chat_models.moonshot import MoonshotChat
from dotenv import load_dotenv
load_dotenv()

# ************************************************
# Define the configuration for the graph
# ************************************************


llm_instance_config = {
"model": "moonshot-v1-8k",
"base_url": "https://api.moonshot.cn/v1",
"moonshot_api_key": os.getenv("MOONLIGHT_API_KEY"),
}


llm_model_instance = MoonshotChat(**llm_instance_config)

graph_config = {
"llm": {
"model_instance": llm_model_instance,
"model_tokens": 10000
},
"verbose": True,
"headless": True,
}

# ************************************************
# Create the SmartScraperGraph instance and run it
# ************************************************

smart_scraper_graph = SmartScraperGraph(
prompt="List me what does the company do, the name and a contact email.",
source="https://scrapegraphai.com/",
config=graph_config
)

result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))

# ************************************************
# Get graph execution info
# ************************************************

graph_exec_info = smart_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))
1 change: 1 addition & 0 deletions examples/moonshot/.env.example
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
MOONLIGHT_API_KEY="YOUR MOONLIGHT API KEY"
1 change: 1 addition & 0 deletions examples/moonshot/readme.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
This folder offers an example of how to use ScrapeGraph-AI with Moonshot and SmartScraperGraph. For more usage examples, refer to the OpenAI examples.