ScrapeGraphAI · VinciGit00 · Aug 26, 2024 · Aug 21, 2024
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,15 +1,44 @@
-## [1.14.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.14.0...v1.14.1) (2024-08-24)
+## [1.15.0-beta.3](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.15.0-beta.2...v1.15.0-beta.3) (2024-08-24)
+
+
+
+### Bug Fixes
+
+* update abstract graph ([86fe5fc](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/86fe5fcaf1a6ba28786678874378f07fba1db40f))
+
+## [1.15.0-beta.2](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.15.0-beta.1...v1.15.0-beta.2) (2024-08-23)
 
 
 ### Bug Fixes
 
-* add claude3.5 sonnet ([ee8f8b3](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/ee8f8b31ecfe4ffd311528d2f48cb055e4609d99))
+* abstract graph ([cf1fada](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/cf1fada36a6716cb0e24bbc5da7509446a964145))
+
 
 
 ### Docs
 
 * added sponsors ([b3a2d0d](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/b3a2d0d65a41f6e645fac3fc84f702fdf64b951c))
 
+## [1.15.0-beta.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.14.1-beta.1...v1.15.0-beta.1) (2024-08-23)
+
+
+### Features
+
+* ligthweigthing the library ([62f32e9](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/62f32e994bcb748dfef4f7e1b2e5213a989c33cc))
+
+
+### Bug Fixes
+
+* Azure OpenAI issue ([a92b9c6](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/a92b9c6970049a4ba9dbdf8eff3eeb7f98c6c639))
+
+## [1.14.1-beta.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.14.0...v1.14.1-beta.1) (2024-08-21)
+
+
+### Bug Fixes
+
+* **models_tokens:** add llama2 and llama3 sizes explicitly ([b05ec16](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/b05ec16b252d00c9c9ee7c6d4605b420851c7754))
+
+
 ## [1.14.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.13.3...v1.14.0) (2024-08-20)
 
 

diff --git a/README.md b/README.md
@@ -32,6 +32,28 @@ playwright install
 
 **Note**: it is recommended to install the library in a virtual environment to avoid conflicts with other libraries 🐱
 
+To use the optional modules, you need to install them yourself with the following commands:
+
+### Installing "Other Language Models"
+
+This group allows you to use additional language models like Fireworks, Groq, Anthropic, Hugging Face, and Nvidia AI Endpoints.
+```bash
+pip install scrapegraphai[other-language-models]
+
+```
+### Installing "More Semantic Options"
+
+This group includes tools for advanced semantic processing, such as Graphviz.
+```bash
+pip install scrapegraphai[more-semantic-options]
+```
+### Installing "More Browser Options"
+
+This group includes additional browser management options, such as BrowserBase.
+```bash
+pip install scrapegraphai[more-browser-options]
+```
+
 ## 💻 Usage
 There are multiple standard scraping pipelines that can be used to extract information from a website (or local file).
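Because the optional groups above are no longer installed by default, code that relies on them may want to check for their presence before importing. The following is a minimal sketch of such a check. The module names `langchain_anthropic`, `graphviz`, and `browserbase` are assumptions about what each group provides, not taken from the project's packaging metadata, and `missing_extras` and `install_hint` are hypothetical helpers.

```python
import importlib.util

# Hypothetical mapping from an importable module name to the extras group
# assumed to provide it; adjust to match the project's real packaging metadata.
OPTIONAL_MODULES = {
    "langchain_anthropic": "other-language-models",
    "graphviz": "more-semantic-options",
    "browserbase": "more-browser-options",
}


def missing_extras(modules=OPTIONAL_MODULES):
    """Return the sorted extras groups whose module cannot be found."""
    missing = set()
    for module_name, extra in modules.items():
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(module_name) is None:
            missing.add(extra)
    return sorted(missing)


def install_hint(extras):
    """Build the pip command that would install the given extras groups."""
    if not extras:
        return ""
    return f'pip install "scrapegraphai[{",".join(extras)}]"'
```

Calling `install_hint(missing_extras())` would then yield a single pip command covering every group that appears to be absent. Quoting the requirement keeps shells such as zsh from interpreting the square brackets.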
 

diff --git a/docs/README.md b/docs/README.md
@@ -9,12 +9,6 @@ markmap:
 
 ## **Short-Term Goals**
 
-- Integration with more llm APIs
-
-- Test proxy rotation implementation
-
-- Add more search engines inside the SearchInternetNode
-
 - Improve the documentation (ReadTheDocs)
     - [Issue #102](https://github.com/VinciGit00/Scrapegraph-ai/issues/102)
 
@@ -23,9 +17,6 @@ markmap:
 ## **Medium-Term Goals**
 
 - Node for handling API requests
-
-- Improve SearchGraph to look into the first 5 results of the search engine
-
 - Make scraping more deterministic
     - Create DOM tree of the website
     - HTML tag text embeddings with tags metadata
@@ -70,5 +61,3 @@ markmap:
 - Automatic generation of scraping pipelines from a given prompt
 
 - Create API for the library
-
-- Finetune a LLM for html content
diff --git a/docs/source/scrapers/llm.rst b/docs/source/scrapers/llm.rst
@@ -194,3 +194,35 @@ We can also pass a model instance for the chat model and the embedding model. Fo
             "model_instance": embedder_model_instance
         }
     }
+
+Other LLM models
+^^^^^^^^^^^^^^^^
+
+We can also pass a model instance for the chat model and the embedding model through the **model_instance** parameter.
+This makes it possible to use any LangChain model instance.
+The available models are listed here:
+
+- `chat model list <https://python.langchain.com/v0.2/docs/integrations/chat/#all-chat-models>`_
+- `embedding model list <https://python.langchain.com/v0.2/docs/integrations/text_embedding/#all-embedding-models>`_
+
+For instance, consider the Moonshot **chat model**. It can be integrated as follows:
+
+.. code-block:: python
+
+    from langchain_community.chat_models.moonshot import MoonshotChat
+
+    # The configuration parameters are contingent upon the specific model you select
+    llm_instance_config = {
+        "model": "moonshot-v1-8k",
+        "base_url": "https://api.moonshot.cn/v1",
+        "moonshot_api_key": "MOONSHOT_API_KEY",
+    }
+
+    llm_model_instance = MoonshotChat(**llm_instance_config)
+    graph_config = {
+        "llm": {
+            "model_instance": llm_model_instance, 
+            "model_tokens": 5000
+        },
+    }
+
diff --git a/examples/anthropic/csv_scraper_haiku.py → examples/anthropic/csv_scraper_anthropic.py b/examples/anthropic/csv_scraper_haiku.py → examples/anthropic/csv_scraper_anthropic.py
@@ -32,7 +32,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
         },
 }
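The change repeated across the Anthropic examples prefixes the model name with its provider. As a rough illustration of why such a prefix is useful, a `provider/model` string can be split unambiguously on the first slash. This is a sketch only, not the library's actual parsing code, and `split_model_name` is a hypothetical helper.

```python
from typing import Optional, Tuple


def split_model_name(full_name: str) -> Tuple[Optional[str], str]:
    """Split 'provider/model' into (provider, model).

    A name without a slash yields (None, name), so the caller would have
    to determine the provider some other way.
    """
    provider, sep, model = full_name.partition("/")
    if not sep:
        return None, full_name
    return provider, model


print(split_model_name("anthropic/claude-3-haiku-20240307"))
print(split_model_name("claude-3-haiku-20240307"))
```

With the prefix present, the provider no longer has to be inferred from the model name itself.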

diff --git a/...nthropic/csv_scraper_graph_multi_haiku.py → ...opic/csv_scraper_graph_multi_anthropic.py b/...nthropic/csv_scraper_graph_multi_haiku.py → ...opic/csv_scraper_graph_multi_anthropic.py
@@ -26,7 +26,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000},
 }
 

diff --git a/examples/anthropic/custom_graph_haiku.py → examples/anthropic/custom_graph_anthropic.py b/examples/anthropic/custom_graph_haiku.py → examples/anthropic/custom_graph_anthropic.py
@@ -18,7 +18,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
         },
 }

diff --git a/examples/anthropic/json_scraper_haiku.py → examples/anthropic/json_scraper_anthropic.py b/examples/anthropic/json_scraper_haiku.py → examples/anthropic/json_scraper_anthropic.py
@@ -26,7 +26,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
         },
 }

diff --git a/...les/anthropic/json_scraper_multi_haiku.py → ...anthropic/json_scraper_multi_anthropic.py b/...les/anthropic/json_scraper_multi_haiku.py → ...anthropic/json_scraper_multi_anthropic.py
@@ -11,7 +11,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
         },
 }

diff --git a/...ples/anthropic/pdf_scraper_graph_haiku.py → .../anthropic/pdf_scraper_graph_anthropic.py b/...ples/anthropic/pdf_scraper_graph_haiku.py → .../anthropic/pdf_scraper_graph_anthropic.py
@@ -14,7 +14,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
         },
 }

diff --git a/...ples/anthropic/pdf_scraper_multi_haiku.py → .../anthropic/pdf_scraper_multi_anthropic.py b/...ples/anthropic/pdf_scraper_multi_haiku.py → .../anthropic/pdf_scraper_multi_anthropic.py
@@ -11,7 +11,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
         },
 }

diff --git a/...ples/anthropic/scrape_plain_text_haiku.py → .../anthropic/scrape_plain_text_anthropic.py b/...ples/anthropic/scrape_plain_text_haiku.py → .../anthropic/scrape_plain_text_anthropic.py
@@ -28,7 +28,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
         },
 }

diff --git a/examples/anthropic/script_generator_haiku.py → ...s/anthropic/script_generator_anthropic.py b/examples/anthropic/script_generator_haiku.py → ...s/anthropic/script_generator_anthropic.py
@@ -16,7 +16,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
         },
 }

diff --git a/...anthropic/script_multi_generator_haiku.py → ...ropic/script_multi_generator_anthropic.py b/...anthropic/script_multi_generator_haiku.py → ...ropic/script_multi_generator_anthropic.py
@@ -16,7 +16,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
         },
         "library": "beautifulsoup"

diff --git a/examples/anthropic/search_graph_haiku.py → examples/anthropic/search_graph_anthropic.py b/examples/anthropic/search_graph_haiku.py → examples/anthropic/search_graph_anthropic.py
@@ -15,7 +15,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
         },
 }

diff --git a/...es/anthropic/search_graph_schema_haiku.py → ...nthropic/search_graph_schema_anthropic.py b/...es/anthropic/search_graph_schema_haiku.py → ...nthropic/search_graph_schema_anthropic.py
@@ -27,8 +27,9 @@ class Dishes(BaseModel):
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
-        "max_tokens": 4000},
+        "model": "anthropic/claude-3-haiku-20240307",
+        "max_tokens": 4000
+        },
 }
 
 # ************************************************

diff --git a/...ples/anthropic/search_link_graph_haiku.py → .../anthropic/search_link_graph_anthropic.py b/...ples/anthropic/search_link_graph_haiku.py → .../anthropic/search_link_graph_anthropic.py
@@ -29,8 +29,11 @@
 # ************************************************
 
 graph_config = {
-    "llm": {"model_instance": llm_model_instance},
-    "embeddings": {"model_instance": embedder_model_instance}
+    "llm": {
+        "api_key": os.getenv("ANTHROPIC_API_KEY"),
+        "model": "anthropic/claude-3-haiku-20240307",
+        "max_tokens": 4000
+        },
 }
 
 # ************************************************

diff --git a/examples/anthropic/smart_scraper_haiku.py → ...ples/anthropic/smart_scraper_anthropic.py b/examples/anthropic/smart_scraper_haiku.py → ...ples/anthropic/smart_scraper_anthropic.py
@@ -19,7 +19,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
         },
 }

diff --git a/...es/anthropic/smart_scraper_multi_haiku.py → ...nthropic/smart_scraper_multi_anthropic.py b/...es/anthropic/smart_scraper_multi_haiku.py → ...nthropic/smart_scraper_multi_anthropic.py
@@ -17,7 +17,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
         },
 }

diff --git a/...s/anthropic/smart_scraper_schema_haiku.py → ...thropic/smart_scraper_schema_anthropic.py b/...s/anthropic/smart_scraper_schema_haiku.py → ...thropic/smart_scraper_schema_anthropic.py
@@ -33,7 +33,7 @@ class Projects(BaseModel):
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000},
 }
 

diff --git a/examples/anthropic/xml_scraper_haiku.py → examples/anthropic/xml_scraper_anthropic.py b/examples/anthropic/xml_scraper_haiku.py → examples/anthropic/xml_scraper_anthropic.py
@@ -26,7 +26,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
         },
 }

diff --git a/...nthropic/xml_scraper_graph_multi_haiku.py → ...opic/xml_scraper_graph_multi_anthropic.py b/...nthropic/xml_scraper_graph_multi_haiku.py → ...opic/xml_scraper_graph_multi_anthropic.py
@@ -26,7 +26,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000},
 }
 

diff --git a/examples/model_instance/.env.example b/examples/model_instance/.env.example
@@ -0,0 +1 @@
+MOONLIGHT_API_KEY="YOUR MOONLIGHT API KEY"
diff --git a/examples/model_instance/smart_scraper_with_model_instace.py b/examples/model_instance/smart_scraper_with_model_instace.py
@@ -0,0 +1,53 @@
+""" 
+Basic example of a scraping pipeline using SmartScraper and model_instance
+"""
+
+import os, json
+from scrapegraphai.graphs import SmartScraperGraph
+from scrapegraphai.utils import prettify_exec_info
+from langchain_community.chat_models.moonshot import MoonshotChat
+from dotenv import load_dotenv
+load_dotenv()
+
+# ************************************************
+# Define the configuration for the graph
+# ************************************************
+
+
+llm_instance_config = {
+    "model": "moonshot-v1-8k",
+    "base_url": "https://api.moonshot.cn/v1",
+    "moonshot_api_key": os.getenv("MOONLIGHT_API_KEY"),
+}
+
+
+llm_model_instance = MoonshotChat(**llm_instance_config)
+
+graph_config = {
+    "llm": {
+        "model_instance": llm_model_instance, 
+        "model_tokens": 10000
+    },
+    "verbose": True,
+    "headless": True,
+}
+
+# ************************************************
+# Create the SmartScraperGraph instance and run it
+# ************************************************
+
+smart_scraper_graph = SmartScraperGraph(
+    prompt="List me what does the company do, the name and a contact email.",
+    source="https://scrapegraphai.com/",
+    config=graph_config
+)
+
+result = smart_scraper_graph.run()
+print(json.dumps(result, indent=4))
+
+# ************************************************
+# Get graph execution info
+# ************************************************
+
+graph_exec_info = smart_scraper_graph.get_execution_info()
+print(prettify_exec_info(graph_exec_info))
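One caveat with the example above: `os.getenv` returns `None` when the variable is unset, so a missing key would only surface later, when the model is constructed or first called. A minimal sketch of an early check follows. `require_env` is a hypothetical helper, not part of ScrapeGraphAI.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        # Covers both an unset variable and an empty string
        raise RuntimeError(
            f"Environment variable {name} is not set; add it to your .env file"
        )
    return value
```

It could replace the bare `os.getenv("MOONLIGHT_API_KEY")` call in `llm_instance_config`, so that a missing key fails immediately with a readable error.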
diff --git a/examples/moonshot/.env.example b/examples/moonshot/.env.example
@@ -0,0 +1 @@
+MOONLIGHT_API_KEY="YOUR MOONLIGHT API KEY"
diff --git a/examples/moonshot/readme.md b/examples/moonshot/readme.md
@@ -0,0 +1 @@
+This folder offers an example of how to use ScrapeGraph-AI with Moonshot and SmartScraperGraph. For more usage examples, refer to the OpenAI examples.