Pre/beta #418

Merged: 15 commits, Jun 30, 2024

20 changes: 18 additions & 2 deletions CHANGELOG.md
@@ -1,9 +1,23 @@
## [1.7.5](https://github.com/VinciGit00/Scrapegraph-ai/compare/v1.7.4...v1.7.5) (2024-06-28)
## [1.8.0-beta.1](https://github.com/VinciGit00/Scrapegraph-ai/compare/v1.7.4...v1.8.0-beta.1) (2024-06-25)


### Features

* add new search engine availability and new tests ([073d226](https://github.com/VinciGit00/Scrapegraph-ai/commit/073d226723f5f03b960865d07408905b7a506180))
* add research with bing + test function ([aa2160c](https://github.com/VinciGit00/Scrapegraph-ai/commit/aa2160c108764745a696ffc16038f370e9702c14))



### Bug Fixes

* add new claude model ([4d93641](https://github.com/VinciGit00/Scrapegraph-ai/commit/4d936410ccaa3a4b810065e0e84b49b15c09fb28))
* updated for schema changes ([aedda44](https://github.com/VinciGit00/Scrapegraph-ai/commit/aedda448682ce5a921a62e661bffb02478bab75f))


### CI

* **release:** 1.7.0-beta.13 [skip ci] ([ce0a47a](https://github.com/VinciGit00/Scrapegraph-ai/commit/ce0a47aee5edbb26fd82e41f6688a4bc48a10822))
* **release:** 1.7.0-beta.14 [skip ci] ([ec77ff7](https://github.com/VinciGit00/Scrapegraph-ai/commit/ec77ff7ea4eb071469c2fb53e5959d4ea1f73ad6))


## [1.7.4](https://github.com/VinciGit00/Scrapegraph-ai/compare/v1.7.3...v1.7.4) (2024-06-21)

@@ -46,6 +60,7 @@
## [1.7.0](https://github.com/VinciGit00/Scrapegraph-ai/compare/v1.6.1...v1.7.0) (2024-06-17)



### Features

* add caching ([d790361](https://github.com/VinciGit00/Scrapegraph-ai/commit/d79036149a3197a385b73553f29df66d36480c38))
@@ -143,6 +158,7 @@
* **release:** 1.7.0-beta.8 [skip ci] ([a87702f](https://github.com/VinciGit00/Scrapegraph-ai/commit/a87702f107f3fd16ee73e1af1585cd763788bf46))
* **release:** 1.7.0-beta.9 [skip ci] ([0c5d6e2](https://github.com/VinciGit00/Scrapegraph-ai/commit/0c5d6e2c82b9ee81c91cd2325948bb5a4eddcb31))


## [1.7.0-beta.12](https://github.com/VinciGit00/Scrapegraph-ai/compare/v1.7.0-beta.11...v1.7.0-beta.12) (2024-06-17)


39 changes: 19 additions & 20 deletions examples/ernie/smart_scraper_schema_ernie.py
@@ -2,32 +2,31 @@
Basic example of scraping pipeline using SmartScraper with schema
"""

import os, json
import json
import os
from typing import Dict

from dotenv import load_dotenv
from pydantic import BaseModel

from scrapegraphai.graphs import SmartScraperGraph


load_dotenv()

# ************************************************
# Define the output schema for the graph
# ************************************************

schema= """
{
"Projects": [
"Project #":
{
"title": "...",
"description": "...",
},
"Project #":
{
"title": "...",
"description": "...",
}
]
}
"""

class Project(BaseModel):
title: str
description: str


class Projects(BaseModel):
Projects: Dict[str, Project]


# ************************************************
# Define the configuration for the graph
@@ -37,7 +36,7 @@

graph_config = {
"llm": {
"api_key":openai_key,
"api_key": openai_key,
"model": "gpt-3.5-turbo",
},
"verbose": True,
@@ -51,8 +50,8 @@
smart_scraper_graph = SmartScraperGraph(
prompt="List me all the projects with their description",
source="https://perinim.github.io/projects/",
schema=schema,
config=graph_config
schema=Projects,
config=graph_config,
)

result = smart_scraper_graph.run()
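For context on the schema change that runs through these example files: the free-form JSON string is replaced by nested Pydantic models, so the expected output shape can now be validated programmatically. A minimal sketch of that pattern, assuming Pydantic v2 (the sample payload below is illustrative, not taken from this PR):

from typing import Dict

from pydantic import BaseModel


class Project(BaseModel):
    title: str
    description: str


class Projects(BaseModel):
    Projects: Dict[str, Project]


# Illustrative payload in the shape the removed JSON-string schema described.
sample = {
    "Projects": {
        "Project 1": {"title": "...", "description": "..."},
        "Project 2": {"title": "...", "description": "..."},
    }
}

validated = Projects.model_validate(sample)  # raises ValidationError if the shape is off
print(validated.model_dump_json(indent=2))   # serializes back to JSON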
27 changes: 10 additions & 17 deletions examples/huggingfacehub/smart_scraper_schema_huggingfacehub.py
@@ -4,6 +4,9 @@

import os
from dotenv import load_dotenv
from typing import Dict

from pydantic import BaseModel
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info
from langchain_community.llms import HuggingFaceEndpoint
@@ -13,22 +16,12 @@
# Define the output schema for the graph
# ************************************************

schema= """
{
"Projects": [
"Project #":
{
"title": "...",
"description": "...",
},
"Project #":
{
"title": "...",
"description": "...",
}
]
}
"""
class Project(BaseModel):
title: str
description: str

class Projects(BaseModel):
Projects: Dict[str, Project]

## required environment variable in .env
#HUGGINGFACEHUB_API_TOKEN
@@ -61,7 +54,7 @@
smart_scraper_graph = SmartScraperGraph(
prompt="List me all the projects with their description",
source="https://perinim.github.io/projects/",
schema=schema,
schema=Projects,
config=graph_config
)
result = smart_scraper_graph.run()
31 changes: 13 additions & 18 deletions examples/mixed_models/smart_scraper_schema_groq_openai.py
@@ -2,8 +2,13 @@
Basic example of scraping pipeline using SmartScraper with schema
"""

import os, json
import json
import os
from typing import Dict, List

from dotenv import load_dotenv
from pydantic import BaseModel

from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info

@@ -13,22 +18,12 @@
# Define the output schema for the graph
# ************************************************

schema= """
{
"Projects": [
"Project #":
{
"title": "...",
"description": "...",
},
"Project #":
{
"title": "...",
"description": "...",
}
]
}
"""
class Project(BaseModel):
title: str
description: str

class Projects(BaseModel):
Projects: Dict[str, Project]

# ************************************************
# Define the configuration for the graph
@@ -60,7 +55,7 @@
prompt="List me all the projects with their description.",
# also accepts a string with the already downloaded HTML code
source="https://perinim.github.io/projects/",
schema=schema,
schema=Projects,
config=graph_config
)

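As a follow-up to the example above, per-node timings and token usage can be inspected after run(); a short sketch using the prettify_exec_info helper already imported in this file (get_execution_info() is assumed to be available on the graph, as in the repo's other examples):

# Continues the mixed groq/openai example above; get_execution_info() is assumed.
result = smart_scraper_graph.run()
print(result)

graph_exec_info = smart_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))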
50 changes: 50 additions & 0 deletions examples/single_node/search_internet_node.py
@@ -0,0 +1,50 @@
"""
Example of custom graph using existing nodes
"""

from scrapegraphai.models import Ollama
from scrapegraphai.nodes import SearchInternetNode

# ************************************************
# Define the configuration for the graph
# ************************************************

graph_config = {
"llm": {
"model": "llama3",
"temperature": 0,
"streaming": True
},
"search_engine": "google",
"max_results": 3,
"verbose": True
}

# ************************************************
# Define the node
# ************************************************

llm_model = Ollama(graph_config["llm"])

search_node = SearchInternetNode(
input="user_input",
output=["search_results"],
node_config={
"llm_model": llm_model,
"search_engine": graph_config["search_engine"],
"max_results": graph_config["max_results"],
"verbose": graph_config["verbose"]
}
)

# ************************************************
# Test the node
# ************************************************

state = {
"user_input": "What is the capital of France?"
}

result = search_node.execute(state)

print(result)
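Since the changelog above adds Bing availability, here is a hedged variation of the same node pointed at the new engine; the exact identifier accepted by search_engine is assumed to be "bing":

# Same node as above, with the engine swapped; "bing" is assumed to be the
# accepted value based on the changelog entries in this PR.
bing_search_node = SearchInternetNode(
    input="user_input",
    output=["search_results"],
    node_config={
        "llm_model": llm_model,
        "search_engine": "bing",
        "max_results": graph_config["max_results"],
        "verbose": graph_config["verbose"]
    }
)

print(bing_search_node.execute({"user_input": "What is the capital of France?"}))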
3 changes: 1 addition & 2 deletions pyproject.toml
@@ -2,8 +2,7 @@
name = "scrapegraphai"


version = "1.7.5"

version = "1.8.0b1"


description = "A web scraping library based on LangChain which uses LLM and direct graph logic to create scraping pipelines."
6 changes: 3 additions & 3 deletions scrapegraphai/builders/graph_builder.py
@@ -40,11 +40,11 @@ class GraphBuilder:
ValueError: If 'api_key' is not included in llm_config.
"""

def __init__(self, user_prompt: str, config: dict):
def __init__(self, prompt: str, config: dict):
"""
Initializes the GraphBuilder with a user prompt and language model configuration.
"""
self.user_prompt = user_prompt
self.prompt = prompt
self.config = config
self.llm = self._create_llm(config["llm"])
self.nodes_description = self._generate_nodes_description()
@@ -122,7 +122,7 @@ def build_graph(self):
Returns:
dict: A JSON representation of the graph configuration.
"""
return self.chain.invoke(self.user_prompt)
return self.chain.invoke(self.prompt)

@staticmethod
def convert_json_to_graphviz(json_data, format: str = 'pdf'):
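For reference, a hedged usage sketch of the renamed keyword; the import path and the gpt-3.5-turbo model are assumptions, and the docstring above only guarantees that the llm config must contain an api_key:

from scrapegraphai.builders import GraphBuilder  # import path assumed

graph_builder = GraphBuilder(
    prompt="Create a scraper that lists all the projects with their description",  # was user_prompt before this PR
    config={"llm": {"api_key": "YOUR_API_KEY", "model": "gpt-3.5-turbo"}},  # api_key is required per the docstring
)
graph_json = graph_builder.build_graph()

# Optionally render the generated configuration with the staticmethod shown above.
graphviz_graph = GraphBuilder.convert_json_to_graphviz(graph_json)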
2 changes: 1 addition & 1 deletion scrapegraphai/graphs/abstract_graph.py
@@ -39,7 +39,7 @@ class AbstractGraph(ABC):
prompt (str): The prompt for the graph.
source (str): The source of the graph.
config (dict): Configuration parameters for the graph.
schema (str): The schema for the graph output.
schema (BaseModel): The schema for the graph output.
llm_model: An instance of a language model client, configured for generating answers.
embedder_model: An instance of an embedding model client,
configured for generating embeddings.
6 changes: 4 additions & 2 deletions scrapegraphai/graphs/csv_scraper_multi_graph.py
@@ -5,6 +5,8 @@
from copy import copy, deepcopy
from typing import List, Optional

from pydantic import BaseModel

from .base_graph import BaseGraph
from .abstract_graph import AbstractGraph
from .csv_scraper_graph import CSVScraperGraph
@@ -32,7 +34,7 @@ class CSVScraperMultiGraph(AbstractGraph):
prompt (str): The user prompt to search the internet.
source (List[str]): The source of the graph.
config (dict): Configuration parameters for the graph.
schema (Optional[str]): The schema for the graph output.
schema (Optional[BaseModel]): The schema for the graph output.

Example:
>>> search_graph = MultipleSearchGraph(
@@ -42,7 +44,7 @@ class CSVScraperMultiGraph(AbstractGraph):
>>> result = search_graph.run()
"""

def __init__(self, prompt: str, source: List[str], config: dict, schema: Optional[str] = None):
def __init__(self, prompt: str, source: List[str], config: dict, schema: Optional[BaseModel] = None):

self.max_results = config.get("max_results", 3)

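A hedged instantiation sketch for the tightened signature; the model fields, CSV paths, prompt, and import path are illustrative assumptions:

from typing import List

from pydantic import BaseModel

from scrapegraphai.graphs import CSVScraperMultiGraph  # import path assumed


class Row(BaseModel):
    name: str
    value: str


class Rows(BaseModel):
    rows: List[Row]


csv_multi_graph = CSVScraperMultiGraph(
    prompt="Extract every row as name/value pairs",
    source=["data/first.csv", "data/second.csv"],  # illustrative local paths
    config={"llm": {"api_key": "YOUR_API_KEY", "model": "gpt-3.5-turbo"}},
    schema=Rows,  # a Pydantic BaseModel subclass instead of a JSON-format string
)
result = csv_multi_graph.run()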
4 changes: 2 additions & 2 deletions scrapegraphai/graphs/deep_scraper_graph.py
@@ -34,7 +34,7 @@ class DeepScraperGraph(AbstractGraph):
prompt (str): The prompt for the graph.
source (str): The source of the graph.
config (dict): Configuration parameters for the graph.
schema (str): The schema for the graph output.
schema (BaseModel): The schema for the graph output.
llm_model: An instance of a language model client, configured for generating answers.
embedder_model: An instance of an embedding model client,
configured for generating embeddings.
@@ -45,7 +45,7 @@ class DeepScraperGraph(AbstractGraph):
prompt (str): The prompt for the graph.
source (str): The source of the graph.
config (dict): Configuration parameters for the graph.
schema (str): The schema for the graph output.
schema (BaseModel): The schema for the graph output.

Example:
>>> deep_scraper = DeepScraperGraph(
4 changes: 2 additions & 2 deletions scrapegraphai/graphs/json_scraper_graph.py
@@ -23,7 +23,7 @@ class JSONScraperGraph(AbstractGraph):
prompt (str): The prompt for the graph.
source (str): The source of the graph.
config (dict): Configuration parameters for the graph.
schema (str): The schema for the graph output.
schema (BaseModel): The schema for the graph output.
llm_model: An instance of a language model client, configured for generating answers.
embedder_model: An instance of an embedding model client,
configured for generating embeddings.
@@ -34,7 +34,7 @@ class JSONScraperGraph(AbstractGraph):
prompt (str): The prompt for the graph.
source (str): The source of the graph.
config (dict): Configuration parameters for the graph.
schema (str): The schema for the graph output.
schema (BaseModel): The schema for the graph output.

Example:
>>> json_scraper = JSONScraperGraph(
2 changes: 1 addition & 1 deletion scrapegraphai/graphs/json_scraper_multi_graph.py
@@ -33,7 +33,7 @@ class JSONScraperMultiGraph(AbstractGraph):
prompt (str): The user prompt to search the internet.
source (List[str]): The source of the graph.
config (dict): Configuration parameters for the graph.
schema (Optional[str]): The schema for the graph output.
schema (Optional[BaseModel]): The schema for the graph output.

Example:
>>> search_graph = MultipleSearchGraph(
4 changes: 2 additions & 2 deletions scrapegraphai/graphs/omni_scraper_graph.py
@@ -29,7 +29,7 @@ class OmniScraperGraph(AbstractGraph):
prompt (str): The prompt for the graph.
source (str): The source of the graph.
config (dict): Configuration parameters for the graph.
schema (str): The schema for the graph output.
schema (BaseModel): The schema for the graph output.
llm_model: An instance of a language model client, configured for generating answers.
embedder_model: An instance of an embedding model client,
configured for generating embeddings.
@@ -41,7 +41,7 @@ class OmniScraperGraph(AbstractGraph):
prompt (str): The prompt for the graph.
source (str): The source of the graph.
config (dict): Configuration parameters for the graph.
schema (str): The schema for the graph output.
schema (BaseModel): The schema for the graph output.

Example:
>>> omni_scraper = OmniScraperGraph(