Commit 5c08eea

docs: prev version
1 parent e08b304 commit 5c08eea


46 files changed: +1919 −221 lines

CHANGELOG.md

Lines changed: 28 additions & 3 deletions
@@ -1,4 +1,3 @@
-<<<<<<< HEAD
 ## [1.11.2](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.11.1...v1.11.2) (2024-07-23)
 
 
@@ -50,13 +49,31 @@
 
 ## [1.10.0-beta.8](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.10.0-beta.7...v1.10.0-beta.8) (2024-07-23)
 
-=======
->>>>>>> parent of 7708828 (Merge pull request #488 from ScrapeGraphAI/pre/beta)
 ## [1.10.4](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.10.3...v1.10.4) (2024-07-22)
 
 
+
 ### Bug Fixes
 
+
+* **md_conversion:** add absolute links md, added missing dependency ([12b5ead](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/12b5eada6ea783770afd630ede69b8cf867a7ded))
+
+## [1.10.0-beta.7](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.10.0-beta.6...v1.10.0-beta.7) (2024-07-23)
+
+
+### Features
+
+
+* add nvidia connection ([fc0dadb](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/fc0dadb8f812dfd636dec856921a971b58695ce3))
+
+
+### chore
+
+
+* **dependecies:** add script to auto-update requirements ([3289c7b](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/3289c7bf5ec58ac3d04e9e5e8e654af9abcee228))
+* **ci:** set up workflow for requirements auto-update ([295fc28](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/295fc28ceb02c78198f7fbe678352503b3259b6b))
+* update requirements.txt ([c7bac98](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/c7bac98d2e79e5ab98fa65d7efa858a2cdda1622))
+
+## [1.10.0-beta.6](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.10.0-beta.5...v1.10.0-beta.6) (2024-07-22)
+
 * parse node ([09256f7](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/09256f7b11a7a1c2aba01cf8de70401af1e86fe4))
 
 ## [1.10.3](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.10.2...v1.10.3) (2024-07-22)
@@ -83,8 +100,12 @@
 ## [1.10.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.9.2...v1.10.0) (2024-07-20)
 
 
+
 ### Features
 
+
+* add new toml ([fcb3220](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/fcb3220868e7ef1127a7a47f40d0379be282e6eb))
+
 * add gpt4o omni ([431edb7](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/431edb7bb2504f4c1335c3ae3ce2f91867fa7222))
 * add searchngx integration ([5c92186](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/5c9218608140bf694fbfd96aa90276bc438bb475))
 * refactoring_to_md function ([602dd00](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/602dd00209ee1d72a1223fc4793759450921fcf9))
@@ -97,8 +118,11 @@
 * search link node ([cf3ab55](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/cf3ab5564ae5c415c63d1771b32ea68f5169ca82))
 
 
+
 ### chore
 
+
+* **pyproject:** upgrade dependencies ([0425124](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/0425124c570f765b98fcf67ba6649f4f9fe76b15))
 * correct search engine name ([7ba2f6a](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/7ba2f6ae0b9d2e9336e973e1f57ab8355c739e27))
 * remove unused import ([fd1b7cb](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/fd1b7cb24a7c252277607abde35826e3c58e34ef))
 * **ci:** upgrade lockfiles ([c7b05a4](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/c7b05a4993df14d6ed4848121a3cd209571232f7))
@@ -122,6 +146,7 @@
 * **release:** 1.9.0-beta.5 [skip ci] ([bb62439](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/bb624399cfc3924825892dd48697fc298ad3b002))
 * **release:** 1.9.0-beta.6 [skip ci] ([54a69de](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/54a69de69e8077e02fd5584783ca62cc2e0ec5bb))
 
+
 ## [1.10.0-beta.5](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.10.0-beta.4...v1.10.0-beta.5) (2024-07-20)
Lines changed: 55 additions & 0 deletions
"""
Basic example of scraping pipeline using CSVScraperMultiGraph from CSV documents
"""

import os
import pandas as pd
from dotenv import load_dotenv
from scrapegraphai.graphs import CSVScraperMultiGraph
from scrapegraphai.utils import convert_to_csv, convert_to_json, prettify_exec_info

load_dotenv()

# ************************************************
# Read the CSV file
# ************************************************

FILE_NAME = "inputs/username.csv"
curr_dir = os.path.dirname(os.path.realpath(__file__))
file_path = os.path.join(curr_dir, FILE_NAME)

text = pd.read_csv(file_path)

# ************************************************
# Define the configuration for the graph
# ************************************************

graph_config = {
    "llm": {
        "api_key": os.getenv("NEMOTRON_APIKEY"),
        "model": "nvidia/meta/llama3-70b-instruct",
    }
}

# ************************************************
# Create the CSVScraperMultiGraph instance and run it
# ************************************************

csv_scraper_graph = CSVScraperMultiGraph(
    prompt="List me all the last names",
    source=[str(text), str(text)],
    config=graph_config
)

result = csv_scraper_graph.run()
print(result)

# ************************************************
# Get graph execution info
# ************************************************

graph_exec_info = csv_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))

# Save to json or csv
convert_to_csv(result, "result")
convert_to_json(result, "result")
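The example above reads an inputs/username.csv file that ships with the repository but is not shown in this diff. A minimal sketch for generating a placeholder input file so the script can run locally (the column names and values here are assumptions, not taken from the repository):

```python
import os
import pandas as pd

# Hypothetical sample data; the real inputs/username.csv may have
# different columns and rows.
os.makedirs("inputs", exist_ok=True)
df = pd.DataFrame({
    "first_name": ["Ada", "Grace"],
    "last_name": ["Lovelace", "Hopper"],
})
df.to_csv("inputs/username.csv", index=False)

# Read it back the same way the example does.
print(pd.read_csv("inputs/username.csv").shape)  # → (2, 2)
```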
Lines changed: 57 additions & 0 deletions
"""
Basic example of scraping pipeline using CSVScraperGraph from CSV documents
"""

import os
from dotenv import load_dotenv
import pandas as pd
from scrapegraphai.graphs import CSVScraperGraph
from scrapegraphai.utils import convert_to_csv, convert_to_json, prettify_exec_info

load_dotenv()

# ************************************************
# Read the CSV file
# ************************************************

FILE_NAME = "inputs/username.csv"
curr_dir = os.path.dirname(os.path.realpath(__file__))
file_path = os.path.join(curr_dir, FILE_NAME)

text = pd.read_csv(file_path)

# ************************************************
# Define the configuration for the graph
# ************************************************

nemotron_key = os.getenv("NEMOTRON_APIKEY")

graph_config = {
    "llm": {
        "api_key": nemotron_key,
        "model": "nvidia/meta/llama3-70b-instruct",
    },
}

# ************************************************
# Create the CSVScraperGraph instance and run it
# ************************************************

csv_scraper_graph = CSVScraperGraph(
    prompt="List me all the last names",
    source=str(text),  # Pass the content of the file, not the file object
    config=graph_config
)

result = csv_scraper_graph.run()
print(result)

# ************************************************
# Get graph execution info
# ************************************************

graph_exec_info = csv_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))

# Save to json or csv
convert_to_csv(result, "result")
convert_to_json(result, "result")
Lines changed: 109 additions & 0 deletions
"""
Example of custom graph using existing nodes
"""

import os
from dotenv import load_dotenv

from langchain_openai import OpenAIEmbeddings
from scrapegraphai.models import OpenAI
from scrapegraphai.graphs import BaseGraph
from scrapegraphai.nodes import FetchNode, ParseNode, RAGNode, GenerateAnswerNode, RobotsNode

load_dotenv()

# ************************************************
# Define the configuration for the graph
# ************************************************

graph_config = {
    "llm": {
        "api_key": os.getenv("NEMOTRON_KEY"),
        "model": "claude-3-haiku-20240307",
    },
}

# ************************************************
# Define the graph nodes
# ************************************************

llm_model = OpenAI(graph_config["llm"])
embedder = OpenAIEmbeddings(api_key=llm_model.openai_api_key)

# define the nodes for the graph
robot_node = RobotsNode(
    input="url",
    output=["is_scrapable"],
    node_config={
        "llm_model": llm_model,
        "force_scraping": True,
        "verbose": True,
    }
)

fetch_node = FetchNode(
    input="url | local_dir",
    output=["doc", "link_urls", "img_urls"],
    node_config={
        "verbose": True,
        "headless": True,
    }
)

parse_node = ParseNode(
    input="doc",
    output=["parsed_doc"],
    node_config={
        "chunk_size": 4096,
        "verbose": True,
    }
)

rag_node = RAGNode(
    input="user_prompt & (parsed_doc | doc)",
    output=["relevant_chunks"],
    node_config={
        "llm_model": llm_model,
        "embedder_model": embedder,
        "verbose": True,
    }
)

generate_answer_node = GenerateAnswerNode(
    input="user_prompt & (relevant_chunks | parsed_doc | doc)",
    output=["answer"],
    node_config={
        "llm_model": llm_model,
        "verbose": True,
    }
)

# ************************************************
# Create the graph by defining the connections
# ************************************************

graph = BaseGraph(
    nodes=[
        robot_node,
        fetch_node,
        parse_node,
        rag_node,
        generate_answer_node,
    ],
    edges=[
        (robot_node, fetch_node),
        (fetch_node, parse_node),
        (parse_node, rag_node),
        (rag_node, generate_answer_node)
    ],
    entry_point=robot_node
)

# ************************************************
# Execute the graph
# ************************************************

result, execution_info = graph.execute({
    "user_prompt": "Describe the content",
    "url": "https://example.com/"
})

# get the answer from the result
result = result.get("answer", "No answer found.")
print(result)
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,47 @@
1+
"""
2+
Basic example of scraping pipeline using SmartScraper
3+
"""
4+
5+
import os
6+
from dotenv import load_dotenv
7+
from scrapegraphai.graphs import DeepScraperGraph
8+
from scrapegraphai.utils import prettify_exec_info
9+
10+
load_dotenv()
11+
12+
# ************************************************
13+
# Define the configuration for the graph
14+
# ************************************************
15+
16+
nemotron_key = os.getenv("NEMOTRON_APIKEY")
17+
18+
graph_config = {
19+
"llm": {
20+
"api_key": nemotron_key,
21+
"model": "nvidia/meta/llama3-70b-instruct",
22+
},
23+
"verbose": True,
24+
"max_depth": 1
25+
}
26+
27+
# ************************************************
28+
# Create the SmartScraperGraph instance and run it
29+
# ************************************************
30+
31+
deep_scraper_graph = DeepScraperGraph(
32+
prompt="List me all the job titles and detailed job description.",
33+
# also accepts a string with the already downloaded HTML code
34+
source="https://www.google.com/about/careers/applications/jobs/results/?location=Bangalore%20India",
35+
config=graph_config
36+
)
37+
38+
result = deep_scraper_graph.run()
39+
print(result)
40+
41+
# ************************************************
42+
# Get graph execution info
43+
# ************************************************
44+
45+
graph_exec_info = deep_scraper_graph.get_execution_info()
46+
print(deep_scraper_graph.get_state("relevant_links"))
47+
print(prettify_exec_info(graph_exec_info))
