Commit 765b548

Merge pull request #417 from ScrapeGraphAI/md_scraper_integration
feat: add integrations for markdown files
2 parents 8a52914 + f3b6343 commit 765b548

3 files changed, 6 additions and 2 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge)](https://opensource.org/licenses/MIT)
 [![](https://dcbadge.vercel.app/api/server/gkxQDAjfeX)](https://discord.gg/gkxQDAjfeX)
 
-ScrapeGraphAI is a *web scraping* python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, etc.).
+ScrapeGraphAI is a *web scraping* python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.).
 
 Just say which information you want to extract and the library will do it for you!

scrapegraphai/graphs/markdown_scraper_graph.py

Lines changed: 1 addition & 0 deletions
@@ -78,6 +78,7 @@ def _create_graph(self) -> BaseGraph:
                 "llm_model": self.llm_model,
                 "additional_info": self.config.get("additional_info"),
                 "schema": self.schema,
+                "is_md_scraper": True
             }
         )

scrapegraphai/nodes/generate_answer_node.py

Lines changed: 4 additions & 1 deletion
@@ -53,6 +53,9 @@ def __init__(
         self.script_creator = (
             False if node_config is None else node_config.get("script_creator", False)
         )
+        self.is_md_scraper = (
+            False if node_config is None else node_config.get("is_md_scraper", False)
+        )
 
         self.additional_info = node_config.get("additional_info")
 
@@ -90,7 +93,7 @@ def execute(self, state: dict) -> dict:
 
         format_instructions = output_parser.get_format_instructions()
 
-        if isinstance(self.llm_model, OpenAI) and not self.script_creator or self.force and not self.script_creator:
+        if isinstance(self.llm_model, OpenAI) and not self.script_creator or self.force and not self.script_creator or self.is_md_scraper:
             template_no_chunks_prompt = template_no_chunks_md
             template_chunks_prompt = template_chunks_md
             template_merge_prompt = template_merge_md
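
Note on the updated condition: because `and` binds tighter than `or` in Python, the new `or self.is_md_scraper` clause stands on its own, so the markdown templates are selected whenever the flag is set, regardless of `script_creator`. The sketch below is only an illustration of that reading, not code from the commit: `read_flag` and `uses_markdown_templates` are hypothetical helper names, and `is_openai` stands in for `isinstance(self.llm_model, OpenAI)`.

```python
from itertools import product


def read_flag(node_config, name):
    """Mirror the diff's pattern: a missing config or missing key yields False."""
    return False if node_config is None else node_config.get(name, False)


def uses_markdown_templates(is_openai, force, script_creator, is_md_scraper):
    """The updated condition with its implicit grouping written out."""
    return (
        (is_openai and not script_creator)
        or (force and not script_creator)
        or is_md_scraper
    )


def ungrouped(is_openai, force, script_creator, is_md_scraper):
    """The condition as written in the diff, without parentheses."""
    return is_openai and not script_creator or force and not script_creator or is_md_scraper


# Check that both forms agree on every combination of the four booleans.
all_equal = all(
    uses_markdown_templates(*combo) == ungrouped(*combo)
    for combo in product([False, True], repeat=4)
)
```

With these helpers, `uses_markdown_templates(False, False, True, True)` evaluates to `True`: the markdown flag alone is enough, even when `script_creator` is set.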
