Fix schema option not working #946
## [1.41.0](ScrapeGraphAI/Scrapegraph-ai@v1.40.1...v1.41.0) (2025-03-09)

### Features

* add CLoD integration ([4e0e785](ScrapeGraphAI@4e0e785))

### Test

* Add coverage improvement test for tests/test_generate_answer_node.py ([6769c0d](ScrapeGraphAI@6769c0d))
* Add coverage improvement test for tests/test_models_tokens.py ([b21e781](ScrapeGraphAI@b21e781))
* Update coverage improvement test for tests/graphs/abstract_graph_test.py ([f296ac4](ScrapeGraphAI@f296ac4))

### CI

* **release:** 1.41.0-beta.1 [skip ci] ([7bfe494](ScrapeGraphAI@7bfe494))

## [1.42.0](ScrapeGraphAI/Scrapegraph-ai@v1.41.0...v1.42.0) (2025-03-10)

### Features

* update terms ([ff7b33b](ScrapeGraphAI@ff7b33b))

## [1.42.1](ScrapeGraphAI/Scrapegraph-ai@v1.42.0...v1.42.1) (2025-03-12)

### Bug Fixes

* add new gpt model ([cff799b](ScrapeGraphAI@cff799b))

## [1.43.0](ScrapeGraphAI/Scrapegraph-ai@v1.42.1...v1.43.0) (2025-03-13)

### Features

* add intrgration for o3min ([fc0a148](ScrapeGraphAI@fc0a148))
I opened a Pull Request with the following:
🔄 4 test files added and 7 test files updated to reflect recent changes.
🔄 Test Updates
I've added or updated 8 tests. They all pass ☑️
New Tests:
🐛 Bug Detection
Potential issues:
    if search_engine == "duckduckgo":
        research = DuckDuckGoSearchResults(max_results=max_results)
        res = research.run(query)
        results = re.findall(r"https?://[^\s,\]]+", res)
Suggested change:
        results = re.findall(r"https?://[^\s,\]]+", res)[:max_results]
This change would ensure that no more than max_results URLs are returned.
Test Error Log
tests/utils/research_web_test.py::test_google_search:
    def test_google_search():
        """Tests search_on_web with Google search engine."""
>       results = search_on_web("test query", search_engine="Google", max_results=2)

tests/utils/research_web_test.py:10:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
query = 'test query', search_engine = 'google', max_results = 2, port = 8080
timeout = 10, proxy = None, serper_api_key = None, region = None
language = 'en'

    def search_on_web(
        query: str,
        search_engine: str = "duckduckgo",
        max_results: int = 10,
        port: int = 8080,
        timeout: int = 10,
        proxy: str | dict = None,
        serper_api_key: str = None,
        region: str = None,
        language: str = "en",
    ) -> List[str]:
        """Search web function with improved error handling and validation
        Args:
            query (str): Search query
            search_engine (str): Search engine to use
            max_results (int): Maximum number of results to return
            port (int): Port for SearXNG
            timeout (int): Request timeout in seconds
            proxy (str | dict): Proxy configuration
            serper_api_key (str): API key for Serper
            region (str): Country/region code (e.g., 'mx' for Mexico)
            language (str): Language code (e.g., 'es' for Spanish)
        """
        # Input validation
        if not query or not isinstance(query, str):
            raise ValueError("Query must be a non-empty string")
        search_engine = search_engine.lower()
        valid_engines = {"duckduckgo", "bing", "searxng", "serper"}
        if search_engine not in valid_engines:
>           raise ValueError(f"Search engine must be one of: {', '.join(valid_engines)}")
E           ValueError: Search engine must be one of: searxng, duckduckgo, serper, bing

scrapegraphai/utils/research_web.py:45: ValueError
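The failure in the log above comes from the validation step alone: the test passes "Google", which is lowercased to "google" and is not a member of valid_engines. A minimal sketch of just that check follows. The engine set is copied from the traceback; the standalone helper `validate_engine` is a hypothetical name used for illustration and is not a function in the library.

```python
# Engine names copied from the traceback above.
VALID_ENGINES = {"duckduckgo", "bing", "searxng", "serper"}


def validate_engine(search_engine: str) -> str:
    """Lowercase the engine name and reject anything outside VALID_ENGINES.

    Hypothetical helper that mirrors the validation shown in the traceback.
    """
    engine = search_engine.lower()
    if engine not in VALID_ENGINES:
        # sorted() only makes the message order deterministic in this sketch.
        raise ValueError(
            f"Search engine must be one of: {', '.join(sorted(VALID_ENGINES))}"
        )
    return engine


if __name__ == "__main__":
    print(validate_engine("DuckDuckGo"))  # accepted after lowercasing
    try:
        validate_engine("Google")  # rejected, as in the failing test
    except ValueError as exc:
        print(f"ValueError: {exc}")
```

Under this reading, the generated test fails because it requests an engine that the function does not list as supported, not because of the change in this PR.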
☂️ Coverage Improvements
Coverage improvements by file:
Hi @payala, could you please add a screenshot of the results?
You mean this, @VinciGit00?
Yes, thx
🎉 This PR is included in version 1.43.1-beta.1 🎉 The release is available on:
Your semantic-release bot 📦🚀
🎉 This PR is included in version 1.43.1 🎉 The release is available on:
Your semantic-release bot 📦🚀
Passing a pydantic schema to SmartScraperGraph did not work because the format instructions were appended directly to the prompt text, which broke the prompt template's variable parsing.
This PR removes the appended "IMPORTANT: " text. The format_instructions are already passed to the prompt as a template variable, and the appended copy is what broke the prompt whenever a schema was passed.
This is my first contribution to this project. I tried to follow all the guidelines; please let me know if there is anything I should do differently.
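The failure mode described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: it assumes the prompt template uses Python `str.format`-style placeholders, and the template text, the sample format instructions, and the helper `template_variables` are all hypothetical.

```python
from string import Formatter


def template_variables(template: str) -> set[str]:
    """Return the placeholder names found in a str.format-style template."""
    return {
        name
        for _, name, _, _ in Formatter().parse(template)
        if name is not None
    }


# Hypothetical format instructions containing a JSON example with literal braces.
format_instructions = 'Return JSON like {"title": "string"}'

# Hypothetical template that receives the instructions as a variable.
base_template = "Answer the question: {question}\n{format_instructions}"

# Problematic approach: the instructions are concatenated into the template
# text, so their braces are parsed as an extra placeholder.
broken_template = base_template + "\nIMPORTANT: " + format_instructions

if __name__ == "__main__":
    print(template_variables(base_template))
    print(template_variables(broken_template))

    # Supplying the instructions only as a variable value renders without error.
    print(base_template.format(
        question="What is the title?",
        format_instructions=format_instructions,
    ))

    # Rendering the concatenated template fails on the unexpected placeholder.
    try:
        broken_template.format(
            question="What is the title?",
            format_instructions=format_instructions,
        )
    except KeyError as exc:
        print(f"KeyError: {exc}")
```

When the instructions are supplied only as a variable value, their braces are never parsed as placeholders, which is consistent with the fix of dropping the appended text.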