Skip to content

reallignment #247

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 40 commits into from
May 15, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
40 commits
Select commit Hold shift + click to select a range
134c94e
added first asdt implementation
PeriniM Apr 17, 2024
ae69e4b
dom tree structure
PeriniM Apr 24, 2024
f232717
REFACTORING
VinciGit00 Apr 25, 2024
c927f70
subtrees implementation
PeriniM Apr 25, 2024
c5f9fca
Merge branch 'asdt' of https://github.com/VinciGit00/Scrapegraph-ai i…
PeriniM Apr 25, 2024
dd99ac5
structural and textual hashing
PeriniM Apr 25, 2024
e778d27
added tree metadata
PeriniM Apr 26, 2024
6e7283e
feat: add finalize_node()
PeriniM Apr 29, 2024
e1b9d69
dev basic class blockindentifier
lurenss May 5, 2024
edfaf3f
accessibility tree
PeriniM May 6, 2024
51aa109
feat: add turboscraper (alfa)
VinciGit00 May 6, 2024
67d5fbf
feat: new search_graph
VinciGit00 May 6, 2024
b326886
Merge branch '88-blockscraper-implementation' into asdt
VinciGit00 May 7, 2024
3ee6f58
Merge pull request #168 from VinciGit00/asdt
VinciGit00 May 7, 2024
90955ca
feat(gpt-4o): image to text single node test
PeriniM May 14, 2024
a296927
feat(omni-scraper): working OmniScraperGraph with images
PeriniM May 14, 2024
fcb3abb
feat(omni-search): added omni search graph and updated docs
PeriniM May 14, 2024
a6e1813
fix(fetch_node): bug in handling local files
PeriniM May 14, 2024
a458ec4
Update the prompt for the search_link_node
mayurdb May 14, 2024
d76badd
Merge pull request #239 from mayurdb/deepScrapeFix
VinciGit00 May 14, 2024
932df8d
Merge pull request #238 from VinciGit00/gpt4-omni
VinciGit00 May 14, 2024
8727d03
ci(release): 0.11.0-beta.11 [skip ci]
semantic-release-bot May 14, 2024
2a57940
Merge pull request #234 from VinciGit00/pre/beta
VinciGit00 May 14, 2024
c55a3b1
ci(release): 0.11.0 [skip ci]
semantic-release-bot May 14, 2024
b0a67ba
fix(docs): requirements-dev
PeriniM May 14, 2024
6effe25
ci(release): 0.11.1 [skip ci]
semantic-release-bot May 14, 2024
78d1940
docs(main-readme): fixed some typos
PeriniM May 15, 2024
8fc2510
chore(package manager)!: move from poetry to rye
f-aguzzi May 15, 2024
672bd29
Merge pull request #244 from f-aguzzi/main
VinciGit00 May 15, 2024
c0b6f02
ci(release): 1.0.0 [skip ci]
semantic-release-bot May 15, 2024
24d56af
Update pyproject.toml
VinciGit00 May 15, 2024
096b665
fix(searchgraph): used shallow copy to serialize obj
PeriniM May 15, 2024
a81d2b7
ci(release): 1.0.1 [skip ci]
semantic-release-bot May 15, 2024
7ccd51a
add rye update script bash
VinciGit00 May 15, 2024
694d3ab
Merge branch 'main' of https://github.com/VinciGit00/Scrapegraph-ai
VinciGit00 May 15, 2024
efb781f
docs(rye): replaced poetry with rye
PeriniM May 15, 2024
22cd9e3
Merge branch 'search_link_context' into main
VinciGit00 May 15, 2024
51200d6
ci(release): 1.1.0 [skip ci]
semantic-release-bot May 15, 2024
cffcf80
Merge branch '88-blockscraper-implementation' into main
VinciGit00 May 15, 2024
d5045ab
ci(release): 1.2.0 [skip ci]
semantic-release-bot May 15, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 4 additions & 7 deletions .github/workflows/release.yml
Original file line number Diff line number Diff line change
Expand Up @@ -14,11 +14,8 @@ jobs:
run: |
sudo apt update
sudo apt install -y git
- name: Install Python Env and Poetry
uses: actions/setup-python@v5
with:
python-version: '3.9'
- run: pip install poetry
- name: Install the latest version of rye
uses: eifinger/setup-rye@v3
- name: Install Node Env
uses: actions/setup-node@v4
with:
Expand All @@ -30,8 +27,8 @@ jobs:
persist-credentials: false
- name: Build app
run: |
poetry install
poetry build
rye sync --no-lock
rye build
id: build_cache
if: success()
- name: Cache build
Expand Down
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -31,4 +31,6 @@ examples/graph_examples/ScrapeGraphAI_generated_graph
examples/**/result.csv
examples/**/result.json
main.py
*.python-version
*.lock

1 change: 1 addition & 0 deletions .python-version
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
3.9.19
134 changes: 134 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,137 @@
## [1.2.0](https://github.com/VinciGit00/Scrapegraph-ai/compare/v1.1.0...v1.2.0) (2024-05-15)


### Features

* add finalize_node() ([6e7283e](https://github.com/VinciGit00/Scrapegraph-ai/commit/6e7283ed8fc42408d718e8776f9fd3856960ffdb))

## [1.1.0](https://github.com/VinciGit00/Scrapegraph-ai/compare/v1.0.1...v1.1.0) (2024-05-15)


### Features

* add turboscraper (alfa) ([51aa109](https://github.com/VinciGit00/Scrapegraph-ai/commit/51aa109e420a71101664906f0849f39ea2a3f91a))
* new search_graph ([67d5fbf](https://github.com/VinciGit00/Scrapegraph-ai/commit/67d5fbf816275940c89802e033b9e7796436c410))


### Docs

* **rye:** replaced poetry with rye ([efb781f](https://github.com/VinciGit00/Scrapegraph-ai/commit/efb781f950b23f442706d54a578230aba9e9796a))

## [1.0.1](https://github.com/VinciGit00/Scrapegraph-ai/compare/v1.0.0...v1.0.1) (2024-05-15)


### Bug Fixes

* **searchgraph:** used shallow copy to serialize obj ([096b665](https://github.com/VinciGit00/Scrapegraph-ai/commit/096b665c0152593c19402e555c0850cdd3b2a2c0))

## [1.0.0](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.1...v1.0.0) (2024-05-15)


### ⚠ BREAKING CHANGES

* **package manager:** move from poetry to rye

### chore

* **package manager:** move from poetry to rye ([8fc2510](https://github.com/VinciGit00/Scrapegraph-ai/commit/8fc2510b3704990ff96f5f74abb5b800bca9af98)), closes [#198](https://github.com/VinciGit00/Scrapegraph-ai/issues/198)


### Docs

* **main-readme:** fixed some typos ([78d1940](https://github.com/VinciGit00/Scrapegraph-ai/commit/78d19402351f18b3ed3a9d7e4200ad22ad0d064a))

## [0.11.1](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0...v0.11.1) (2024-05-14)


### Bug Fixes

* **docs:** requirements-dev ([b0a67ba](https://github.com/VinciGit00/Scrapegraph-ai/commit/b0a67ba387e7d3a3dca7b82fe3e5b39c6a34c3ba))

## [0.11.0](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.10.1...v0.11.0) (2024-05-14)


### Features

* **parallel-exeuction:** add asyncio event loop dispatcher with semaphore for parallel graph instances ([627cbee](https://github.com/VinciGit00/Scrapegraph-ai/commit/627cbeeb2096eb4cd5da45015d37fceb7fe7840a))
* **webdriver-backend:** add dynamic import scripts from module and file ([db2234b](https://github.com/VinciGit00/Scrapegraph-ai/commit/db2234bf5d2f2589b080cd4136f33c4f4443bdfb))
* add gpt-4o ([52a4a3b](https://github.com/VinciGit00/Scrapegraph-ai/commit/52a4a3b22d6871b14801a5edbd28aa32a1a2580d)), closes [#232](https://github.com/VinciGit00/Scrapegraph-ai/issues/232)
* add new prompt info ([e2350ed](https://github.com/VinciGit00/Scrapegraph-ai/commit/e2350eda6249d8e121344d12c92645a3887a5b76))
* **proxy-rotation:** add parse (IP address) or search (from broker) functionality for proxy rotation ([2170131](https://github.com/VinciGit00/Scrapegraph-ai/commit/217013181da06abe8d71d9db70e809ea4ebd8236))
* add support for deepseek-chat ([156b67b](https://github.com/VinciGit00/Scrapegraph-ai/commit/156b67b91e1798f67082123e2c0087d358a32d4d)), closes [#222](https://github.com/VinciGit00/Scrapegraph-ai/issues/222)
* Add support for passing pdf path as source ([f10f3b1](https://github.com/VinciGit00/Scrapegraph-ai/commit/f10f3b1438e0c625b7f2fa52faeb5a6c12116113))
* **omni-search:** added omni search graph and updated docs ([fcb3abb](https://github.com/VinciGit00/Scrapegraph-ai/commit/fcb3abb01d505f634309f9ae3c686bbcaab65107))
* added proxy rotation ([0c36a7e](https://github.com/VinciGit00/Scrapegraph-ai/commit/0c36a7ec1f32ee073d9e0f534a2cb97aba3d7a1f))
* **safe-web-driver:** enchanced the original `AsyncChromiumLoader` web driver with proxy protection and flexible kwargs and backend ([768719c](https://github.com/VinciGit00/Scrapegraph-ai/commit/768719cce80953fa6cbe283e442420116c438f16))
* **gpt-4o:** image to text single node test ([90955ca](https://github.com/VinciGit00/Scrapegraph-ai/commit/90955ca52f1e3277072e843fb8d578deea27d09f))
* revert fetch_node ([864aa91](https://github.com/VinciGit00/Scrapegraph-ai/commit/864aa91326c360992326e04811d272e55eac8355))
* **batchsize:** tested different batch sizes and systems ([a8d5e7d](https://github.com/VinciGit00/Scrapegraph-ai/commit/a8d5e7db050e15306780ffca47f998ebaf5c1216))
* update info ([4ed0fb8](https://github.com/VinciGit00/Scrapegraph-ai/commit/4ed0fb89c3e6068190a7775bedcb6ae65ba59d18))
* **omni-scraper:** working OmniScraperGraph with images ([a296927](https://github.com/VinciGit00/Scrapegraph-ai/commit/a2969276245cbedb97741975ea707dab2695f71e))


### Bug Fixes

* **pytest:** add dependency for mocking testing functions ([2f4fd45](https://github.com/VinciGit00/Scrapegraph-ai/commit/2f4fd45700ebf1db0c429b5a6249386d1a111615))
* add json integration ([0ab31c3](https://github.com/VinciGit00/Scrapegraph-ai/commit/0ab31c3fdbd56652ed306e60109301f60e8042d3))
* Augment the information getting fetched from a webpage ([f8ce3d5](https://github.com/VinciGit00/Scrapegraph-ai/commit/f8ce3d5916eab926275d59d4d48b0d89ec9cd43f))
* bug for claude ([d0167de](https://github.com/VinciGit00/Scrapegraph-ai/commit/d0167dee71779a3c1e1e042e17a41134b93b3c78))
* **fetch_node:** bug in handling local files ([a6e1813](https://github.com/VinciGit00/Scrapegraph-ai/commit/a6e1813ddd36cc8d7c915e6ea0525835d64d10a2))
* **chromium-loader:** ensure it subclasses langchain's base loader ([b54d984](https://github.com/VinciGit00/Scrapegraph-ai/commit/b54d984c134c8cbc432fd111bb161d3d53cf4a85))
* fixed bugs for csv and xml ([324e977](https://github.com/VinciGit00/Scrapegraph-ai/commit/324e977b853ecaa55bac4bf86e7cd927f7f43d0d))
* limit python version to < 3.12 ([a37fbbc](https://github.com/VinciGit00/Scrapegraph-ai/commit/a37fbbcbcfc3ddd0cc66f586f279676b52c4abfe))
* **proxy-rotation:** removed duplicated arg and passed the loader_kwarhs correctly to the node ([1e9a564](https://github.com/VinciGit00/Scrapegraph-ai/commit/1e9a56461632999c5dc09f5aa930c14c954025ad))
* **fetch-node:** removed isSoup from default ([0c15947](https://github.com/VinciGit00/Scrapegraph-ai/commit/0c1594737f878ed5672f4c889fdf9b4e0d7ec49a))
* **proxy-rotation:** removed max_shape duplicate ([5d6d996](https://github.com/VinciGit00/Scrapegraph-ai/commit/5d6d996e8f6132101d4c3af835d74f0674baffa1))
* **asyncio:** replaced deepcopy with copy due to serialization problems ([dedc733](https://github.com/VinciGit00/Scrapegraph-ai/commit/dedc73304755c2d540a121d143173f60fb448bbb))


### chore

* update models_tokens.py with new model configurations ([d9752b1](https://github.com/VinciGit00/Scrapegraph-ai/commit/d9752b1619c6f86fdc407c898c8c9b443a50cb07))


### Docs

* add diagram showing general structure/flow of the library ([13ae918](https://github.com/VinciGit00/Scrapegraph-ai/commit/13ae9180ac5e7ef11dad1a210cf8790e797397dd))
* **refactor:** added proxy-rotation usage and refactor readthedocs ([e256b75](https://github.com/VinciGit00/Scrapegraph-ai/commit/e256b758b2ada641f97b23b1cf6c6b0174563d8a))
* **refactor:** changed example ([c7ec114](https://github.com/VinciGit00/Scrapegraph-ai/commit/c7ec114274da64f0b61cee80afe908a36ad26b78))
* **concurrent:** refactor theme and added benchmarck searchgraph ([ced2bbc](https://github.com/VinciGit00/Scrapegraph-ai/commit/ced2bbcdc9672396e3c8afdc1f7f65c4194d29fd))
* update overview diagram with more models ([b441b30](https://github.com/VinciGit00/Scrapegraph-ai/commit/b441b30a5c60dda105964f69bd4cef06825f5c74))


### CI

* **release:** 0.10.0-beta.3 [skip ci] ([ad32298](https://github.com/VinciGit00/Scrapegraph-ai/commit/ad32298e70fc626fd62c897e153b806f79dba9b9))
* **release:** 0.10.0-beta.4 [skip ci] ([548bff9](https://github.com/VinciGit00/Scrapegraph-ai/commit/548bff9d77c8b4d2aadee40e966a06cc9d7fd4ab))
* **release:** 0.10.0-beta.5 [skip ci] ([28c9dce](https://github.com/VinciGit00/Scrapegraph-ai/commit/28c9dce7cbda49750172bafd7767fa48a0c33859))
* **release:** 0.10.0-beta.6 [skip ci] ([460d292](https://github.com/VinciGit00/Scrapegraph-ai/commit/460d292af21fabad3fdd2b66110913ccee22ba91))
* **release:** 0.11.0-beta.1 [skip ci] ([63c0dd9](https://github.com/VinciGit00/Scrapegraph-ai/commit/63c0dd93723c2ab55df0a66b555e7fbb4716ea77))
* **release:** 0.11.0-beta.10 [skip ci] ([218b8ed](https://github.com/VinciGit00/Scrapegraph-ai/commit/218b8ede8a22400fd7ba5d1e302ac270f800e67d)), closes [#232](https://github.com/VinciGit00/Scrapegraph-ai/issues/232)
* **release:** 0.11.0-beta.11 [skip ci] ([8727d03](https://github.com/VinciGit00/Scrapegraph-ai/commit/8727d033841b2a30405f12f19f11cd649ffaf4f1))
* **release:** 0.11.0-beta.2 [skip ci] ([7ae50c0](https://github.com/VinciGit00/Scrapegraph-ai/commit/7ae50c035e87be9a3d7b5eef42232dae6e345914))
* **release:** 0.11.0-beta.3 [skip ci] ([106fb12](https://github.com/VinciGit00/Scrapegraph-ai/commit/106fb125316aa3c6dce889963fa423d11bc2c491)), closes [#222](https://github.com/VinciGit00/Scrapegraph-ai/issues/222)
* **release:** 0.11.0-beta.4 [skip ci] ([4ccddda](https://github.com/VinciGit00/Scrapegraph-ai/commit/4ccddda5ebe8d1b12136571733416ed9f819e4db))
* **release:** 0.11.0-beta.5 [skip ci] ([353382b](https://github.com/VinciGit00/Scrapegraph-ai/commit/353382b4d33511259f28afd72ef08fe8f682b688))
* **release:** 0.11.0-beta.6 [skip ci] ([2724d3d](https://github.com/VinciGit00/Scrapegraph-ai/commit/2724d3dd5f7a7dd308e6d441cd8e7a5e085c30c4))
* **release:** 0.11.0-beta.7 [skip ci] ([f0f7373](https://github.com/VinciGit00/Scrapegraph-ai/commit/f0f73736f75fc28c7bdeb4016ebaca07a40c8c59))
* **release:** 0.11.0-beta.8 [skip ci] ([fa4edb4](https://github.com/VinciGit00/Scrapegraph-ai/commit/fa4edb47033121b81cdcc1c910f0386cba5a2f2e))
* **release:** 0.11.0-beta.9 [skip ci] ([d2877d8](https://github.com/VinciGit00/Scrapegraph-ai/commit/d2877d89e5949a01cc90c80028f58735f1fb522e))

## [0.11.0-beta.11](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.10...v0.11.0-beta.11) (2024-05-14)


### Features

* **omni-search:** added omni search graph and updated docs ([fcb3abb](https://github.com/VinciGit00/Scrapegraph-ai/commit/fcb3abb01d505f634309f9ae3c686bbcaab65107))
* **gpt-4o:** image to text single node test ([90955ca](https://github.com/VinciGit00/Scrapegraph-ai/commit/90955ca52f1e3277072e843fb8d578deea27d09f))
* **omni-scraper:** working OmniScraperGraph with images ([a296927](https://github.com/VinciGit00/Scrapegraph-ai/commit/a2969276245cbedb97741975ea707dab2695f71e))


### Bug Fixes

* **fetch_node:** bug in handling local files ([a6e1813](https://github.com/VinciGit00/Scrapegraph-ai/commit/a6e1813ddd36cc8d7c915e6ea0525835d64d10a2))

## [0.11.0-beta.10](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.9...v0.11.0-beta.10) (2024-05-14)


Expand Down
13 changes: 5 additions & 8 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,11 +16,6 @@ Just say which information you want to extract and the library will do it for yo
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/scrapegraphai_logo.png" alt="Scrapegraph-ai Logo" style="width: 50%;">
</p>

[![My Skills](https://skillicons.dev/icons?i=discord)](https://discord.gg/gkxQDAjfeX)
[![My Skills](https://skillicons.dev/icons?i=linkedin)](https://www.linkedin.com/company/scrapegraphai/)
[![My Skills](https://skillicons.dev/icons?i=twitter)](https://twitter.com/scrapegraphai)


## 🚀 Quick install

The reference page for Scrapegraph-ai is available on the official page of pypy: [pypi](https://pypi.org/project/scrapegraphai/).
Expand All @@ -44,13 +39,11 @@ Try it directly on the web using Google Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1sEZBonBMGP44CtO6GQTwAlL0BGJXjtfd?usp=sharing)

Follow the procedure on the following link to setup your OpenAI API key: [link](https://scrapegraph-ai.readthedocs.io/en/latest/index.html).

## 📖 Documentation

The documentation for ScrapeGraphAI can be found [here](https://scrapegraph-ai.readthedocs.io/en/latest/).

Check out also the docusaurus [documentation](https://scrapegraph-doc.onrender.com/).
Check out also the Docusaurus [here](https://scrapegraph-doc.onrender.com/).

## 💻 Usage
There are three main scraping pipelines that can be used to extract information from a website (or local file):
Expand Down Expand Up @@ -179,6 +172,10 @@ Feel free to contribute and join our Discord server to discuss with us improveme

Please see the [contributing guidelines](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/CONTRIBUTING.md).

[![My Skills](https://skillicons.dev/icons?i=discord)](https://discord.gg/gkxQDAjfeX)
[![My Skills](https://skillicons.dev/icons?i=linkedin)](https://www.linkedin.com/company/scrapegraphai/)
[![My Skills](https://skillicons.dev/icons?i=twitter)](https://twitter.com/scrapegraphai)

## 📈 Roadmap
Check out the project roadmap [here](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/README.md)! 🚀

Expand Down
Binary file added docs/assets/omniscrapergraph.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/assets/omnisearchgraph.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
6 changes: 4 additions & 2 deletions docs/source/getting_started/installation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -25,11 +25,13 @@ The library is available on PyPI, so it can be installed using the following com

It is higly recommended to install the library in a virtual environment (conda, venv, etc.)

If your clone the repository, you can install the library using `poetry <https://python-poetry.org/docs/>`_:
If you clone the repository, you can install the library using `rye <https://rye-up.com/>`_. Follow the installation instruction from the website and then run:

.. code-block:: bash

poetry install
rye pin 3.10
rye sync
rye build

Additionally on Windows when using WSL
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Expand Down
2 changes: 2 additions & 0 deletions docs/source/scrapers/graph_config.rst
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,8 @@ Some interesting ones are:
- `headless`: If set to `False`, the web browser will be opened on the URL requested and close right after the HTML is fetched.
- `max_results`: The maximum number of results to be fetched from the search engine. Useful in `SearchGraph`.
- `output_path`: The path where the output files will be saved. Useful in `SpeechGraph`.
- `loader_kwargs`: A dictionary with additional parameters to be passed to the `Loader` class, such as `proxy`.
- `max_images`: The maximum number of images to be analyzed. Useful in `OmniScraperGraph` and `OmniSearchGraph`.

Proxy Rotation
^^^^^^^^^^^^^^
Expand Down
66 changes: 65 additions & 1 deletion docs/source/scrapers/graphs.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3,16 +3,80 @@ Graphs

Graphs are scraping pipelines aimed at solving specific tasks. They are composed by nodes which can be configured individually to address different aspects of the task (fetching data, extracting information, etc.).

There are currently three types of graphs available in the library:
There are three types of graphs available in the library:

- **SmartScraperGraph**: one-page scraper that requires a user-defined prompt and a URL (or local file) to extract information from using LLM.
- **SearchGraph**: multi-page scraper that only requires a user-defined prompt to extract information from a search engine using LLM. It is built on top of SmartScraperGraph.
- **SpeechGraph**: text-to-speech pipeline that generates an answer as well as a requested audio file. It is built on top of SmartScraperGraph and requires a user-defined prompt and a URL (or local file).

With the introduction of `GPT-4o`, two new powerful graphs have been created:

- **OmniScraperGraph**: similar to `SmartScraperGraph`, but with the ability to scrape images and describe them.
- **OmniSearchGraph**: similar to `SearchGraph`, but with the ability to scrape images and describe them.

.. note::

They all use a graph configuration to set up LLM models and other parameters. To find out more about the configurations, check the :ref:`LLM` and :ref:`Configuration` sections.

OmniScraperGraph
^^^^^^^^^^^^^^^^

.. image:: ../../assets/omniscrapergraph.png
:align: center
:width: 90%
:alt: OmniScraperGraph
|

First we define the graph configuration, which includes the LLM model and other parameters. Then we create an instance of the OmniScraperGraph class, passing the prompt, source, and configuration as arguments. Finally, we run the graph and print the result.
It will fetch the data from the source and extract the information based on the prompt in JSON format.

.. code-block:: python

from scrapegraphai.graphs import OmniScraperGraph

graph_config = {
"llm": {...},
}

omni_scraper_graph = OmniScraperGraph(
prompt="List me all the projects with their titles and image links and descriptions.",
source="https://perinim.github.io/projects",
config=graph_config
)

result = omni_scraper_graph.run()
print(result)

OmniSearchGraph
^^^^^^^^^^^^^^^

.. image:: ../../assets/omnisearchgraph.png
:align: center
:width: 80%
:alt: OmniSearchGraph
|

Similar to OmniScraperGraph, we define the graph configuration, create multiple of the OmniSearchGraph class, and run the graph.
It will create a search query, fetch the first n results from the search engine, run n OmniScraperGraph instances, and return the results in JSON format.

.. code-block:: python

from scrapegraphai.graphs import OmniSearchGraph

graph_config = {
"llm": {...},
}

# Create the OmniSearchGraph instance
omni_search_graph = OmniSearchGraph(
prompt="List me all Chioggia's famous dishes and describe their pictures.",
config=graph_config
)

# Run the graph
result = omni_search_graph.run()
print(result)

SmartScraperGraph
^^^^^^^^^^^^^^^^^

Expand Down
Loading