Pre/beta #234

Merged
merged 97 commits into from
May 14, 2024
e7d39a5
fixed gemini embeddings
VinciGit00 May 8, 2024
4ed0fb8
feat: update info
VinciGit00 May 8, 2024
8272d73
add tokenizatio for mxbai-embed-large
VinciGit00 May 8, 2024
4039793
Update abstract_graph.py
VinciGit00 May 9, 2024
9415675
Update abstract_graph.py
VinciGit00 May 9, 2024
7e00c14
Merge pull request #183 from VinciGit00/182-googlegenerativeaiembeddi…
PeriniM May 9, 2024
ad32298
ci(release): 0.10.0-beta.3 [skip ci]
semantic-release-bot May 9, 2024
a37fbbc
fix: limit python version to < 3.12
daniele-roncaglioni May 9, 2024
590aab7
Merge pull request #193 from daniele-roncaglioni/189-poetry-python-ve…
VinciGit00 May 9, 2024
f10f3b1
feat: Add support for passing pdf path as source
shkamboj1 May 9, 2024
905b345
Merge pull request #4 from shkamboj1/pre/beta
shkamboj1 May 9, 2024
a1d580c
Merge pull request #195 from shorthills-ai/pre/beta
VinciGit00 May 9, 2024
84e8d12
update lock
VinciGit00 May 9, 2024
548bff9
ci(release): 0.10.0-beta.4 [skip ci]
semantic-release-bot May 9, 2024
324e977
fix: fixed bugs for csv and xml
VinciGit00 May 9, 2024
c32caad
Merge branch 'pre/beta' of https://github.com/VinciGit00/Scrapegraph-…
VinciGit00 May 9, 2024
28c9dce
ci(release): 0.10.0-beta.5 [skip ci]
semantic-release-bot May 9, 2024
0ab31c3
fix: add json integration
VinciGit00 May 9, 2024
460d292
ci(release): 0.10.0-beta.6 [skip ci]
semantic-release-bot May 9, 2024
f8ce3d5
fix: Augment the information getting fetched from a webpage
mayurdb May 10, 2024
99adc97
Merge branch 'pre/beta' into fetchNodeFix
VinciGit00 May 10, 2024
4e62689
Merge pull request #203 from mayurdb/fetchNodeFix
VinciGit00 May 10, 2024
63c0dd9
ci(release): 0.11.0-beta.1 [skip ci]
semantic-release-bot May 10, 2024
864aa91
feat: revert fetch_node
PeriniM May 10, 2024
7ae50c0
ci(release): 0.11.0-beta.2 [skip ci]
semantic-release-bot May 10, 2024
2f4fd45
fix(pytest): add dependency for mocking testing functions
DiTo97 May 10, 2024
db2234b
feat(webdriver-backend): add dynamic import scripts from module and file
DiTo97 May 10, 2024
2170131
feat(proxy-rotation): add parse (IP address) or search (from broker) …
DiTo97 May 10, 2024
768719c
feat(safe-web-driver): enchanced the original `AsyncChromiumLoader` w…
DiTo97 May 10, 2024
fc2aa3a
Merge branch 'pre/beta' of https://github.com/DiTo97/Scrapegraph-ai i…
DiTo97 May 10, 2024
67d8fec
Minor typo fix for clarity
epage480 May 10, 2024
627cbee
feat(parallel-exeuction): add asyncio event loop dispatcher with sema…
DiTo97 May 10, 2024
4088474
Added parse_html option in parse_node
epage480 May 10, 2024
aac51ba
Removed dead code, allows GenerateScraperNode to generate scraper with
epage480 May 10, 2024
24c3b05
Removed nonfunctional RAG node from ScriptCreatorGraph
epage480 May 10, 2024
0683e78
Merge branch 'pre/beta' into fix-GenerateScraperGraph
epage480 May 10, 2024
300fd5d
Fetch links in the page while parsing html
mayurdb May 11, 2024
1fa77e5
Merge pull request #215 from epage480/fix-GenerateScraperGraph
VinciGit00 May 11, 2024
b752499
Merge pull request #217 from mayurdb/fetchLinkFix
VinciGit00 May 11, 2024
dc91719
Update cleanup_html.py
VinciGit00 May 11, 2024
b54d984
fix(chromium-loader): ensure it subclasses langchain's base loader
DiTo97 May 11, 2024
13ae918
docs: add diagram showing general structure/flow of the library
daniele-roncaglioni May 11, 2024
df271b6
Add search link node that can find out relevant links in the webpage
mayurdb May 11, 2024
8f1fbe7
minor changes
mayurdb May 11, 2024
ea3b545
Merge branch 'pre/beta' into deepScrape
mayurdb May 11, 2024
9a67a26
Update documentation
mayurdb May 11, 2024
dd29c16
Merge branch 'deepScrape' of github.com:mayurdb/Scrapegraph-ai into d…
mayurdb May 11, 2024
d8ed76b
Merge pull request #221 from mayurdb/deepScrape
VinciGit00 May 11, 2024
b441b30
docs: update overview diagram with more models
daniele-roncaglioni May 11, 2024
3b9ec9b
Merge pull request #220 from daniele-roncaglioni/102-library-overview…
VinciGit00 May 11, 2024
156b67b
feat: add support for deepseek-chat
f-aguzzi May 11, 2024
e004c7c
Merge pull request #223 from f-aguzzi/pre/beta
VinciGit00 May 12, 2024
106fb12
ci(release): 0.11.0-beta.3 [skip ci]
semantic-release-bot May 12, 2024
e2350ed
feat: add new prompt info
VinciGit00 May 12, 2024
f359d5c
Merge pull request #224 from VinciGit00/fixing-prompts
VinciGit00 May 12, 2024
4ccddda
ci(release): 0.11.0-beta.4 [skip ci]
semantic-release-bot May 12, 2024
1e9a564
fix(proxy-rotation): removed duplicated arg and passed the loader_kwa…
PeriniM May 12, 2024
30758b4
Create smart_scarper_deepseek.py
VinciGit00 May 12, 2024
5d6d996
fix(proxy-rotation): removed max_shape duplicate
PeriniM May 13, 2024
e256b75
docs(refactor): added proxy-rotation usage and refactor readthedocs
PeriniM May 13, 2024
0c36a7e
feat: added proxy rotation
PeriniM May 13, 2024
7e8acd8
Merge branch 'pre/beta' into fix/fetch-node-proxybroker
PeriniM May 13, 2024
b8079f8
Merge pull request #211 from DiTo97/fix/fetch-node-proxybroker
PeriniM May 13, 2024
353382b
ci(release): 0.11.0-beta.5 [skip ci]
semantic-release-bot May 13, 2024
0c15947
fix(fetch-node): removed isSoup from default
PeriniM May 13, 2024
2724d3d
ci(release): 0.11.0-beta.6 [skip ci]
semantic-release-bot May 13, 2024
c7ec114
docs(refactor): changed example
PeriniM May 13, 2024
60ed80f
Merge branch 'pre/beta' of https://github.com/VinciGit00/Scrapegraph-…
PeriniM May 13, 2024
7c91f9f
add examples for deepseek
VinciGit00 May 13, 2024
39be38f
Fixed anthropic/bedrock conflict; Removed duplicate class Claude; Upd…
JGalego May 13, 2024
d0167de
fix: bug for claude
VinciGit00 May 13, 2024
f0f7373
ci(release): 0.11.0-beta.7 [skip ci]
semantic-release-bot May 13, 2024
f3d44c0
Merge pull request #228 from JGalego/fix/bedrock-support
VinciGit00 May 13, 2024
dedc733
fix(asyncio): replaced deepcopy with copy due to serialization problems
PeriniM May 13, 2024
859c5d5
Refactored to include custom AWS client for bedrock; Added missing An…
JGalego May 13, 2024
28ab8da
Merge pull request #229 from JGalego/feat/custom-aws-creds
VinciGit00 May 13, 2024
c0d26d6
ad bedrocl
VinciGit00 May 13, 2024
a8d5e7d
feat(batchsize): tested different batch sizes and systems
PeriniM May 13, 2024
367dea5
Merge branch 'pre/beta' into feat/parallel-node-execution
PeriniM May 13, 2024
62a74a5
Merge pull request #213 from DiTo97/feat/parallel-node-execution
PeriniM May 13, 2024
fa4edb4
ci(release): 0.11.0-beta.8 [skip ci]
semantic-release-bot May 13, 2024
ced2bbc
docs(concurrent): refactor theme and added benchmarck searchgraph
PeriniM May 14, 2024
4fd8a39
Merge branch 'pre/beta' of https://github.com/VinciGit00/Scrapegraph-…
PeriniM May 14, 2024
d6f5ca8
Merge branch 'main' into pre/beta
VinciGit00 May 14, 2024
5914fa8
Update poetry.lock
VinciGit00 May 14, 2024
d2877d8
ci(release): 0.11.0-beta.9 [skip ci]
semantic-release-bot May 14, 2024
52a4a3b
feat: add gpt-4o
f-aguzzi May 14, 2024
8e46799
Merge pull request #235 from f-aguzzi/pre/beta
PeriniM May 14, 2024
218b8ed
ci(release): 0.11.0-beta.10 [skip ci]
semantic-release-bot May 14, 2024
90955ca
feat(gpt-4o): image to text single node test
PeriniM May 14, 2024
a296927
feat(omni-scraper): working OmniScraperGraph with images
PeriniM May 14, 2024
fcb3abb
feat(omni-search): added omni search graph and updated docs
PeriniM May 14, 2024
a6e1813
fix(fetch_node): bug in handling local files
PeriniM May 14, 2024
a458ec4
Update the prompt for the search_link_node
mayurdb May 14, 2024
d76badd
Merge pull request #239 from mayurdb/deepScrapeFix
VinciGit00 May 14, 2024
932df8d
Merge pull request #238 from VinciGit00/gpt4-omni
VinciGit00 May 14, 2024
8727d03
ci(release): 0.11.0-beta.11 [skip ci]
semantic-release-bot May 14, 2024
180 changes: 154 additions & 26 deletions CHANGELOG.md
@@ -1,61 +1,189 @@
## [0.10.1](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.10.0...v0.10.1) (2024-05-11)
## [0.11.0-beta.11](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.10...v0.11.0-beta.11) (2024-05-14)


### Features

* **omni-search:** added omni search graph and updated docs ([fcb3abb](https://github.com/VinciGit00/Scrapegraph-ai/commit/fcb3abb01d505f634309f9ae3c686bbcaab65107))
* **gpt-4o:** image to text single node test ([90955ca](https://github.com/VinciGit00/Scrapegraph-ai/commit/90955ca52f1e3277072e843fb8d578deea27d09f))
* **omni-scraper:** working OmniScraperGraph with images ([a296927](https://github.com/VinciGit00/Scrapegraph-ai/commit/a2969276245cbedb97741975ea707dab2695f71e))


### Bug Fixes

* **fetch_node:** bug in handling local files ([a6e1813](https://github.com/VinciGit00/Scrapegraph-ai/commit/a6e1813ddd36cc8d7c915e6ea0525835d64d10a2))

## [0.11.0-beta.10](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.9...v0.11.0-beta.10) (2024-05-14)


### Features

* add gpt-4o ([52a4a3b](https://github.com/VinciGit00/Scrapegraph-ai/commit/52a4a3b22d6871b14801a5edbd28aa32a1a2580d)), closes [#232](https://github.com/VinciGit00/Scrapegraph-ai/issues/232)

## [0.11.0-beta.9](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.8...v0.11.0-beta.9) (2024-05-14)


### Bug Fixes

* crash asyncio due dependency version ([2563773](https://github.com/VinciGit00/Scrapegraph-ai/commit/25637734479a0da293860cf404a618eb5f49c7e2))


### chore

* update models_tokens.py with new model configurations ([d9752b1](https://github.com/VinciGit00/Scrapegraph-ai/commit/d9752b1619c6f86fdc407c898c8c9b443a50cb07))


### Docs

* fixed speechgraphexample ([4bf90f3](https://github.com/VinciGit00/Scrapegraph-ai/commit/4bf90f32a8fbb5a06279ec3002200961458a1250))
* fixed unused param and install ([cc28d5a](https://github.com/VinciGit00/Scrapegraph-ai/commit/cc28d5a64f6e0e061f697262302403db875bc6fe))
* **readme:** improve main readme ([ae5655f](https://github.com/VinciGit00/Scrapegraph-ai/commit/ae5655fdde810e80d20d7918b0b2232e29ee3f56))
* **concurrent:** refactor theme and added benchmarck searchgraph ([ced2bbc](https://github.com/VinciGit00/Scrapegraph-ai/commit/ced2bbcdc9672396e3c8afdc1f7f65c4194d29fd))
* update instructions to use with LocalAI ([198420c](https://github.com/VinciGit00/Scrapegraph-ai/commit/198420c505544c88805e719e2fc864f061c7de05))
* Update README.md ([772e064](https://github.com/VinciGit00/Scrapegraph-ai/commit/772e064c55f38ea296511f737dec9a412e0dbf4e))
* updated sponsor logo ([f8d8d71](https://github.com/VinciGit00/Scrapegraph-ai/commit/f8d8d71589ffc9ccde13259b50d309c7949beeb8))

## [0.10.0](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0...v0.10.0) (2024-05-08)

### CI

* **release:** 0.10.1 [skip ci] ([d359814](https://github.com/VinciGit00/Scrapegraph-ai/commit/d359814c4a640aa1e3bcde3f3bb3688b03f608d9))

## [0.11.0-beta.8](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.7...v0.11.0-beta.8) (2024-05-13)


### Features

* add claude documentation ([5bdee55](https://github.com/VinciGit00/Scrapegraph-ai/commit/5bdee558760521bab818efc6725739e2a0f55d20))
* add gemini embeddings ([79daa4c](https://github.com/VinciGit00/Scrapegraph-ai/commit/79daa4c112e076e9c5f7cd70bbbc6f5e4930832c))
* add llava integration ([019b722](https://github.com/VinciGit00/Scrapegraph-ai/commit/019b7223dc969c87c3c36b6a42a19b4423b5d2af))
* add new hugging_face models ([d5547a4](https://github.com/VinciGit00/Scrapegraph-ai/commit/d5547a450ccd8908f1cf73707142b3481fbc6baa))
* Fix bug for gemini case when embeddings config not passed ([726de28](https://github.com/VinciGit00/Scrapegraph-ai/commit/726de288982700dab8ab9f22af8e26f01c6198a7))
* fixed custom_graphs example and robots_node ([84fcb44](https://github.com/VinciGit00/Scrapegraph-ai/commit/84fcb44aaa36e84f775884138d04f4a60bb389be))
* multiple graph instances ([dbb614a](https://github.com/VinciGit00/Scrapegraph-ai/commit/dbb614a8dd88d7667fe3daaf0263f5d6e9be1683))
* **node:** multiple url search in SearchGraph + fixes ([930adb3](https://github.com/VinciGit00/Scrapegraph-ai/commit/930adb38f2154ba225342466bfd1846c47df72a0))
* refactoring search function ([aeb1acb](https://github.com/VinciGit00/Scrapegraph-ai/commit/aeb1acbf05e63316c91672c99d88f8a6f338147f))
* **parallel-exeuction:** add asyncio event loop dispatcher with semaphore for parallel graph instances ([627cbee](https://github.com/VinciGit00/Scrapegraph-ai/commit/627cbeeb2096eb4cd5da45015d37fceb7fe7840a))
* **batchsize:** tested different batch sizes and systems ([a8d5e7d](https://github.com/VinciGit00/Scrapegraph-ai/commit/a8d5e7db050e15306780ffca47f998ebaf5c1216))


### Bug Fixes

* bug on .toml ([f7d66f5](https://github.com/VinciGit00/Scrapegraph-ai/commit/f7d66f51818dbdfddd0fa326f26265a3ab686b20))
* **llm:** fixed gemini api_key ([fd01b73](https://github.com/VinciGit00/Scrapegraph-ai/commit/fd01b73b71b515206cfdf51c1d52136293494389))
* **examples:** local, mixed models and fixed SearchGraph embeddings problem ([6b71ec1](https://github.com/VinciGit00/Scrapegraph-ai/commit/6b71ec1d2be953220b6767bc429f4cf6529803fd))
* **examples:** openai std examples ([186c0d0](https://github.com/VinciGit00/Scrapegraph-ai/commit/186c0d035d1d211aff33c38c449f2263d9716a07))
* removed .lock file for deployment ([d4c7d4e](https://github.com/VinciGit00/Scrapegraph-ai/commit/d4c7d4e7fcc2110beadcb2fc91efc657ec6a485c))
* **asyncio:** replaced deepcopy with copy due to serialization problems ([dedc733](https://github.com/VinciGit00/Scrapegraph-ai/commit/dedc73304755c2d540a121d143173f60fb448bbb))

## [0.11.0-beta.7](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.6...v0.11.0-beta.7) (2024-05-13)


### Bug Fixes

* bug for claude ([d0167de](https://github.com/VinciGit00/Scrapegraph-ai/commit/d0167dee71779a3c1e1e042e17a41134b93b3c78))


### Docs

* update README.md ([17ec992](https://github.com/VinciGit00/Scrapegraph-ai/commit/17ec992b498839e001277e7bc3f0ebea49fbd00d))
* **refactor:** changed example ([c7ec114](https://github.com/VinciGit00/Scrapegraph-ai/commit/c7ec114274da64f0b61cee80afe908a36ad26b78))

## [0.11.0-beta.6](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.5...v0.11.0-beta.6) (2024-05-13)


### Bug Fixes

* **fetch-node:** removed isSoup from default ([0c15947](https://github.com/VinciGit00/Scrapegraph-ai/commit/0c1594737f878ed5672f4c889fdf9b4e0d7ec49a))

## [0.11.0-beta.5](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.4...v0.11.0-beta.5) (2024-05-13)


### Features

* **webdriver-backend:** add dynamic import scripts from module and file ([db2234b](https://github.com/VinciGit00/Scrapegraph-ai/commit/db2234bf5d2f2589b080cd4136f33c4f4443bdfb))
* **proxy-rotation:** add parse (IP address) or search (from broker) functionality for proxy rotation ([2170131](https://github.com/VinciGit00/Scrapegraph-ai/commit/217013181da06abe8d71d9db70e809ea4ebd8236))
* added proxy rotation ([0c36a7e](https://github.com/VinciGit00/Scrapegraph-ai/commit/0c36a7ec1f32ee073d9e0f534a2cb97aba3d7a1f))
* **safe-web-driver:** enchanced the original `AsyncChromiumLoader` web driver with proxy protection and flexible kwargs and backend ([768719c](https://github.com/VinciGit00/Scrapegraph-ai/commit/768719cce80953fa6cbe283e442420116c438f16))


### Bug Fixes

* **pytest:** add dependency for mocking testing functions ([2f4fd45](https://github.com/VinciGit00/Scrapegraph-ai/commit/2f4fd45700ebf1db0c429b5a6249386d1a111615))
* **chromium-loader:** ensure it subclasses langchain's base loader ([b54d984](https://github.com/VinciGit00/Scrapegraph-ai/commit/b54d984c134c8cbc432fd111bb161d3d53cf4a85))
* **proxy-rotation:** removed duplicated arg and passed the loader_kwarhs correctly to the node ([1e9a564](https://github.com/VinciGit00/Scrapegraph-ai/commit/1e9a56461632999c5dc09f5aa930c14c954025ad))
* **proxy-rotation:** removed max_shape duplicate ([5d6d996](https://github.com/VinciGit00/Scrapegraph-ai/commit/5d6d996e8f6132101d4c3af835d74f0674baffa1))


### Docs

* **refactor:** added proxy-rotation usage and refactor readthedocs ([e256b75](https://github.com/VinciGit00/Scrapegraph-ai/commit/e256b758b2ada641f97b23b1cf6c6b0174563d8a))

## [0.11.0-beta.4](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.3...v0.11.0-beta.4) (2024-05-12)


### Features

* add new prompt info ([e2350ed](https://github.com/VinciGit00/Scrapegraph-ai/commit/e2350eda6249d8e121344d12c92645a3887a5b76))

## [0.11.0-beta.3](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.2...v0.11.0-beta.3) (2024-05-12)


### Features

* add support for deepseek-chat ([156b67b](https://github.com/VinciGit00/Scrapegraph-ai/commit/156b67b91e1798f67082123e2c0087d358a32d4d)), closes [#222](https://github.com/VinciGit00/Scrapegraph-ai/issues/222)


### Docs

* add diagram showing general structure/flow of the library ([13ae918](https://github.com/VinciGit00/Scrapegraph-ai/commit/13ae9180ac5e7ef11dad1a210cf8790e797397dd))
* update overview diagram with more models ([b441b30](https://github.com/VinciGit00/Scrapegraph-ai/commit/b441b30a5c60dda105964f69bd4cef06825f5c74))

## [0.11.0-beta.2](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.1...v0.11.0-beta.2) (2024-05-10)


### Features

* revert fetch_node ([864aa91](https://github.com/VinciGit00/Scrapegraph-ai/commit/864aa91326c360992326e04811d272e55eac8355))

## [0.11.0-beta.1](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.10.0...v0.11.0-beta.1) (2024-05-10)


### Features

* Add support for passing pdf path as source ([f10f3b1](https://github.com/VinciGit00/Scrapegraph-ai/commit/f10f3b1438e0c625b7f2fa52faeb5a6c12116113))
* update info ([4ed0fb8](https://github.com/VinciGit00/Scrapegraph-ai/commit/4ed0fb89c3e6068190a7775bedcb6ae65ba59d18))


### Bug Fixes

* add json integration ([0ab31c3](https://github.com/VinciGit00/Scrapegraph-ai/commit/0ab31c3fdbd56652ed306e60109301f60e8042d3))
* Augment the information getting fetched from a webpage ([f8ce3d5](https://github.com/VinciGit00/Scrapegraph-ai/commit/f8ce3d5916eab926275d59d4d48b0d89ec9cd43f))
* fixed bugs for csv and xml ([324e977](https://github.com/VinciGit00/Scrapegraph-ai/commit/324e977b853ecaa55bac4bf86e7cd927f7f43d0d))
* limit python version to < 3.12 ([a37fbbc](https://github.com/VinciGit00/Scrapegraph-ai/commit/a37fbbcbcfc3ddd0cc66f586f279676b52c4abfe))


### CI

* **release:** 0.10.0-beta.1 [skip ci] ([c47a505](https://github.com/VinciGit00/Scrapegraph-ai/commit/c47a505750ee63e0220b339478953155ef1f1771))
* **release:** 0.10.0-beta.2 [skip ci] ([3f0e069](https://github.com/VinciGit00/Scrapegraph-ai/commit/3f0e0694f3b08463f025586777f7c0594b5ecb14))
* **release:** 0.9.0-beta.2 [skip ci] ([5aa600c](https://github.com/VinciGit00/Scrapegraph-ai/commit/5aa600cb0a85d320ad8dc786af26ffa46dd4d097))
* **release:** 0.9.0-beta.3 [skip ci] ([da8c72c](https://github.com/VinciGit00/Scrapegraph-ai/commit/da8c72ce138bcfe2627924d25a67afcd22cfafd5))
* **release:** 0.9.0-beta.4 [skip ci] ([8c5397f](https://github.com/VinciGit00/Scrapegraph-ai/commit/8c5397f67a9f05e0c00f631dd297b5527263a888))
* **release:** 0.9.0-beta.5 [skip ci] ([532adb6](https://github.com/VinciGit00/Scrapegraph-ai/commit/532adb639d58640bc89e8b162903b2ed97be9853))
* **release:** 0.9.0-beta.6 [skip ci] ([8c0b46e](https://github.com/VinciGit00/Scrapegraph-ai/commit/8c0b46eb40b446b270c665c11b2c6508f4d5f4be))
* **release:** 0.9.0-beta.7 [skip ci] ([6911e21](https://github.com/VinciGit00/Scrapegraph-ai/commit/6911e21584767460c59c5a563c3fd010857cbb67))
* **release:** 0.9.0-beta.8 [skip ci] ([739aaa3](https://github.com/VinciGit00/Scrapegraph-ai/commit/739aaa33c39c12e7ab7df8a0656cad140b35c9db))
* **release:** 0.10.0-beta.3 [skip ci] ([ad32298](https://github.com/VinciGit00/Scrapegraph-ai/commit/ad32298e70fc626fd62c897e153b806f79dba9b9))
* **release:** 0.10.0-beta.4 [skip ci] ([548bff9](https://github.com/VinciGit00/Scrapegraph-ai/commit/548bff9d77c8b4d2aadee40e966a06cc9d7fd4ab))
* **release:** 0.10.0-beta.5 [skip ci] ([28c9dce](https://github.com/VinciGit00/Scrapegraph-ai/commit/28c9dce7cbda49750172bafd7767fa48a0c33859))
* **release:** 0.10.0-beta.6 [skip ci] ([460d292](https://github.com/VinciGit00/Scrapegraph-ai/commit/460d292af21fabad3fdd2b66110913ccee22ba91))

### Bug Fixes

* add json integration ([0ab31c3](https://github.com/VinciGit00/Scrapegraph-ai/commit/0ab31c3fdbd56652ed306e60109301f60e8042d3))

## [0.10.0-beta.5](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.10.0-beta.4...v0.10.0-beta.5) (2024-05-09)



### Bug Fixes


* fixed bugs for csv and xml ([324e977](https://github.com/VinciGit00/Scrapegraph-ai/commit/324e977b853ecaa55bac4bf86e7cd927f7f43d0d))

## [0.10.0-beta.4](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.10.0-beta.3...v0.10.0-beta.4) (2024-05-09)


### Features

* Add support for passing pdf path as source ([f10f3b1](https://github.com/VinciGit00/Scrapegraph-ai/commit/f10f3b1438e0c625b7f2fa52faeb5a6c12116113))


### Bug Fixes

* limit python version to < 3.12 ([a37fbbc](https://github.com/VinciGit00/Scrapegraph-ai/commit/a37fbbcbcfc3ddd0cc66f586f279676b52c4abfe))

## [0.10.0-beta.3](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.10.0-beta.2...v0.10.0-beta.3) (2024-05-09)


### Features

* update info ([4ed0fb8](https://github.com/VinciGit00/Scrapegraph-ai/commit/4ed0fb89c3e6068190a7775bedcb6ae65ba59d18))

## [0.10.0-beta.2](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.10.0-beta.1...v0.10.0-beta.2) (2024-05-08)

Binary file added docs/assets/omniscrapergraph.png
Binary file added docs/assets/omnisearchgraph.png
Binary file added docs/assets/project_overview_diagram.fig
Binary file not shown.
Binary file added docs/assets/project_overview_diagram.png
Binary file added docs/assets/searchgraph.png
Binary file added docs/assets/smartscrapergraph.png
Binary file added docs/assets/speechgraph.png
27 changes: 22 additions & 5 deletions docs/source/conf.py
@@ -14,19 +14,36 @@
# import all the modules
sys.path.insert(0, os.path.abspath('../../'))

project = 'scrapegraphai'
copyright = '2024, Marco Vinciguerra'
author = 'Marco Vinciguerra'
project = 'ScrapeGraphAI'
copyright = '2024, ScrapeGraphAI'
author = 'Marco Vinciguerra, Marco Perini, Lorenzo Padoan'

html_last_updated_fmt = "%b %d, %Y"

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon']
extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon','sphinx_wagtail_theme']

templates_path = ['_templates']
exclude_patterns = []

# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = 'sphinx_rtd_theme'
# html_theme = 'sphinx_rtd_theme'
html_theme = 'sphinx_wagtail_theme'

html_theme_options = dict(
    project_name = "ScrapeGraphAI",
    logo = "scrapegraphai_logo.png",
    logo_alt = "ScrapeGraphAI",
    logo_height = 59,
    logo_url = "https://scrapegraph-ai.readthedocs.io/en/latest/",
    logo_width = 45,
    github_url = "https://github.com/VinciGit00/Scrapegraph-ai/tree/main/docs/source/",
    footer_links = ",".join(
        ["Landing Page|https://scrapegraphai.com/",
         "Docusaurus|https://scrapegraph-doc.onrender.com/docs/intro"]
    ),
)
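
The `footer_links` option in the new `html_theme_options` is a single string built by joining `Label|URL` pairs with commas. A minimal sketch of what that expression evaluates to follows; the `parse_footer_links` helper is hypothetical and only illustrates the string format, it is not the theme's own parsing code.

```python
# Build the footer_links value the same way conf.py does.
footer_links = ",".join(
    [
        "Landing Page|https://scrapegraphai.com/",
        "Docusaurus|https://scrapegraph-doc.onrender.com/docs/intro",
    ]
)


def parse_footer_links(value):
    """Hypothetical helper: split 'Label|URL,Label|URL' into (label, url) tuples."""
    return [tuple(item.split("|", 1)) for item in value.split(",")]


# Show the joined string and the pairs recovered from it.
print(footer_links)
print(parse_footer_links(footer_links))
```

This format only round-trips cleanly while neither labels nor URLs contain a comma or an extra `|` in the label, which holds for the two links configured here.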
7 changes: 5 additions & 2 deletions docs/source/getting_started/examples.rst
@@ -1,7 +1,9 @@
Examples
========

Here some example of the different ways to scrape with ScrapegraphAI
Let's suppose you want to scrape a website to get a list of projects with their descriptions.
You can use the `SmartScraperGraph` class to do that.
The following examples show how to use the `SmartScraperGraph` class with OpenAI models and local models.

OpenAI models
^^^^^^^^^^^^^
@@ -78,7 +80,7 @@ After that, you can run the following code, using only your machine resources br
# ************************************************

smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the news with their description.",
    prompt="List me all the projects with their description.",
    # also accepts a string with the already downloaded HTML code
    source="https://perinim.github.io/projects",
    config=graph_config
@@ -87,3 +89,4 @@ After that, you can run the following code, using only your machine resources br
result = smart_scraper_graph.run()
print(result)

To find out how to customize the `graph_config` dictionary, for example by using a different LLM or adding new parameters, check the `Scrapers` section!
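
As a rough illustration of the `graph_config` dictionary mentioned here, the sketch below wires a configuration into the `SmartScraperGraph` call shown in the diff. The key names (`llm`, `api_key`, `model`), the model identifier, the `OPENAI_API_KEY` environment variable and the `scrapegraphai.graphs` import path are assumptions that do not appear in this diff, and `scrape_projects` is a hypothetical wrapper; check them against the `Scrapers` section before relying on them.

```python
import os

# Assumed shape of the configuration dictionary; key names and the model
# identifier are illustrative, not confirmed by this diff.
graph_config = {
    "llm": {
        "api_key": os.environ.get("OPENAI_API_KEY", "YOUR_OPENAI_API_KEY"),
        "model": "gpt-3.5-turbo",
    },
}


def scrape_projects(config):
    """Hypothetical wrapper around the SmartScraperGraph example from the docs."""
    # Imported lazily so the dictionary above can be built without the
    # library installed; the import path is an assumption.
    from scrapegraphai.graphs import SmartScraperGraph

    graph = SmartScraperGraph(
        prompt="List me all the projects with their description.",
        source="https://perinim.github.io/projects",
        config=config,
    )
    return graph.run()


# Usage (requires the library, network access and a valid API key):
# print(scrape_projects(graph_config))
```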
21 changes: 16 additions & 5 deletions docs/source/getting_started/installation.rst
@@ -7,26 +7,37 @@ for this project.
Prerequisites
^^^^^^^^^^^^^

- `Python 3.8+ <https://www.python.org/downloads/>`_
- `Python >=3.9,<3.12 <https://www.python.org/downloads/>`_
- `pip <https://pip.pypa.io/en/stable/getting-started/>`_
- `ollama <https://ollama.com/>`_ *optional for local models
- `Ollama <https://ollama.com/>`_ (optional for local models)
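
The updated prerequisite `>=3.9,<3.12` is a half-open range: 3.9 is included and 3.12 is excluded. A minimal sketch for checking an interpreter against it is shown below; `is_supported` and the two constants are hypothetical names used for illustration, not part of the library.

```python
import sys

MIN_VERSION = (3, 9)   # inclusive lower bound
MAX_VERSION = (3, 12)  # exclusive upper bound


def is_supported(version=None):
    """Return True if the (major, minor) pair falls in the >=3.9,<3.12 range."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION


# Report whether the running interpreter satisfies the constraint.
print(is_supported())
```

Comparing only the `(major, minor)` pair keeps patch releases such as 3.11.9 inside the range while still rejecting every 3.12.x.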


Install the library
^^^^^^^^^^^^^^^^^^^^

The library is available on PyPI, so it can be installed using the following command:

.. code-block:: bash

   pip install scrapegraphai

.. important::

   It is highly recommended to install the library in a virtual environment (conda, venv, etc.)

If you clone the repository, you can install the library using `poetry <https://python-poetry.org/docs/>`_:

.. code-block:: bash

   poetry install

Additionally on Windows when using WSL
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you are using Windows Subsystem for Linux (WSL) and you are facing issues with the installation of the library, you might need to install the following packages:

.. code-block:: bash

   sudo apt-get -y install libnss3 libnspr4 libgbm1 libasound2

As simple as that! You are now ready to scrape gnamgnamgnam 👿👿👿



20 changes: 14 additions & 6 deletions docs/source/index.rst
@@ -3,12 +3,6 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.

Welcome to scrapegraphai-ai's documentation!
=======================================

Here you will find all the information you need to get started.
The following sections will guide you through the installation process and the usage of the library.

.. toctree::
   :maxdepth: 2
   :caption: Introduction
@@ -22,6 +16,20 @@ The following sections will guide you through the installation process and the u

   getting_started/installation
   getting_started/examples

.. toctree::
   :maxdepth: 2
   :caption: Scrapers

   scrapers/graphs
   scrapers/llm
   scrapers/graph_config
   scrapers/benchmarks

.. toctree::
   :maxdepth: 2
   :caption: Modules

   modules/modules

Indices and tables
2 changes: 1 addition & 1 deletion docs/source/introduction/contributing.rst
@@ -2,7 +2,7 @@ Contributing
============

Hey, you want to contribute? Awesome!
Just fork the repo, make your changes, and send me a pull request.
Just fork the repo, make your changes, and send a pull request.
If you're not sure if it's a good idea, open an issue and we'll discuss it.

Go and check out the `contributing guidelines <https://github.com/VinciGit00/Scrapegraph-ai/blob/main/CONTRIBUTING.md>`__ for more information.