README.md: 8 additions & 8 deletions
@@ -22,20 +22,18 @@
 - [Intro](#intro)
 - [Getting Started](#getting-started)
 - [API Reference](#api-reference)
-- [Stagehand()](#stagehand)
 - [act()](#act)
 - [extract()](#extract)
 - [observe()](#observe)
-- [page and context](#page-and-context)
-- [log()](#log)
 - [Model Support](#model-support)
 - [How It Works](#how-it-works)
 - [Roadmap](#roadmap)
 - [Contributing](#contributing)
 - [Acknowledgements](#acknowledgements)
 - [License](#license)

-> [!NOTE] > `Stagehand` is currently available as an early release, and we're actively seeking feedback from the community. Please join our [Slack community](https://join.slack.com/t/stagehand-dev/shared_invite/zt-2tdncfgkk-fF8y5U0uJzR2y2_M9c9OJA) to stay updated on the latest developments and provide feedback.
+> [!NOTE]
+> `Stagehand` is currently available as an early release, and we're actively seeking feedback from the community. Please join our [Slack community](https://join.slack.com/t/stagehand-dev/shared_invite/zt-2tdncfgkk-fF8y5U0uJzR2y2_M9c9OJA) to stay updated on the latest developments and provide feedback.

 ## Intro

@@ -173,7 +171,7 @@ This constructor is used to create an instance of Stagehand.

 - `action`: a `string` describing the action to perform, e.g., `"search for 'x'"`.
 - `modelName`: (optional) an `AvailableModel` string to specify the model to use.
-- `useVision`: (optional) a `boolean` or `"fallback"` to determine if vision-based processing should be used.
+- `useVision`: (optional) a `boolean` or `"fallback"` to determine if vision-based processing should be used. Defaults to `"fallback"`.

 - **Returns:**

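The `useVision` semantics added in this hunk (a `boolean` or `"fallback"`, defaulting to `"fallback"`) can be modeled as a small union type plus a default-resolution helper. This is a minimal sketch, not Stagehand's source: `UseVision`, `ActOptions` and `resolveUseVision` are hypothetical names, and `modelName` is typed as a plain `string` standing in for `AvailableModel`.

```typescript
// Hypothetical types mirroring the documented `act()` arguments.
// `modelName` is a plain string here, standing in for `AvailableModel`.
type UseVision = boolean | "fallback";

interface ActOptions {
  action: string; // e.g. "search for 'x'"
  modelName?: string; // optional model override
  useVision?: UseVision; // optional; documented default is "fallback"
}

// Apply the documented default only when the caller omits `useVision`.
function resolveUseVision(options: ActOptions): UseVision {
  return options.useVision ?? "fallback";
}

const defaulted = resolveUseVision({ action: "search for 'x'" });
const forced = resolveUseVision({ action: "click the login button", useVision: true });
const textOnly = resolveUseVision({ action: "click the login button", useVision: false });

console.log(defaulted, forced, textOnly); // fallback true false
```

The helper uses `??` rather than `||` so that an explicit `useVision: false` is kept as text-only instead of being replaced by the `"fallback"` default.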
@@ -222,6 +220,7 @@ If you are looking for a specific element, you can also pass in an instruction t
 - **Arguments:**

 - `instruction`: a `string` providing instructions for the observation.
+- `useVision`: (optional) a `boolean` or `"fallback"` to determine if vision-based processing should be used. Defaults to `"fallback"`.

 - **Returns:**

@@ -295,9 +294,6 @@ The SDK has two major phases:
 ### DOM processing

 Stagehand uses a combination of techniques to prepare the DOM.
-Stagehand only uses text input as of this version, but the release of `gpt-4o` incorporating vision is attractive.
-
-\*_[update before release_]\*

 The DOM Processing steps look as follows:

@@ -316,6 +312,10 @@ While LLMs will continue to get bigger context windows and improve latency, givi



+### Vision
+
+The `act()` and `observe()` methods can take a `useVision` flag. If this is set to `true`, the LLM will be provided with an annotated screenshot of the current page to identify which elements to act on. This is useful for complex DOMs that the LLM has a hard time reasoning about, even after processing and chunking. By default, this flag is set to `"fallback"`, which means that if the LLM fails to successfully identify a single element, Stagehand will retry the attempt using vision.
+
 ### LLM analysis

 Now we have a list of candidate elements and a way to select them. We can present those elements with additional context to the LLM for extraction or action. While untested at a large scale, presenting a "numbered list of elements" guides the model to not treat the context as a full DOM, but as a list of related but independent elements to operate on.
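The Vision paragraph and the "numbered list of elements" idea above can be combined into one selection step: show the model a numbered list of candidate elements, accept the answer only if it names exactly one element, and, when `useVision` is `"fallback"`, retry with a screenshot. This is a hypothetical sketch under stated assumptions, not Stagehand's implementation: `Candidate`, `ModelCall`, `formatNumberedList` and `selectElement` are illustrative names, the model is a stub, and the annotated screenshot is reduced to a boolean flag rather than real image data.

```typescript
// Hypothetical sketch: numbered-list prompting with a vision fallback.
type UseVision = boolean | "fallback";

interface Candidate {
  id: number; // position in the numbered list shown to the model
  html: string; // processed text of one candidate DOM element
}

// Stand-in for an LLM call: receives the prompt and whether a screenshot is
// attached, and returns the list indices the model picked.
type ModelCall = (prompt: string, withScreenshot: boolean) => Promise<number[]>;

// Present candidates as independent numbered items rather than a full DOM.
function formatNumberedList(candidates: Candidate[]): string {
  return candidates.map((c) => `${c.id}: ${c.html}`).join("\n");
}

async function selectElement(
  instruction: string,
  candidates: Candidate[],
  useVision: UseVision,
  callModel: ModelCall,
): Promise<Candidate | null> {
  const prompt = `${instruction}\n\n${formatNumberedList(candidates)}`;
  // Accept the model's answer only if it identifies exactly one element.
  const pick = (ids: number[]): Candidate | null =>
    ids.length === 1 ? candidates.find((c) => c.id === ids[0]) ?? null : null;

  // useVision === true: use the screenshot from the first attempt.
  if (useVision === true) return pick(await callModel(prompt, true));

  // Text-only first attempt.
  const textResult = pick(await callModel(prompt, false));
  if (textResult !== null || useVision === false) return textResult;

  // "fallback": the text pass did not yield a single element, so retry with vision.
  return pick(await callModel(prompt, true));
}

// Usage with a stub model that only succeeds when a screenshot is attached.
const candidates: Candidate[] = [
  { id: 0, html: '<input placeholder="Search">' },
  { id: 1, html: "<button>Search</button>" },
];
const stubModel: ModelCall = async (_prompt, withScreenshot) => (withScreenshot ? [1] : []);

selectElement("click the search button", candidates, "fallback", stubModel).then((el) =>
  console.log(el?.html), // <button>Search</button>
);
```

With `useVision: false` the same stub returns `null`, because the text-only pass never identifies a single element and no retry is attempted.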