README.md (22 additions & 16 deletions)
@@ -45,14 +45,16 @@ Each Stagehand function takes in an atomic instruction, such as `act("click the
Instructions should be atomic to increase reliability, and step planning should be handled by the higher level agent. You can use `observe()` to get a suggested list of actions that can be taken on the current page, and then use those to ground your step planning prompts.
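As an illustration, here is a sketch of grounding a planner with `observe()`. The constructor options and the shape of the returned suggestions are assumptions for illustration, not the documented API:

```typescript
import { Stagehand } from "@browserbasehq/stagehand";

// Hypothetical sketch: assumes observe() resolves to an array of
// plain-text action suggestions, which may differ from the real API.
const stagehand = new Stagehand();
await stagehand.init();

const suggestions = await stagehand.observe();

// Feed the suggested atomic actions to a higher-level planner as
// grounded context for its next step.
const plannerContext = suggestions
  .map((s, i) => `${i + 1}. ${s}`)
  .join("\n");
console.log(`Possible atomic actions on this page:\n${plannerContext}`);
```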
Stagehand is [open source](#license) and maintained by the [Browserbase](https://browserbase.com) team. We believe that by enabling more developers to build reliable web automations, we'll expand the market of developers who benefit from our headless browser infrastructure. This is the framework that we wished we had while tinkering on our own applications, and we're excited to share it with you.
## Getting Started
### 1. Install the Stagehand package
We also install `zod` to power typed extraction:
```bash
npm install @browserbasehq/stagehand zod
```
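Zod is what gives `extract()` a typed result. A minimal sketch, assuming `extract()` accepts `instruction` and `schema` options (the option names are assumptions; check the API reference for the exact shape):

```typescript
import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

const stagehand = new Stagehand();
await stagehand.init();

// The option names below are assumptions for illustration.
const item = await stagehand.extract({
  instruction: "extract the name and price of the first product",
  schema: z.object({
    name: z.string(),
    priceUsd: z.number(),
  }),
});
// `item` is now typed as { name: string; priceUsd: number }
```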
### 2. Configure your model provider
@@ -78,6 +80,7 @@ Then you can create a Stagehand instance like so:
@@ -211,7 +215,8 @@ This constructor is used to create an instance of Stagehand.
#### `observe()`
> [!NOTE]
> `observe()` currently only evaluates the first chunk in the page.
`observe()` is used to get a list of actions that can be taken on the current page. It's useful for adding context to your planning step, or if you're unsure of what page you're on.
@@ -272,17 +277,6 @@ Stagehand currently supports the following models from OpenAI and Anthropic:
These models can be specified when initializing the `Stagehand` instance or when calling methods like `act()` and `extract()`.
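For example (a sketch only: the `modelName` option and per-call override are assumptions, not the documented API):

```typescript
import { Stagehand } from "@browserbasehq/stagehand";

// Hypothetical: option names are assumptions for illustration.
const stagehand = new Stagehand({ modelName: "gpt-4o" });
await stagehand.init();

// Override the default model for a single call.
await stagehand.act({
  action: "click the sign in button",
  modelName: "claude-3-5-sonnet-20241022",
});
```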
## How It Works
@@ -308,7 +302,7 @@ The DOM Processing steps look as follows:
#### Chunking
While LLMs will continue to increase context window length and reduce latency, giving any reasoning system less stuff to think about should make it more reliable. As a result, DOM processing is done in chunks in order to keep the context small per inference call. In order to chunk, the SDK considers a candidate element that starts in a section of the viewport to be a part of that chunk. In the future, padding will be added to ensure that an individual chunk does not lack relevant context. See this diagram for how it looks:

@@ -318,7 +312,7 @@ The `act()` and `observe()` methods can take a `useVision` flag. If this is set
### LLM analysis
Now we have a list of candidate elements and a way to select them. We can present those elements with additional context to the LLM for extraction or action. While untested on a large scale, presenting a "numbered list of elements" guides the model to not treat the context as a full DOM, but as a list of related but independent elements to operate on.
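The "numbered list of elements" framing might look like the following sketch (the element representation is hypothetical):

```typescript
// Sketch: formats candidate elements as a numbered list so the model
// treats them as independent items rather than as a full DOM tree.
function toNumberedList(elements: string[]): string {
  return elements.map((el, i) => `${i}: ${el}`).join("\n");
}

console.log(toNumberedList(['<button>Sign in</button>', '<a href="/docs">Docs</a>']));
// 0: <button>Sign in</button>
// 1: <a href="/docs">Docs</a>
```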
In the case of action, we ask the LLM to write a Playwright method in order to do the correct thing. In our limited testing, Playwright syntax is much more effective than relying on built-in JavaScript APIs, possibly due to tokenization.
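For instance, for an instruction like "click the sign in button", the model might be asked to emit a single Playwright call along these lines (a hypothetical example; the selector is illustrative and `page` is assumed to be the Playwright `Page` in scope):

```typescript
// Hypothetical model output for "click the sign in button";
// the selector is illustrative only.
await page.locator("button:has-text('Sign in')").click();
```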
@@ -377,6 +371,18 @@ Running all evals can take some time. We have a convenience script `example.ts`
You can run `npm run example` to execute and iterate on the eval you are currently developing.
#### Adding a New Model
To add a new model to Stagehand, follow these steps:
1. **Define the Model**: Add the new model name to the `AvailableModel` type in the `LLMProvider.ts` file. This ensures that the model is recognized by the system.
2. **Map the Model to a Provider**: Update the `modelToProviderMap` in the `LLMProvider` class to associate the new model with its corresponding provider. This mapping is crucial for determining which client to use.
3. **Implement the Client**: If the new model requires a new client, implement a class that adheres to the `LLMClient` interface. This class should define all necessary methods, such as `createChatCompletion`.
4. **Update the `getClient` Method**: Modify the `getClient` method in the `LLMProvider` class to return an instance of the new client when the new model is requested.
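The first two steps above might look like this sketch (the existing union members, provider keys, and the new model name are all hypothetical; the actual contents of `LLMProvider.ts` may differ):

```typescript
// Step 1 (sketch): extend the AvailableModel union in LLMProvider.ts.
export type AvailableModel =
  | "gpt-4o"
  | "claude-3-5-sonnet-20241022"
  | "my-new-model"; // hypothetical new model name

// Step 2 (sketch): map the new model to its provider key so the
// provider can pick the right client.
export const modelToProviderMap: Record<AvailableModel, string> = {
  "gpt-4o": "openai",
  "claude-3-5-sonnet-20241022": "anthropic",
  "my-new-model": "myprovider", // hypothetical provider key
};
```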
### Building the SDK
Stagehand uses [tsup](https://github.com/egoist/tsup) to build the SDK and vanilla `esbuild` to build scripts that run in the DOM.