Commit 6e62fa7

readme tweaks (#137)

* readme tweaks
* Update README.md

Authored by jeremypress and pkiv
Co-authored-by: Paul Klein <paulleoklein@gmail.com>

1 parent 9e477f7

File tree

1 file changed (+22, −16 lines)

README.md

Lines changed: 22 additions & 16 deletions
@@ -45,14 +45,16 @@ Each Stagehand function takes in an atomic instruction, such as `act("click the

 Instructions should be atomic to increase reliability, and step planning should be handled by the higher level agent. You can use `observe()` to get a suggested list of actions that can be taken on the current page, and then use those to ground your step planning prompts.

-Stagehand is [open source](#license) and maintained by [Browserbase](https://browserbase.com) team. We believe that by enabling more developers to build reliable web automations, we'll expand the market of developers who benefit from our headless browser infrastructure. This is the framework that we wished we had while tinkering on our own applications, and we're excited to share it with you.
+Stagehand is [open source](#license) and maintained by the [Browserbase](https://browserbase.com) team. We believe that by enabling more developers to build reliable web automations, we'll expand the market of developers who benefit from our headless browser infrastructure. This is the framework that we wished we had while tinkering on our own applications, and we're excited to share it with you.

 ## Getting Started

 ### 1. Install the Stagehand package

+We also install zod to power typed extraction
+
 ```bash
-npm install @browserbasehq/stagehand
+npm install @browserbasehq/stagehand zod
 ```

 ### 2. Configure your model provider
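The new zod dependency powers typed extraction: you declare the shape you expect and the extraction result is validated against it at runtime. Below is a minimal, dependency-free sketch of that kind of runtime shape-check; it is an illustration only, not the zod API (Stagehand itself passes real zod schemas to `extract()`).

```javascript
// Illustrative sketch only -- Stagehand uses zod schemas with extract();
// this just shows the kind of runtime shape-checking zod provides.
function validateShape(shape, value) {
  for (const [key, type] of Object.entries(shape)) {
    if (typeof value[key] !== type) {
      throw new TypeError(`expected "${key}" to be a ${type}`);
    }
  }
  return value; // value now matches the declared shape
}

const result = validateShape(
  { title: "string", stars: "number" },
  { title: "stagehand", stars: 1234 }
);
console.log(result.stars); // 1234
```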
@@ -78,6 +80,7 @@ Then you can create a Stagehand instance like so:

 ```javascript
 import { Stagehand } from "@browserbasehq/stagehand";
+import { z } from "zod";

 const stagehand = new Stagehand({
   env: "LOCAL",
@@ -93,6 +96,7 @@ export BROWSERBASE_PROJECT_ID=...

 ```javascript
 import { Stagehand } from "@browserbasehq/stagehand";
+import { z } from "zod";

 const stagehand = new Stagehand({
   env: "BROWSERBASE",
@@ -211,7 +215,8 @@ This constructor is used to create an instance of Stagehand.

 #### `observe()`

-> [!NOTE] > `observe()` currently only evaluates the first chunk in the page.
+> [!NOTE]
+> `observe()` currently only evaluates the first chunk in the page.

 `observe()` is used to get a list of actions that can be taken on the current page. It's useful for adding context to your planning step, or if you're unsure of what page you're on.

@@ -272,17 +277,6 @@ Stagehand currently supports the following models from OpenAI and Anthropic:

 These models can be specified when initializing the `Stagehand` instance or when calling methods like `act()` and `extract()`.

-#### Adding a New Model
-
-To add a new model to Stagehand, follow these steps:
-
-1. **Define the Model**: Add the new model name to the `AvailableModel` type in the `LLMProvider.ts` file. This ensures that the model is recognized by the system.
-
-2. **Map the Model to a Provider**: Update the `modelToProviderMap` in the `LLMProvider` class to associate the new model with its corresponding provider. This mapping is crucial for determining which client to use.
-
-3. **Implement the Client**: If the new model requires a new client, implement a class that adheres to the `LLMClient` interface. This class should define all necessary methods, such as `createChatCompletion`.
-
-4. **Update the `getClient` Method**: Modify the `getClient` method in the `LLMProvider` class to return an instance of the new client when the new model is requested.

 ## How It Works

@@ -308,7 +302,7 @@ The DOM Processing steps look as follows:

 #### Chunking

-While LLMs will continue to get bigger context windows and improve latency, giving any reasoning system less stuff to think about should make it more accurate. As a result, DOM processing is done in chunks in order to keep the context small per inference call. In order to chunk, the SDK considers a candidate element that starts in a section of the viewport to be a part of that chunk. In the future, padding will be added to ensure that an individual chunk does not lack relevant context. See this diagram for how it looks:
+While LLMs will continue to increase context window length and reduce latency, giving any reasoning system less stuff to think about should make it more reliable. As a result, DOM processing is done in chunks in order to keep the context small per inference call. In order to chunk, the SDK considers a candidate element that starts in a section of the viewport to be a part of that chunk. In the future, padding will be added to ensure that an individual chunk does not lack relevant context. See this diagram for how it looks:

 ![](./docs/media/chunks.png)
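The chunk assignment described above reduces, in spirit, to bucketing an element by which viewport-height section its top edge starts in. A rough sketch of that idea (an assumption about the approach, not the SDK's actual code):

```javascript
// Rough sketch of viewport-based chunking (assumed logic, not the SDK's
// source): an element belongs to the chunk whose viewport-height section
// its top edge starts in.
function chunkIndexFor(elementTop, viewportHeight) {
  return Math.floor(elementTop / viewportHeight);
}

console.log(chunkIndexFor(100, 800));  // 0 -> first chunk
console.log(chunkIndexFor(1250, 800)); // 1 -> second chunk
```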

@@ -318,7 +312,7 @@ The `act()` and `observe()` methods can take a `useVision` flag. If this is set

 ### LLM analysis

-Now we have a list of candidate elements and a way to select them. We can present those elements with additional context to the LLM for extraction or action. While untested at on a large scale, presenting a "numbered list of elements" guides the model to not treat the context as a full DOM, but as a list of related but independent elements to operate on.
+Now we have a list of candidate elements and a way to select them. We can present those elements with additional context to the LLM for extraction or action. While untested on a large scale, presenting a "numbered list of elements" guides the model to not treat the context as a full DOM, but as a list of related but independent elements to operate on.

 In the case of action, we ask the LLM to write a playwright method in order to do the correct thing. In our limited testing, playwright syntax is much more effective than relying on built in javascript APIs, possibly due to tokenization.
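The "numbered list of elements" format described above can be sketched roughly like this; the element field names are illustrative assumptions, not the SDK's actual representation.

```javascript
// Illustrative only: render candidate elements as a numbered list so the
// model treats them as independent items rather than a full DOM.
function numberElements(elements) {
  return elements
    .map((el, i) => `${i}: <${el.tag}> ${el.text}`)
    .join("\n");
}

console.log(numberElements([
  { tag: "button", text: "Sign in" },
  { tag: "a", text: "Docs" },
]));
// 0: <button> Sign in
// 1: <a> Docs
```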

@@ -377,6 +371,18 @@ Running all evals can take some time. We have a convenience script `example.ts`

 You can run `npm run example` to execute and iterate on the eval you are currently developing.

+#### Adding a New Model
+
+To add a new model to Stagehand, follow these steps:
+
+1. **Define the Model**: Add the new model name to the `AvailableModel` type in the `LLMProvider.ts` file. This ensures that the model is recognized by the system.
+
+2. **Map the Model to a Provider**: Update the `modelToProviderMap` in the `LLMProvider` class to associate the new model with its corresponding provider. This mapping is crucial for determining which client to use.
+
+3. **Implement the Client**: If the new model requires a new client, implement a class that adheres to the `LLMClient` interface. This class should define all necessary methods, such as `createChatCompletion`.
+
+4. **Update the `getClient` Method**: Modify the `getClient` method in the `LLMProvider` class to return an instance of the new client when the new model is requested.
+
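The mapping in steps 1–2 above can be sketched as follows. The entries and shapes here are illustrative examples, not the actual source; see `LLMProvider.ts` for the real definitions.

```javascript
// Illustrative sketch of the model-to-provider mapping from the steps
// above; model names and provider keys are example assumptions.
const modelToProviderMap = {
  "my-openai-model": "openai",
  "my-anthropic-model": "anthropic",
  "my-new-model": "my-provider", // step 2: map the new model to its provider
};

function getProvider(modelName) {
  const provider = modelToProviderMap[modelName];
  if (!provider) throw new Error(`Unsupported model: ${modelName}`);
  return provider; // getClient() would select the client class from this
}

console.log(getProvider("my-new-model")); // my-provider
```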
 ### Building the SDK

 Stagehand uses [tsup](https://github.com/egoist/tsup) to build the SDK and vanilla `esbuild` to build scripts that run in the DOM.
