Browser Control Agent

A Chrome extension that enables AI-powered browser automation through natural language commands.

Overview

Browser Control Agent automates web-based tasks from natural language instructions. Describe what you want to accomplish (e.g., "find 10 properties to stay in Chiang Mai in September"), and the agent navigates websites and performs the required actions on your behalf.

Key Features

- Natural Language Control: Interact with your browser using everyday language
- Multimodal Understanding: The agent processes both visual and textual content to understand web pages
- Adaptive Automation: Handles dynamic web content through an iterative decision-making process
- Privacy-Focused: The agent logic runs client-side within your browser; requests to the Gemini model use your own API key

How It Works

1. You provide a high-level goal through the side panel chat interface
2. The AI agent analyzes the current webpage using text and screenshots
3. It decides on the appropriate action (clicking, typing, scrolling, etc.)
4. After executing the action, it re-evaluates the page state and determines the next step
5. This loop continues until your goal is achieved
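
The steps above form an observe, decide, act loop. The following is a minimal sketch of that loop, not the extension's actual code: the names Action, PageState, AgentDeps, runAgent, and the maxSteps cap are all hypothetical and chosen for illustration only.

```typescript
// Hypothetical types for illustration; none of these names come from the project source.
type Action =
  | { kind: "click"; selector: string }
  | { kind: "type"; selector: string; text: string }
  | { kind: "scroll"; deltaY: number }
  | { kind: "done"; summary: string };

interface PageState {
  url: string;
  text: string;       // visible text extracted from the page
  screenshot: string; // e.g. a base64-encoded image of the viewport
}

// The three capabilities the loop needs, injected so the loop itself stays testable.
interface AgentDeps {
  observe(): Promise<PageState>;
  decide(goal: string, state: PageState, history: Action[]): Promise<Action>;
  execute(action: Action): Promise<void>;
}

// Repeats observe -> decide -> execute until the model reports "done"
// or the step budget (an assumed safety limit) runs out.
async function runAgent(
  goal: string,
  deps: AgentDeps,
  maxSteps = 20,
): Promise<Action[]> {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const state = await deps.observe();
    const action = await deps.decide(goal, state, history);
    history.push(action);
    if (action.kind === "done") break;
    await deps.execute(action);
  }
  return history;
}
```

Passing observe, decide, and execute in as dependencies keeps the loop independent of Chrome APIs and of the model client, so it can be exercised with stubs. The step cap is an assumption added here so that a goal the model never marks as done cannot loop forever.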

Technology

Browser Control Agent uses Google's Gemini 1.5 multimodal language model to interpret page text and screenshots and to choose the next action to take.
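
To illustrate what a multimodal request could look like, here is a sketch that builds a request body in the shape accepted by the Gemini REST generateContent method, combining page text with a screenshot. How the extension actually calls the model is not shown in this README, so the function name buildRequest, the prompt wording, and the PNG image type are assumptions.

```typescript
// Part shapes follow the Gemini REST generateContent request format.
interface TextPart {
  text: string;
}
interface InlineDataPart {
  inline_data: { mime_type: string; data: string };
}
type Part = TextPart | InlineDataPart;

interface GenerateContentRequest {
  contents: { role: "user"; parts: Part[] }[];
}

// Hypothetical helper: bundles the user's goal, the page text, and a
// base64-encoded screenshot into a single user turn.
function buildRequest(
  goal: string,
  pageText: string,
  screenshotBase64: string,
): GenerateContentRequest {
  return {
    contents: [
      {
        role: "user",
        parts: [
          // Prompt wording is illustrative only.
          { text: `Goal: ${goal}\n\nPage text:\n${pageText}` },
          // Assumes the screenshot was captured as a PNG.
          { inline_data: { mime_type: "image/png", data: screenshotBase64 } },
        ],
      },
    ],
  };
}
```

The resulting object would be serialized with JSON.stringify and sent as the POST body to the chosen model's generateContent endpoint, authenticated with the API key from the options page.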

Installation

1. Clone this repository
2. Install dependencies: npm install
3. Build the extension: npm run build
4. Load the extension in Chrome:
   - Go to chrome://extensions/
   - Enable "Developer mode"
   - Click "Load unpacked" and select the dist folder

Usage

1. Click the Browser Control Agent icon in your Chrome toolbar to open the side panel
2. Enter your API key in the options page (accessible via the side panel)
3. Navigate to a website where you want to automate tasks
4. Enter your goal in the chat interface and watch as the agent works for you

License

This project is licensed under the MIT License - see the LICENSE file for details.

Documentation

For more detailed information about the project architecture and implementation, see the documentation in the docs folder.
