
droid-cua

Minimal AI agent that controls Android devices using OpenAI’s computer-use-preview model.

(Demo video: cua-droid-duolingo-demo.mp4)

🚀 How It Works

  1. Connects to a running Android emulator.
  2. Captures full-screen device screenshots.
  3. Scales down the screenshots for OpenAI model compatibility.
  4. Sends screenshots and user instructions to OpenAI’s computer-use-preview model.
  5. Receives structured actions (click, scroll, type, keypress, wait, drag).
  6. Rescales model outputs back to real device coordinates.
  7. Executes the actions on the device.
  8. Repeats until you type exit.
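
Steps 3 and 6 are the core of the loop: the model sees a scaled-down screenshot, so every coordinate it returns must be mapped back to real device pixels. A minimal sketch of that round trip (function names and the 768x1536 limit are illustrative, not the project's actual API):

```javascript
// Compute a scale factor that fits the device screen inside the
// model's maximum supported dimensions, preserving aspect ratio.
function scaleFactor(deviceW, deviceH, maxW, maxH) {
  return Math.min(maxW / deviceW, maxH / deviceH, 1);
}

// Map a coordinate from the scaled screenshot back to device pixels.
function toDeviceCoords(x, y, factor) {
  return { x: Math.round(x / factor), y: Math.round(y / factor) };
}

// Example: a 1080x2340 device scaled to fit within 768x1536.
const f = scaleFactor(1080, 2340, 768, 1536); // limited by height: 1536/2340
const p = toDeviceCoords(384, 768, f);        // → { x: 585, y: 1170 }
```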

🛠 Setup

  1. Install dependencies:

    npm install
  2. Create a .env file with your OpenAI API key:

    echo "OPENAI_API_KEY=your-api-key" > .env
  3. Make sure Android Debug Bridge (ADB) is available in your system PATH:

    adb version
  4. Start your Android emulator manually (optional):

    emulator -avd Your_AVD_Name
  5. Run the agent:

    node index.js --avd=Your_AVD_Name

    If no --avd is provided, the agent will try to connect to the first running device.


🧠 Features

  • Captures screenshots directly from the device (adb exec-out screencap -p).
  • Dynamically scales screenshots for OpenAI compatibility.
  • Maps model-generated actions (click, scroll, drag, type, keypress, wait) back to real device coordinates.
  • Connects automatically to a running emulator or launches it if needed.
  • Presents the device screen to the model as if it were embedded in a browser page, for compatibility with the model's supported environments.
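
The last point refers to the tool declaration sent with each request: the computer-use tool takes an environment field, and declaring a browser with the scaled screenshot dimensions lets an Android screen pass as an ordinary viewport. A hedged sketch of such a declaration (field names follow OpenAI's published Responses API shape; verify against the current docs before relying on them):

```javascript
// Tool declaration for the computer-use-preview model. The Android
// screen is presented as a browser-sized display so the model treats
// each screenshot as a normal web viewport.
function computerUseTool(scaledWidth, scaledHeight) {
  return {
    type: 'computer_use_preview',
    display_width: scaledWidth,    // width of the scaled screenshot
    display_height: scaledHeight,  // height of the scaled screenshot
    environment: 'browser',        // pretend the screen lives in a browser page
  };
}
```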

📄 Command Line Flags

Flag                      Description
--avd=AVD_NAME            Select the emulator device by AVD name.
--instructions=FILENAME   Load user instructions from a text file.
--record                  Save every screenshot into a folder for later review or video creation.
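
Parsing such `--flag=value` arguments from `process.argv` can be done with a few lines; this is an illustrative sketch, not necessarily the project's parser:

```javascript
// Parse "--name=value" and bare "--name" flags from an argv array.
function parseFlags(argv) {
  const flags = {};
  for (const arg of argv) {
    if (!arg.startsWith('--')) continue;
    const eq = arg.indexOf('=');
    if (eq === -1) {
      flags[arg.slice(2)] = true;                   // bare flag, e.g. --record
    } else {
      flags[arg.slice(2, eq)] = arg.slice(eq + 1);  // e.g. --avd=Pixel_5_API_34
    }
  }
  return flags;
}
```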

📋 Example Usage

Start your emulator:

emulator -avd Pixel_5_API_34

Run the agent:

node index.js --avd=Pixel_5_API_34

Run with an instructions file:

node index.js --avd=Pixel_5_API_34 --instructions=example.txt

Example example.txt:

Open Chrome
Search for "Loadmill"
Scroll down
Go back to the home screen
exit

📦 Requirements

  • Node.js 18 or higher
  • A running Android emulator (AVD)
  • Android Debug Bridge (ADB) installed and available in system PATH
  • OpenAI Tier 3 access for the computer-use-preview model

Note

Your OpenAI account must be Tier 3 to access the computer-use-preview model.
Learn more: OpenAI Computer Use Preview


📁 Project Structure

File         Responsibility
index.js     Manages user input, the OpenAI conversation, and the main loop.
device.js    ADB device connection, screenshot capture, and screen-size management.
actions.js   Executes model actions on the device (tap, swipe, drag, type, keypress).
openai.js    Sends requests to OpenAI and manages API responses.

🎞️ Convert Screenshots to Video (Optional)

If you run the agent with the --record flag, it saves all screenshots to a folder like:

droid-cua-recording-1715098765432/

You can convert the frames into a video using ffmpeg:

ffmpeg -framerate 1 -pattern_type glob -i 'droid-cua-recording-*/frame_*.png' \
  -vf "pad=ceil(iw/2)*2:ceil(ih/2)*2" \
  -c:v libx264 -pix_fmt yuv420p session.mp4
