[Feature Request] Add icon descriptions in visual prompt of interactive elements detection

### Required prerequisites

- [X] I have searched the [Issue Tracker](https://github.com/camel-ai/crab/issues) and confirmed that this hasn't already been reported. (+1 or comment there if it has.)

### Motivation


The current object-detection visual prompt (GroundingDINO) only finds the bounding box of each icon. We also want a semantic description for each icon to help the agent understand the UI.

### Solution

A first step could be to use a VLLM to generate a description for each icon after it has been located by object detection.
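
The detect-then-describe flow could look roughly like the sketch below. It is only an illustration under stated assumptions, not the project's API: `Box`, `DescribedIcon`, `crop`, and `describe_icons` are hypothetical names, the image is modeled as plain rows of pixel values so the example has no dependencies, and `describe_fn` is a placeholder for whatever VLLM call ends up being used.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Assumption: an image is modeled as rows of pixel values to keep the sketch
# dependency-free; a real implementation would likely use an image library.
Image = List[List[int]]


@dataclass
class Box:
    """Detector output in pixel coordinates (right and bottom are exclusive)."""
    left: int
    top: int
    right: int
    bottom: int


@dataclass
class DescribedIcon:
    """A detected box paired with the description generated for it."""
    box: Box
    description: str


def crop(image: Image, box: Box) -> Image:
    """Cut the box region out of the image, clamping the box to the image bounds."""
    height = len(image)
    width = len(image[0]) if height else 0
    top, bottom = max(0, box.top), min(height, box.bottom)
    left, right = max(0, box.left), min(width, box.right)
    return [row[left:right] for row in image[top:bottom]]


def describe_icons(
    image: Image,
    boxes: Sequence[Box],
    describe_fn: Callable[[Image], str],
) -> List[DescribedIcon]:
    """Crop every detected box and ask describe_fn (the VLLM stand-in) about it."""
    results: List[DescribedIcon] = []
    for box in boxes:
        patch = crop(image, box)
        if not patch or not patch[0]:
            continue  # the box lies entirely outside the image, nothing to describe
        results.append(DescribedIcon(box=box, description=describe_fn(patch)))
    return results


if __name__ == "__main__":
    # Hypothetical data: a 4x4 image and two boxes as a detector might return them.
    image = [[0] * 4 for _ in range(4)]
    boxes = [Box(0, 0, 2, 2), Box(2, 2, 4, 4)]
    for icon in describe_icons(image, boxes, lambda patch: "placeholder description"):
        print(icon.box, icon.description)
```

Passing the describer in as a callable keeps the detection step independent of the model choice, so the description backend can be swapped without touching the cropping logic.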

### Additional context

_No response_

[Feature Request] Add icon descriptions in visual prompt of interactive elements detection #24