An open-source prototype exploring how LLM agents can act as context-aware UI adjusters to improve accessibility for visually impaired users. The system enables real-time interface adaptation via natural conversation and ambient context, and demonstrates how agentic tool use can personalize experiences on everyday information terminals (e.g., kiosks, multimedia guides, wearable displays).
Abstract: The user interfaces (UIs) of traditional electronic devices often lack intuitive and flexible ways to accommodate the diverse needs of visually impaired (VI) users. This paper explores the role of Large Language Model (LLM)-powered Conversational Agents (CAs) as intelligent, context-aware mediators for VI-friendly adaptive UIs. Our approach facilitates real-time UI adjustments based on human-AI conversation and on device and ambient context, leveraging LLM CAs' reasoning and tool-use capabilities to accommodate different types of visual impairment. A prototype was developed to demonstrate how the system dynamically adapts the interface. Our initial observations suggest that this approach can promote a seamless and inclusive experience for VI users, particularly on everyday information terminals that offer little flexibility for quickly adjusting UI settings, such as kiosks, multimedia guides, and wearable displays.
Paper link: LLM‑Driven Adaptive UI for the Visually Impaired
- Agent Orchestration: A scheduler agent routes requests to specialized agents (see the routing sketch after this list):
  - UI Adjuster: updates background/text colors, font sizes, layout, and cursor accessibility
  - Profile Updater: collects and stores user preferences
  - Content Explainer: reads and explains on-screen content in simple language
- Voice In / Voice Out:
  - Speech-to-Text with OpenAI Whisper (`whisper-1`)
  - Text-to-Speech with OpenAI TTS (`tts-1`, voice `alloy`)
- Accessible Controls:
  - High-contrast themes, large font presets, flexible layouts
  - Optional large focus cursor overlay
  - Keyboard shortcuts for voice interaction
- Sample News App: Pages for `Home`, `Local`, `World`, and `Settings` to showcase in-context adaptation.
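The scheduler's routing decision (referenced in the first feature above) can be pictured as a single function-calling request. Below is a minimal TypeScript sketch under assumptions: the `routeRequest` helper and the agent names are illustrative, not the repo's actual identifiers.

```ts
import OpenAI from "openai";

// Browser-side client as used by this demo (see Security notes below).
const client = new OpenAI({
  apiKey: import.meta.env.VITE_OPENAI_API_KEY,
  dangerouslyAllowBrowser: true,
});

type AgentName = "ui_adjuster" | "profile_updater" | "content_explainer";

// Ask the scheduler model to pick exactly one specialized agent.
async function routeRequest(userMessage: string): Promise<AgentName> {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: "You are a scheduler. Route the request to the best-suited agent." },
      { role: "user", content: userMessage },
    ],
    tools: [
      {
        type: "function",
        function: {
          name: "select_agent",
          description: "Choose the specialized agent for this request",
          parameters: {
            type: "object",
            properties: {
              agent: { type: "string", enum: ["ui_adjuster", "profile_updater", "content_explainer"] },
            },
            required: ["agent"],
          },
        },
      },
    ],
    // Force a tool call so the reply is always a routing decision.
    tool_choice: { type: "function", function: { name: "select_agent" } },
  });

  const call = response.choices[0].message.tool_calls?.[0];
  return JSON.parse(call!.function.arguments).agent as AgentName;
}
```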
- React 18 + TypeScript, Vite
- Redux Toolkit for state management
- Tailwind CSS (with `tailwind-scrollbar`)
- OpenAI SDK (browser) for chat + TTS + STT
- React Router for navigation
- Node.js 18+ (recommended for Vite 5)
- Yarn, npm, or pnpm
- An OpenAI API key with access to `gpt-4o`, `tts-1`, and `whisper-1`
Clone the repository and install dependencies:

```bash
git clone <this-repo-url>
cd llm-driven-adaptive-ui-for-the-visually-impaired
yarn install
```
Create a `.env` file in the project root:

```bash
VITE_OPENAI_API_KEY=sk-...your_key...
```
Note: The client runs entirely in the browser using the OpenAI SDK with `dangerouslyAllowBrowser: true`. See the Security notes below before deploying.
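For reference, here is a minimal sketch of the browser-side setup this note describes; Vite exposes `VITE_`-prefixed variables to client code via `import.meta.env`:

```ts
import OpenAI from "openai";

// Vite injects VITE_-prefixed env vars at build time.
const client = new OpenAI({
  apiKey: import.meta.env.VITE_OPENAI_API_KEY,
  dangerouslyAllowBrowser: true, // ships the key to the client; demo only
});
```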
```bash
yarn dev      # start the Vite dev server
yarn build    # create a production build
yarn preview  # preview the production build locally
```
- `Space`: Start/stop voice interaction (records a voice message or toggles playback)
- `V`: Replay the assistant's last audio response (when available)

When pressing `Space`, any currently selected text on the page is captured to help the Content Explainer agent provide context-aware explanations (a sketch follows below).
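A hedged sketch of how that capture might look; `sendToContentExplainer` is a hypothetical helper, not the repo's actual function:

```ts
// Hypothetical hook: forward highlighted text to the Content Explainer agent.
declare function sendToContentExplainer(context: string): void;

window.addEventListener("keydown", (event) => {
  if (event.code !== "Space") return;
  // Grab whatever the user currently has selected on the page.
  const selection = window.getSelection()?.toString().trim() ?? "";
  if (selection) {
    sendToContentExplainer(selection);
  }
});
```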
- `Home`: Sample news listing with images and text regions that the agent can describe
- `Local` and `World`: Additional sections used to demonstrate navigation and context
- `Settings`: Where user profile preferences are summarized and managed by the agent
- Scheduler agent decides which specialized agent to activate
- UI Adjuster agent can call functions to (sketched below):
  - Change background/text color presets (including high-contrast options)
  - Increase/decrease font size presets
  - Swap layout (main content vs. sidebar positioning)
  - Toggle an accessible large cursor overlay
- Profile Updater agent edits: username, interface preference, content preference, scheduling preference
- Content Explainer agent reads and explains content using simplified language
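One way to picture how the UI Adjuster's function calls land in UI state, given the project's Redux Toolkit stack; this slice is an illustrative sketch (hypothetical preset names), not the repo's actual code:

```ts
import { createSlice, PayloadAction } from "@reduxjs/toolkit";

type FontPreset = "small" | "medium" | "large" | "x-large"; // hypothetical presets

interface UiState {
  fontPreset: FontPreset;
  highContrast: boolean;
  largeCursor: boolean;
}

const initialState: UiState = {
  fontPreset: "medium",
  highContrast: false,
  largeCursor: false,
};

// Each agent tool call maps onto a plain, inspectable Redux action.
const uiSlice = createSlice({
  name: "ui",
  initialState,
  reducers: {
    setFontPreset(state, action: PayloadAction<FontPreset>) {
      state.fontPreset = action.payload;
    },
    setHighContrast(state, action: PayloadAction<boolean>) {
      state.highContrast = action.payload;
    },
    toggleLargeCursor(state) {
      state.largeCursor = !state.largeCursor;
    },
  },
});

export const { setFontPreset, setHighContrast, toggleLargeCursor } = uiSlice.actions;
export default uiSlice.reducer;
```

Modeling adjustments as dispatched actions keeps every agent-driven change reversible and visible in Redux DevTools.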
- Chat: `gpt-4o`
- Speech-to-Text: `whisper-1`
- Text-to-Speech: `tts-1` (voice `alloy`)
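A hedged sketch of how these models are typically invoked with the OpenAI SDK; the `transcribe` and `speak` helpers are illustrative, not the repo's actual functions:

```ts
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: import.meta.env.VITE_OPENAI_API_KEY,
  dangerouslyAllowBrowser: true,
});

// Speech-to-Text: transcribe a recorded audio blob with whisper-1.
async function transcribe(recording: Blob): Promise<string> {
  const file = new File([recording], "speech.webm", { type: recording.type });
  const result = await client.audio.transcriptions.create({ model: "whisper-1", file });
  return result.text;
}

// Text-to-Speech: synthesize a spoken reply with tts-1 and the "alloy" voice.
async function speak(text: string): Promise<void> {
  const response = await client.audio.speech.create({
    model: "tts-1",
    voice: "alloy",
    input: text,
  });
  const url = URL.createObjectURL(await response.blob());
  await new Audio(url).play();
}
```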
- `VITE_OPENAI_API_KEY` (required): OpenAI API key used by the browser client
This demo runs the OpenAI SDK in the browser with `dangerouslyAllowBrowser: true`, which exposes your API key to clients. For production use:

- Move API access to a server or edge function and proxy requests (see the sketch below)
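For example, a minimal proxy sketch (Express; the `/api/chat` route is hypothetical) that keeps the key server-side:

```ts
import express from "express";
import OpenAI from "openai";

const app = express();
app.use(express.json());

// The key now lives in a server-side env var and never reaches the browser.
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

app.post("/api/chat", async (req, res) => {
  try {
    const completion = await client.chat.completions.create({
      model: "gpt-4o",
      messages: req.body.messages,
    });
    res.json(completion.choices[0].message);
  } catch {
    res.status(500).json({ error: "Upstream request failed" });
  }
});

app.listen(3001);
```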