
LLM‑Driven Adaptive UI for the Visually Impaired

License: MIT · Built with React 18, Vite 5, TypeScript, Tailwind CSS, and Redux Toolkit on Node.js, using the OpenAI API · PRs welcome

An open-source prototype exploring how LLM agents can act as context-aware UI adjusters to improve accessibility for visually impaired users. The system enables real-time interface adaptation via natural conversation and ambient context, and demonstrates how agentic tool use can personalize experiences on everyday information terminals (e.g., kiosks, multimedia guides, wearable displays).

Abstract: The user interfaces (UIs) of traditional electronic devices often lack intuitive and flexible methods to accommodate the diverse needs of visually impaired (VI) users. This paper explores the role of Large Language Model (LLM)-powered Conversational Agents (CAs) as intelligent, context-aware mediators that achieve VI-friendly adaptive UI. Our approach focuses on facilitating real-time UI adjustments based on human-AI conversations and on device and ambient context, leveraging LLM CAs' reasoning and tool-using capabilities to accommodate different types of visual impairment. A prototype was developed to demonstrate how the system can dynamically adapt the interface. Our initial observations suggest that this approach has the potential to promote a seamless and inclusive experience for VI users, particularly on daily information terminals such as kiosks, multimedia guides, and wearable displays, which are rarely flexible or VI-friendly when it comes to quickly adjusting UI settings.

Paper link: LLM‑Driven Adaptive UI for the Visually Impaired

✨ Key Features

  • Agent Orchestration: A scheduler agent routes requests to specialized agents (a routing sketch follows this list):
    • UI Adjuster: updates background/text colors, font sizes, layout, and cursor accessibility
    • Profile Updater: collects and stores user preferences
    • Content Explainer: reads and explains on-screen content in simple language
  • Voice In / Voice Out:
    • Speech-to-Text with OpenAI Whisper (whisper-1)
    • Text-to-Speech with OpenAI TTS (tts-1, voice alloy)
  • Accessible Controls:
    • High-contrast themes, large font presets, flexible layouts
    • Optional large focus cursor overlay
    • Keyboard shortcuts for voice interaction
  • Sample News App: Pages for Home, Local, World, and Settings to showcase in-context adaptation.
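
As a rough sketch of the orchestration step (not the repo's exact code), the scheduler can be a single chat completion that names one specialized agent. The routeRequest helper and the agent identifiers below are illustrative assumptions; the openai client is the one constructed under Configure environment:

import type OpenAI from "openai";

// Hypothetical agent identifiers, assumed for illustration.
type AgentName = "ui_adjuster" | "profile_updater" | "content_explainer";

// Ask the scheduler model to name exactly one specialized agent.
async function routeRequest(openai: OpenAI, userMessage: string): Promise<AgentName> {
  const res = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content:
          "You are a scheduler. Reply with exactly one word: " +
          "ui_adjuster, profile_updater, or content_explainer.",
      },
      { role: "user", content: userMessage },
    ],
  });
  return res.choices[0].message.content?.trim() as AgentName;
}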

🧱 Tech Stack

  • React 18 + TypeScript, Vite
  • Redux Toolkit for state management
  • Tailwind CSS (with tailwind-scrollbar)
  • OpenAI SDK (browser) for chat + TTS + STT
  • React Router for navigation

Getting Started

Prerequisites

  • Node.js 18+ (recommended for Vite 5)
  • Yarn, npm, or pnpm
  • An OpenAI API key with access to gpt-4o, tts-1, and whisper-1

1) Clone and install

git clone <this-repo-url>
cd llm-driven-adaptive-ui-for-the-visually-impaired
yarn install

2) Configure environment

Create a .env file in the project root:

VITE_OPENAI_API_KEY=sk-...your_key...

Note: The client runs entirely in the browser using the OpenAI SDK with dangerouslyAllowBrowser: true. See the Security section below before deploying.
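
For reference, a minimal client construction along these lines (a sketch, assuming the openai npm SDK v4):

import OpenAI from "openai";

// Vite exposes VITE_-prefixed variables to client code at build time.
const openai = new OpenAI({
  apiKey: import.meta.env.VITE_OPENAI_API_KEY,
  dangerouslyAllowBrowser: true, // exposes the key to the browser; demo only
});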

3) Run the app

yarn dev        # start Vite dev server

Build and preview

yarn build
yarn preview

Usage

Keyboard shortcuts

  • Space: Start/stop voice interaction (records a voice message or toggles playback)
  • V: Replay the assistant’s last audio response (when available)

When you press Space, any currently selected text on the page is captured so the Content Explainer agent can provide context-aware explanations.
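
A sketch of what that handler might look like (the keys and selection capture follow the description above; startOrStopRecording and replayLastResponse are hypothetical stand-ins for the app's actual callbacks):

declare function startOrStopRecording(selectedText: string): void; // hypothetical
declare function replayLastResponse(): void;                       // hypothetical

// Capture the current text selection and dispatch the voice shortcuts.
window.addEventListener("keydown", (e) => {
  if (e.code === "Space") {
    e.preventDefault(); // keep Space from scrolling the page
    const selectedText = window.getSelection()?.toString() ?? "";
    startOrStopRecording(selectedText);
  } else if (e.key.toLowerCase() === "v") {
    replayLastResponse();
  }
});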

Pages

  • Home: Sample news listing with images and text regions that the agent can describe
  • Local and World: Additional sections used to demonstrate navigation and context
  • Settings: Where user profile preferences are summarized/managed by the agent

Agent capabilities (high level)

  • Scheduler agent decides which specialized agent to activate
  • UI Adjuster agent can call functions (see the tool-schema sketch after this list) to:
    • Change background/text color presets (including high-contrast options)
    • Increase/decrease font size presets
    • Swap layout (main content vs. sidebar positioning)
    • Toggle an accessible large cursor overlay
  • Profile Updater agent edits: username, interface preference, content preference, scheduling preference
  • Content Explainer agent reads and explains content using simplified language
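
One way to expose the UI Adjuster's functions is OpenAI function calling; the tool names, enums, and parameters below are illustrative assumptions rather than the repo's exact schema:

import type OpenAI from "openai";

// Illustrative tool schema; the repo's actual names and options may differ.
const uiAdjusterTools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "set_color_preset",
      description: "Apply a background/text color preset, e.g. a high-contrast theme.",
      parameters: {
        type: "object",
        properties: {
          preset: { type: "string", enum: ["default", "dark", "light", "high_contrast"] },
        },
        required: ["preset"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "set_font_size",
      description: "Step the font-size preset up or down.",
      parameters: {
        type: "object",
        properties: { direction: { type: "string", enum: ["increase", "decrease"] } },
        required: ["direction"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "toggle_large_cursor",
      description: "Toggle the accessible large cursor overlay.",
      parameters: { type: "object", properties: {} },
    },
  },
];

Tools like these would be passed as the tools option to chat.completions.create, with the returned tool calls dispatched as state updates (in this codebase, presumably Redux Toolkit actions).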

Models used

  • Chat: gpt-4o
  • Speech-to-Text: whisper-1
  • Text-to-Speech: tts-1 (voice alloy)
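
For orientation, the corresponding SDK calls look roughly like this (a sketch using the openai npm SDK; recordedBlob stands in for a hypothetical MediaRecorder result):

import OpenAI from "openai";

declare const openai: OpenAI;     // the client from Configure environment
declare const recordedBlob: Blob; // hypothetical MediaRecorder output

// Speech-to-Text: transcribe the recorded audio with whisper-1.
const transcription = await openai.audio.transcriptions.create({
  model: "whisper-1",
  file: new File([recordedBlob], "speech.webm", { type: "audio/webm" }),
});

// Text-to-Speech: synthesize a reply with tts-1 and the alloy voice.
const speech = await openai.audio.speech.create({
  model: "tts-1",
  voice: "alloy",
  input: transcription.text,
});
new Audio(URL.createObjectURL(new Blob([await speech.arrayBuffer()]))).play();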

Configuration

Environment variables

  • VITE_OPENAI_API_KEY (required): OpenAI API key used by the browser client

Security

This demo runs the OpenAI SDK in the browser with dangerouslyAllowBrowser: true, which exposes your API key to anyone using the client. For production use:

  • Move API access to a server or edge function and proxy requests (a sketch follows)
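
A minimal sketch of such a proxy, assuming a small Node.js/Express server; the /api/chat route and request shape are illustrative, and the key now lives in a server-side OPENAI_API_KEY variable:

import express from "express";
import OpenAI from "openai";

const app = express();
app.use(express.json());

// The key stays server-side; the browser never sees it.
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Illustrative proxy route: forward chat messages to the OpenAI API.
app.post("/api/chat", async (req, res) => {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: req.body.messages,
  });
  res.json(completion.choices[0].message);
});

app.listen(3000);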

About

Code repo for Kaiyuan's paper "Exploring LLM Agents as Seamless and Context-aware UI Adjusters for the Visually Impaired"
