🏆 Winner of "Best Use of Gemini AI" at ImmerseGT 2024 - Georgia Tech's Premier VR/XR Hackathon
An AR-powered navigation platform that transforms urban exploration through natural voice commands, dynamic AR mapping, and AI-generated insights. Using Snap Spectacles and Gemini AI, NavAI delivers a hands-free, immersive wayfinding experience for modern explorers.
NavAI reimagines navigation by integrating immersive AR with AI-driven natural language understanding and image analysis. With voice-activated queries and real-time AR map overlays, users can effortlessly discover points of interest and obtain contextual information—all without ever pulling out their phone.
- Snap Spectacles — AR-enabled smart glasses
- Built-in Camera & Mic — For seamless image capture and voice input
- Location Sensors — Precise geolocation for dynamic mapping
- TypeScript — Strongly-typed development environment
- Snap AR SDK — AR content rendering and spatial tracking
- SIK (Spectacles Interaction Kit) — Gesture recognition and interaction
- Google Gemini AI — Natural language processing and visual analysis
- Google Places API — Dynamic retrieval of nearby points of interest
- Custom Flask Backend — Image processing and AI integration
/Outdoor Navigation
├── Assets/ # Main project assets and code
│ ├── VoiceController.ts # Voice command processing
│ ├── SpeechToText.ts # Speech recognition integration
│ ├── Scripts/ # Core TypeScript/JavaScript files
│ │ ├── geminiHandler.ts # Gemini AI integration
│ │ ├── PlacesWrapper.ts # Places API wrapper
│ │ ├── MapUIController.ts # Map interface controls
│ │ ├── MapManipulation.ts # Map interaction handling
│ │ ├── MapConstants.ts # Configuration constants
│ │ ├── NorthIndicator.ts # Compass functionality
│ │ ├── PinchController.ts # Gesture recognition
│ │ ├── PopupController.ts # Information display
│ │ ├── QuestMarkController.ts # Navigation markers
│ │ ├── UICollisionDetector.ts # UI interaction handling
│ │ ├── MapMessageController.ts # Map messaging system
│ │ ├── MapContainerController.ts # Map container management
│ │ └── Logging.ts # Logging utilities
│ ├── MapComponent/ # Map visualization system
│ │ ├── Scripts/ # Map-specific scripts
│ │ ├── Prefabs/ # Reusable map elements
│ │ ├── Materials/ # Map styling assets
│ │ └── Textures/ # Map visual resources
│ ├── SpectaclesInteractionKit/ # SIK integration
│ ├── Device Camera Texture # Camera input handling
│ ├── VoiceInput # Voice input configuration
│ └── SoundOutput # Audio output handling
├── Support/ # Project support files
├── Workspaces/ # Lens Studio workspace configs
├── README-ref/ # Documentation assets
├── tsconfig.json # TypeScript configuration
└── OutdoorNavigation.esproj # Project configuration file
- Voice Processing: `VoiceController.ts` and `SpeechToText.ts` handle wake-word detection ("Hey Alice") and voice commands (see the sketch after this list)
- Map System: Multiple controllers (`MapUIController`, `MapManipulation`, etc.) manage the AR map experience
- AI Integration: `geminiHandler.ts` coordinates between Gemini AI and the system, while `PlacesWrapper.ts` handles Places API interactions
- Interaction: `PinchController.ts` and `PopupController.ts` manage user input and feedback
- Supporting Systems: Includes camera, voice input, and sound output configurations
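To make the voice flow concrete, here is a minimal sketch of wake-word detection and command routing. The class, callback, and handler names are illustrative stand-ins for the actual logic in `VoiceController.ts` and `SpeechToText.ts`, not a reproduction of it.

```typescript
// Hypothetical sketch: a speech-to-text callback feeds finalized transcripts
// into a router that gates commands behind the wake word.

const WAKE_WORD = "hey alice";

type CommandHandler = (query: string) => void;

class VoiceCommandRouter {
  private awake = false;

  constructor(private onCommand: CommandHandler) {}

  // Called with each finalized transcript from the speech recognizer.
  onTranscript(transcript: string): void {
    const text = transcript.trim().toLowerCase();

    if (!this.awake) {
      const idx = text.indexOf(WAKE_WORD);
      if (idx === -1) return; // ignore speech until the wake word is heard

      this.awake = true;
      // Anything spoken after the wake word counts as the first command.
      const rest = text.slice(idx + WAKE_WORD.length).replace(/^[\s,.!?]+/, "");
      if (rest.length > 0) this.handleCommand(rest);
      return;
    }

    this.handleCommand(text);
  }

  private handleCommand(query: string): void {
    this.onCommand(query);
    this.awake = false; // require the wake word again for the next query
  }
}

// Usage: route recognized commands into the AI/Places pipeline.
const router = new VoiceCommandRouter((query) => {
  console.log(`User asked: "${query}"`); // e.g. "find me a quiet coffee shop"
});
router.onTranscript("Hey Alice, find me a quiet coffee shop");
```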
The project follows a modular architecture, separating concerns between voice processing, map visualization, AI integration, and user interaction. This structure enables easy maintenance and future enhancements of individual components.
- Purpose: Interprets natural language queries (e.g., "Find me a quiet coffee shop") via voice commands.
- Functionality: Uses Gemini AI to map user intent to relevant location categories, generating structured requests for the Places API (see the sketch below).
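A hedged sketch of the intent-mapping step: `callGemini` is a placeholder for the request logic in `geminiHandler.ts`, and the JSON contract in the prompt is an assumed example, not the project's exact schema.

```typescript
// Illustrative mapping from a spoken query to a structured Places request.

interface PlacesRequest {
  category: string;   // e.g. "cafe"
  keywords: string[]; // extra qualifiers, e.g. ["quiet"]
}

// Placeholder for the actual Gemini call in geminiHandler.ts.
declare function callGemini(prompt: string): Promise<string>;

async function queryToPlacesRequest(userQuery: string): Promise<PlacesRequest> {
  const prompt =
    `Map the user's request to a Google Places type and keywords. ` +
    `Respond with JSON only, e.g. {"category":"cafe","keywords":["quiet"]}.\n` +
    `Request: "${userQuery}"`;

  const raw = await callGemini(prompt);
  return JSON.parse(raw) as PlacesRequest; // assumes the model returns bare JSON
}

// "Find me a quiet coffee shop" -> { category: "cafe", keywords: ["quiet"] }
```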
- Purpose: Renders a real-time, interactive AR map overlay displaying nearby points of interest.
- Functionality: Updates map pins dynamically as the user moves, ensuring location-aware navigation (see the sketch below).
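The refresh logic might look like the following sketch, which re-queries for pins once the wearer has moved past an assumed distance threshold. `fetchNearbyPins` and `renderPins` are hypothetical placeholders, not the actual MapComponent API.

```typescript
// Location-aware pin refresh: only re-fetch POIs after meaningful movement.

interface LatLng { lat: number; lng: number; }

const REFRESH_DISTANCE_M = 50; // assumed threshold, tune per experience

// Haversine great-circle distance in meters between two coordinates.
function distanceMeters(a: LatLng, b: LatLng): number {
  const R = 6371000; // Earth radius in meters
  const toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

// Placeholders for the Places query and AR pin rendering.
declare function fetchNearbyPins(center: LatLng): Promise<LatLng[]>;
declare function renderPins(pins: LatLng[]): void;

let lastCenter: LatLng | null = null;

// Called on each geolocation update from the Spectacles sensors.
async function onLocationUpdate(current: LatLng): Promise<void> {
  if (lastCenter && distanceMeters(lastCenter, current) < REFRESH_DISTANCE_M) {
    return; // still near the last query point; keep the existing pins
  }
  lastCenter = current;
  renderPins(await fetchNearbyPins(current));
}
```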
- Purpose: Captures and processes visual data using simple gesture controls.
- Functionality: Uses a pinch gesture to capture the surroundings, then applies OCR and AI to generate rich, contextual descriptions of buildings or landmarks (see the sketch below).
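A sketch of the capture round-trip, assuming the Flask backend exposes a POST `/describe` endpoint that accepts a base64 image and returns a `{ description }` JSON body; the endpoint path, payload shape, and `captureCameraFrame` helper are all illustrative assumptions.

```typescript
// Pinch-triggered capture: grab a frame, send it to the backend for
// OCR + Gemini analysis, and return the generated description.

// Placeholder for reading the device camera texture as a base64 JPEG.
declare function captureCameraFrame(): Promise<string>;

async function onPinch(backendUrl: string): Promise<string> {
  const imageBase64 = await captureCameraFrame();

  const res = await fetch(`${backendUrl}/describe`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ image: imageBase64 }),
  });
  if (!res.ok) throw new Error(`Backend error: ${res.status}`);

  // The backend runs OCR and Gemini, then returns a contextual description.
  const { description } = (await res.json()) as { description: string };
  return description; // shown in a popup or read aloud on the Spectacles
}
```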
- Activate Voice Command: Say the trigger phrase (e.g., "Hey Alice") and ask for your desired location.
- Explore AR Map: Watch as dynamic location pins appear over your real-world view.
- Capture Insights: Use a pinch gesture to capture a scene and receive immediate AI-generated insights.
This project is licensed under the MIT License.