forked from mlim70/NavAI

Snapchat Spectacles AR app. 1st place winner of Immerse GT 2025's "Best Use of Gemini" MLH award


devinzhang415/NavAI

 
 


NavAI - Reimagining Navigation Through AR


🏆 Winner of "Best Use of Gemini AI" at ImmerseGT 2024 - Georgia Tech's Premier VR/XR Hackathon

An AR-powered navigation platform that transforms urban exploration through natural voice commands, dynamic AR mapping, and AI-generated insights. Using Snap Spectacles and Gemini AI, NavAI delivers a hands-free, immersive wayfinding experience for modern explorers.

AR Navigation Demo




Overview

NavAI reimagines navigation by integrating immersive AR with AI-driven natural language understanding and image analysis. With voice-activated queries and real-time AR map overlays, users can effortlessly discover points of interest and obtain contextual information—all without ever pulling out their phone.


Tech Stack

Hardware

Snap Spectacles — AR-enabled smart glasses
Built-in Camera & Mic — For seamless image capture and voice input
Location Sensors — Precise geolocation for dynamic mapping

Core Technologies

TypeScript — Strongly-typed development environment
Snap AR SDK — AR content rendering and spatial tracking
SIK (Spectacles Interaction Kit) — Gesture recognition and interaction

AI & APIs

Google Gemini AI — Natural language processing and visual analysis
Google Places API — Dynamic retrieval of nearby points of interest
Custom Flask Backend — Image processing and AI integration


Directory Structure

/Outdoor Navigation
├── Assets/                      # Main project assets and code
│   ├── VoiceController.ts      # Voice command processing
│   ├── SpeechToText.ts         # Speech recognition integration
│   ├── Scripts/                # Core TypeScript/JavaScript files
│   │   ├── geminiHandler.ts    # Gemini AI integration
│   │   ├── PlacesWrapper.ts    # Places API wrapper
│   │   ├── MapUIController.ts  # Map interface controls
│   │   ├── MapManipulation.ts  # Map interaction handling
│   │   ├── MapConstants.ts     # Configuration constants
│   │   ├── NorthIndicator.ts   # Compass functionality
│   │   ├── PinchController.ts  # Gesture recognition
│   │   ├── PopupController.ts  # Information display
│   │   ├── QuestMarkController.ts  # Navigation markers
│   │   ├── UICollisionDetector.ts  # UI interaction handling
│   │   ├── MapMessageController.ts  # Map messaging system
│   │   ├── MapContainerController.ts # Map container management
│   │   └── Logging.ts          # Logging utilities
│   ├── MapComponent/           # Map visualization system
│   │   ├── Scripts/           # Map-specific scripts
│   │   ├── Prefabs/           # Reusable map elements
│   │   ├── Materials/         # Map styling assets
│   │   └── Textures/          # Map visual resources
│   ├── SpectaclesInteractionKit/  # SIK integration
│   ├── Device Camera Texture   # Camera input handling
│   ├── VoiceInput             # Voice input configuration
│   └── SoundOutput            # Audio output handling
├── Support/                    # Project support files
├── Workspaces/                # Lens Studio workspace configs
├── README-ref/                # Documentation assets
├── tsconfig.json              # TypeScript configuration
└── OutdoorNavigation.esproj   # Project configuration file

Key Components

  • Voice Processing: VoiceController.ts and SpeechToText.ts handle wake word detection ("Hey Alice") and voice commands
  • Map System: Multiple controllers (MapUIController, MapManipulation, etc.) manage the AR map experience
  • AI Integration: geminiHandler.ts coordinates between Gemini AI and the system, while PlacesWrapper.ts handles Places API interactions
  • Interaction: PinchController.ts and PopupController.ts manage user input and feedback
  • Supporting Systems: Includes camera, voice input, and sound output configurations

The project follows a modular architecture that separates voice processing, map visualization, AI integration, and user interaction, so each component can be maintained or extended independently.
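As an illustration of the wake word step listed under Voice Processing, here is a minimal sketch of how a transcript could be checked for "Hey Alice" and split into a command. The function `extractCommand` and the constant `WAKE_PHRASE` are hypothetical names chosen for this example; the actual logic in VoiceController.ts may differ.

```typescript
// Hypothetical helper for illustration; not taken from VoiceController.ts.
const WAKE_PHRASE = "hey alice";

/**
 * Returns the command spoken after the wake phrase,
 * or null if the phrase is missing or nothing follows it.
 */
function extractCommand(
  transcript: string,
  wakePhrase: string = WAKE_PHRASE
): string | null {
  // Lowercase and replace punctuation with spaces so "Hey, Alice!" still matches.
  const normalized = transcript
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();

  // Pad with spaces so the phrase only matches on whole-word boundaries
  // (for example, "they alice" does not trigger).
  const padded = " " + normalized + " ";
  const needle = " " + wakePhrase + " ";
  const index = padded.indexOf(needle);
  if (index === -1) {
    return null;
  }

  const command = padded.slice(index + needle.length).trim();
  return command.length > 0 ? command : null;
}
```

Calling `extractCommand("Hey Alice, find me a quiet coffee shop")` yields the command text with punctuation stripped, which can then be passed on to the AI integration layer.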


Modules and Functions

Voice Interaction & Gemini AI

  • Purpose: Interprets natural language queries (e.g., "Find me a quiet coffee shop") via voice commands.
  • Functionality: Uses Gemini AI to map user intent to relevant location categories, generating structured requests for the Places API.
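The intent-to-request mapping described above could look roughly like the following sketch. It assumes Gemini has been prompted to reply with a small JSON object such as `{"category": "cafe", "keyword": "quiet"}`; that reply shape, the category allow-list, the 500 m default radius, and every name here (`PlaceIntent`, `parseIntent`, `buildNearbyRequest`, `toQueryString`) are assumptions for illustration, not the contents of geminiHandler.ts or PlacesWrapper.ts. The parameter names follow the Places Nearby Search convention (`location`, `radius`, `type`, `keyword`).

```typescript
// Assumed shape of the model's JSON reply (hypothetical).
interface PlaceIntent {
  category: string;
  keyword?: string;
}

// Structured request using Nearby Search style parameter names.
interface NearbyRequest {
  location: string; // "lat,lng"
  radius: number;   // meters
  type: string;
  keyword?: string;
}

// Illustrative allow-list so unexpected model output is rejected.
const ALLOWED_CATEGORIES: string[] = ["cafe", "restaurant", "park", "museum", "library"];

/** Parses the model's text reply; returns null if it is not a usable intent. */
function parseIntent(modelText: string): PlaceIntent | null {
  try {
    const data: unknown = JSON.parse(modelText);
    if (typeof data !== "object" || data === null) {
      return null;
    }
    const record = data as Record<string, unknown>;
    const category = record["category"];
    if (typeof category !== "string" || ALLOWED_CATEGORIES.indexOf(category) === -1) {
      return null;
    }
    const keyword = record["keyword"];
    return typeof keyword === "string" && keyword.length > 0
      ? { category: category, keyword: keyword }
      : { category: category };
  } catch (_err) {
    return null; // The reply was not valid JSON.
  }
}

/** Combines an intent with the user's position into a structured request. */
function buildNearbyRequest(
  intent: PlaceIntent,
  lat: number,
  lng: number,
  radiusMeters: number = 500
): NearbyRequest {
  const request: NearbyRequest = {
    location: lat + "," + lng,
    radius: radiusMeters,
    type: intent.category,
  };
  if (intent.keyword !== undefined) {
    request.keyword = intent.keyword;
  }
  return request;
}

/** Serializes the request as a URL query string. */
function toQueryString(request: NearbyRequest): string {
  const parts: string[] = [
    "location=" + encodeURIComponent(request.location),
    "radius=" + request.radius,
    "type=" + encodeURIComponent(request.type),
  ];
  if (request.keyword !== undefined) {
    parts.push("keyword=" + encodeURIComponent(request.keyword));
  }
  return parts.join("&");
}
```

Validating the category against an allow-list keeps malformed or unexpected model output from reaching the Places API.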

Dynamic AR Mapping

  • Purpose: Renders a real-time, interactive AR map overlay displaying nearby points of interest.
  • Functionality: Updates map pins dynamically as the user moves, ensuring location-aware navigation.
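One common way to keep pins current as the user moves is to re-query nearby places only after the user has travelled a minimum distance since the last fetch. The sketch below shows that idea with the haversine formula. The 50 m threshold and the names `LatLng`, `haversineMeters`, and `shouldRefreshPins` are assumptions for illustration; the project's map scripts may use a different strategy.

```typescript
interface LatLng {
  lat: number; // degrees
  lng: number; // degrees
}

const EARTH_RADIUS_M = 6371000; // mean Earth radius in meters

function toRadians(degrees: number): number {
  return (degrees * Math.PI) / 180;
}

/** Great-circle distance between two coordinates, in meters (haversine). */
function haversineMeters(a: LatLng, b: LatLng): number {
  const dLat = toRadians(b.lat - a.lat);
  const dLng = toRadians(b.lng - a.lng);
  const sinLat = Math.sin(dLat / 2);
  const sinLng = Math.sin(dLng / 2);
  const h =
    sinLat * sinLat +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * sinLng * sinLng;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

/**
 * Decides whether pins should be re-fetched.
 * Always true for the first fix; afterwards only when the user
 * has moved at least thresholdMeters from the last fetch position.
 */
function shouldRefreshPins(
  lastFetch: LatLng | null,
  current: LatLng,
  thresholdMeters: number = 50
): boolean {
  if (lastFetch === null) {
    return true;
  }
  return haversineMeters(lastFetch, current) >= thresholdMeters;
}
```

A distance threshold like this trades pin freshness against the number of Places API calls made while walking.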

Snapshot Analysis

  • Purpose: Captures and processes visual data using simple gesture controls.
  • Functionality: Uses a pinch gesture to capture surroundings and leverages OCR and AI to generate rich, contextual descriptions of buildings or landmarks.
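A gesture-triggered capture usually needs a guard so that a single held or jittery pinch does not fire several captures in a row. The sketch below shows one simple option, a cooldown gate. The class name `CaptureGate` and the 2-second default are assumptions for this example and are not taken from PinchController.ts.

```typescript
/** Allows a capture only if enough time has passed since the previous one. */
class CaptureGate {
  private readonly cooldownMs: number;
  private lastCaptureMs: number | null = null;

  constructor(cooldownMs: number = 2000) {
    this.cooldownMs = cooldownMs;
  }

  /**
   * Call when a pinch is detected, passing the current time in milliseconds.
   * Returns true if the capture should proceed, false if it is still cooling down.
   */
  tryCapture(nowMs: number): boolean {
    if (this.lastCaptureMs !== null && nowMs - this.lastCaptureMs < this.cooldownMs) {
      return false;
    }
    this.lastCaptureMs = nowMs;
    return true;
  }
}
```

In a pinch handler, the image would be captured and sent for analysis only when `tryCapture` returns true.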

Usage

  1. Activate Voice Command: Say the trigger phrase (e.g., "Hey Alice") and ask for your desired location.
  2. Explore AR Map: Watch as dynamic location pins appear over your real-world view.
  3. Capture Insights: Use a pinch gesture to capture a scene and receive immediate AI-generated insights.

License

This project is licensed under the MIT License.
