We'll guide you through creating an AI assistant using the ESP32-S3, a microphone, and a speaker. This project will capture audio, process it with AI, and respond with synthesized speech.
https://github.com/lout33/ai_assistant_esp32
- ESP32-S3 Devkit
- INMP441 Microphone
- MAX98357A Amplifier
- 0.5W Speaker
- Push Button
- Breadboard and Jumper Wires
- **Connect the Microphone (INMP441):**
  - VDD to 3.3V
  - GND to GND
  - SCK to GPIO 5
  - WS to GPIO 6
  - SD to GPIO 17
- **Connect the Amplifier (MAX98357A):**
  - VIN to 3.3V
  - GND to GND
  - BCLK to GPIO 5
  - LRC to GPIO 6
  - DIN to GPIO 7
- **Connect the Speaker:**
  - Connect the speaker to the output terminals of the MAX98357A.
- **Connect the Button:**
  - Connect one terminal to GPIO 35 and the other to GND.
- **Connect the RGB LED:**
  - Connect to GPIO 48.
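
For reference, here is one way the wiring above could be expressed as pin constants in the Arduino sketch; the macro names are placeholders and may differ from those used in `code.ino`. Note that the microphone and amplifier share the bit-clock (GPIO 5) and word-select (GPIO 6) lines, so a single I2S bus serves both.

```cpp
// Illustrative pin map mirroring the wiring steps above (names are placeholders).
#define I2S_SCK_PIN   5   // INMP441 SCK and MAX98357A BCLK (shared bit clock)
#define I2S_WS_PIN    6   // INMP441 WS and MAX98357A LRC (shared word select)
#define I2S_MIC_SD    17  // INMP441 SD   -> audio data into the ESP32-S3
#define I2S_AMP_DIN   7   // MAX98357A DIN -> audio data out of the ESP32-S3

#define BUTTON_PIN    35  // push button, other leg to GND (reads LOW when pressed)
#define LED_PIN       48  // RGB status LED
```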
- **Install Required Libraries:**
  - Install the `ArduinoWebsockets` and `Adafruit_NeoPixel` libraries via the Arduino Library Manager.
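
With the libraries installed, the sketch will need headers along these lines (a sketch of the expected includes, assuming the legacy ESP-IDF I2S driver that ships with the Arduino-ESP32 core):

```cpp
#include <WiFi.h>               // ESP32 Wi-Fi
#include <ArduinoWebsockets.h>  // WebSocket client (ArduinoWebsockets library)
#include <Adafruit_NeoPixel.h>  // RGB status LED
#include <driver/i2s.h>         // legacy I2S driver for the INMP441 / MAX98357A

using namespace websockets;     // brings WebsocketsClient into scope
```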
- **Upload the Code:**
  - Use the provided `code.ino` file.
  - Update the WiFi credentials and WebSocket server details, as sketched below.
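
A minimal sketch of what those configuration values and the connection setup might look like, building on the includes above. The SSID, password, server address, and port are placeholders, not values taken from the repository:

```cpp
const char* WIFI_SSID     = "your-ssid";              // placeholder
const char* WIFI_PASSWORD = "your-password";          // placeholder
const char* WS_SERVER_URL = "ws://192.168.1.50:8888"; // example address of the Node.js server

WebsocketsClient wsClient;

void connectToServer() {
  WiFi.begin(WIFI_SSID, WIFI_PASSWORD);
  while (WiFi.status() != WL_CONNECTED) {
    delay(250);                      // wait for the ESP32-S3 to join the network
  }
  wsClient.connect(WS_SERVER_URL);   // open the WebSocket connection to server.js
}
```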
- **Install Node.js and Required Packages:**
  - Run `npm install ws wav openai fluent-ffmpeg`.
- **Run the Server:**
  - Use the `server.js` file.
  - Replace the OpenAI API key with your own.
- **Start the Node.js Server:**
  - Run `node server.js`.
- **Power the ESP32-S3:**
  - Connect it to your computer or a power source.
- **Interact with the AI Assistant:**
  - Press the button to start recording.
  - Speak into the microphone.
  - The AI will process your speech and respond through the speaker.
- ESP32-S3 captures audio and sends it to the server via WebSocket.
- Node.js Server processes the audio using OpenAI's API and sends back a response.
- ESP32-S3 plays the response through the speaker.
- Ensure all connections are secure.
- Check WiFi credentials and server IP.
- Verify the OpenAI API key is correct.
This project demonstrates a simple AI assistant using readily available components. Enjoy experimenting and expanding its capabilities!
```mermaid
graph TD
%% Main Hardware Components
subgraph Hardware [ESP32-S3 Hardware Components]
Button[Push Button]
Mic[INMP441 Microphone]
Speaker[MAX98357A Amp + Speaker]
LED[RGB Status LED]
end
%% ESP32 Processing
subgraph ESP32 [ESP32-S3 Processing]
AudioIn[Audio Input Handler]
AudioOut[Audio Output Handler]
WSClient[WebSocket Client]
%% LED States
subgraph LED_States [LED Status]
Recording[🔴 Recording]
Playing[🟢 Playing]
Idle[🔵 Standby]
end
style LED_States fill:#f9f9f9,stroke:#666
end
%% Server Processing
subgraph Server [AI Processing Server]
WSServer[WebSocket Server]
subgraph AI [OpenAI Services]
Whisper[Speech-to-Text]
GPT[GPT-4 Processing]
TTS[Text-to-Speech]
end
end
%% Data Flow Connections
Button -->|Press & Hold| AudioIn
Mic -->|I2S Protocol| AudioIn
AudioIn -->|Raw Audio| WSClient
WSClient -->|WebSocket| WSServer
WSServer -->|Audio Stream| Whisper
Whisper -->|Text| GPT
GPT -->|Response| TTS
TTS -->|Audio| WSServer
WSServer -->|Audio Response| WSClient
WSClient -->|I2S Protocol| AudioOut
AudioOut --> Speaker
%% LED Status Flow
AudioIn -->|Triggers| Recording
AudioOut -->|Triggers| Playing
WSClient -->|Triggers| Idle
```
- **Standby Mode 🔵**
  - System idle and ready
  - WebSocket connection active
  - LED shows blue
- **Recording Mode 🔴**
  - Button pressed
  - Capturing audio via INMP441
  - Streaming to server
  - LED shows red
- **Response Mode 🟢**
  - Receiving AI response
  - Playing audio via MAX98357A
  - LED shows green
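
One possible way to drive these three LED states with Adafruit_NeoPixel, assuming the single RGB LED on GPIO 48 from the wiring section; the helper name is illustrative:

```cpp
#include <Adafruit_NeoPixel.h>

Adafruit_NeoPixel statusLed(1, 48, NEO_GRB + NEO_KHZ800);  // one pixel on GPIO 48

void showStatus(uint8_t r, uint8_t g, uint8_t b) {
  statusLed.setPixelColor(0, statusLed.Color(r, g, b));
  statusLed.show();
}

// Call statusLed.begin() once in setup(), then:
//   showStatus(0, 0, 255);   // standby   -> blue
//   showStatus(255, 0, 0);   // recording -> red
//   showStatus(0, 255, 0);   // playing   -> green
```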
- **Audio Capture**
  - Button trigger → INMP441 microphone
  - I2S protocol handling
  - Raw audio buffering
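
As a rough illustration of this capture path (not the repository's exact implementation), the loop below reads raw samples from the I2S peripheral while the button is held and streams them to the server as binary WebSocket frames. It reuses `BUTTON_PIN` and `wsClient` from the earlier sketches and assumes the I2S port is configured for 16-bit samples and the button pin uses a pull-up:

```cpp
#include <driver/i2s.h>

static int16_t sampleBuf[512];  // small intermediate buffer of raw samples

void streamWhileButtonHeld() {
  // assumes pinMode(BUTTON_PIN, INPUT_PULLUP) in setup(); button is wired to GND,
  // so the pin reads LOW while pressed
  while (digitalRead(BUTTON_PIN) == LOW) {
    size_t bytesRead = 0;
    i2s_read(I2S_NUM_0, sampleBuf, sizeof(sampleBuf), &bytesRead, portMAX_DELAY);
    if (bytesRead > 0) {
      // forward the raw audio chunk to the Node.js server
      wsClient.sendBinary(reinterpret_cast<const char*>(sampleBuf), bytesRead);
    }
    wsClient.poll();  // service the WebSocket connection
  }
}
```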
- **AI Processing**
  - Speech-to-Text (Whisper)
  - Language Processing (GPT-4)
  - Text-to-Speech Synthesis
- **Audio Playback**
  - WebSocket streaming
  - I2S protocol output
  - Speaker amplification
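
A matching illustration of the playback path, again a sketch rather than the repository's code: binary frames received from the server are written straight to the I2S output feeding the MAX98357A. It assumes the response arrives as raw PCM at the configured sample rate and that the shared I2S port is set up for full-duplex operation:

```cpp
#include <driver/i2s.h>

void handleServerAudio(WebsocketsMessage message) {
  if (!message.isBinary()) return;          // only raw audio frames are played
  auto payload = message.rawData();         // raw binary payload from the server
  size_t bytesWritten = 0;
  i2s_write(I2S_NUM_0, payload.data(), payload.size(),
            &bytesWritten, portMAX_DELAY);  // blocks until the DMA buffers accept the data
}

// Register the handler once during setup():
//   wsClient.onMessage(handleServerAudio);
```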