Version: 1.0.2
Author: Rakib Hasan
Category: Audio Data Processing
VADCoreESP32
is a library for Voice Activity Detection (VAD) on ESP32 devices. It utilizes floating-point Fast Fourier Transform (FFT) calculations to detect speech activity in real-time. This library is optimized for the ESP32 hardware, providing efficient audio processing and VAD functionality.
- Real-time Voice Activity Detection using FFT
- Configurable core and priority for FreeRTOS tasks
- Gain adjustment and noise thresholding
- Flexible I2S configuration for audio input
- Smooth energy calculation for improved detection accuracy
-
Download the Library
You can download the latest version of
VADCoreESP32
from the GitHub repository. -
Add to Your Arduino Libraries
- Copy the
VADCoreESP32
folder into your Arduinolibraries
directory. - Restart the Arduino IDE if it was open during the copy.
- Copy the
#include <VADCoreESP32.h>
// Define your I2S pin configuration
#define I2S_PORT I2S_NUM_0
#define I2S_WS 16
#define I2S_SD 7
#define I2S_SCK 15
VADCoreESP32 myVad;
void setup() {
Serial.begin(115200);
myVad.i2sInit(I2S_PORT, I2S_SCK, I2S_WS, I2S_SD);
myVad.setCore(0); // Set to Core 0 (Optional, if your board has 2 cores)
myVad.setPriority(10); // Set task priority (Optional, if your board supports different priorities)
}
void loop() {
if (Serial.available() > 0) {
String cmd = Serial.readString();
if (cmd.compareTo("start") == 0) {
myVad.start(); // Start VAD task
Serial.println("Start Listening Command Received");
}
}
if (myVad.getState() == VAD_SILENCE) {
Serial.println("[Idle Heap:" + String(ESP.getFreeHeap()) + " Core:" + String(xPortGetCoreID()) + "]");
}
delay(100);
}
-
void setCore(int coreId)
Sets the core for the VAD task. Accepts
0
or1
for ESP32 cores. -
void setPriority(UBaseType_t priority)
Sets the priority of the VAD task. Use a value between
0
(lowest) and24
(highest). -
bool getState()
Returns the current state of the VAD. Returns
VAD_VOICE
if speech is detected, otherwiseVAD_SILENCE
. WhenVAD_SILENCE
triggers VAD Task Gets Automatically got Deleted. -
void start()
Initializes and starts the VAD task with the configured core and priority.
I2S_SAMPLE_RATE
: The sample rate for I2S audio input. Default is16000
.FFT_SIZE
: The size of the FFT. Default is256
.SPEECH_THRESHOLD
: The threshold for detecting speech activity. Default is3000
.NOISE_THRESHOLD
: The threshold for distinguishing noise. Default is1000
.SPEECH_FREQ_MIN
andSPEECH_FREQ_MAX
: The frequency range for detecting speech. Defaults are300
Hz and3400
Hz, respectively.GAIN_FACTOR
: The gain factor for audio samples. Default is1.5
.
This library is released under the MIT License. See the LICENSE file for details.https://chatgpt.com/share/66e75389-7340-8004-a16e-7e6a9af3ecbf