This repository provides a general framework for an AI agent that combines audio and video emotion detection, speech recognition, text-based emotion analysis, and language model interaction. It serves as a starting point for applications in real-time emotion-aware systems, conversational AI, or personalized user experiences.
Our framework for a multimodal AI agent performs:
Real-time video emotion detection (using DeepFace).
Audio speech-to-text processing and audio emotion recognition.
Text emotion detection for user-provided speech.
LLM-based responses via Groq.
Text-to-speech output with pyttsx3.
This system monitors webcam video for facial emotions, simultaneously listens for user speech and classifies its emotion, and also processes the user’s spoken text for emotional content. Finally, it provides a short LLM-based response incorporating the recognized emotional states, then speaks that response out loud.
FEATURES
VIDEO EMOTION RECOGNITION
Uses DeepFace to analyze frames from a live video feed (or webcam) and detects the dominant emotion (e.g., happy, sad, neutral).
Keeps track of a window of recent emotions for more stable results.
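A minimal sketch of this stage, assuming a default webcam and a 10-frame smoothing window (the actual loop in AppVideo_file.py may differ):

    # Sketch: classify the dominant facial emotion per frame and smooth it
    # over a short window. The window size and loop structure are illustrative.
    from collections import Counter, deque

    import cv2
    from deepface import DeepFace

    window = deque(maxlen=10)      # most recent frame-level emotions
    cap = cv2.VideoCapture(0)      # default webcam

    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # enforce_detection=False keeps the loop alive when no face is visible
        result = DeepFace.analyze(frame, actions=["emotion"], enforce_detection=False)
        window.append(result[0]["dominant_emotion"])
        print(Counter(window).most_common(1)[0][0])   # stabilized emotion

    cap.release()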
AUDIO SPEECH & EMOTION DETECTION
Listens in real time using the microphone via SpeechRecognition.
Transcribes speech with a Wav2Vec2 model (facebook/wav2vec2-base-960h).
Classifies the audio emotion (e.g., neutral, happy, angry) using a pretrained emotion recognition model (ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition).
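A condensed sketch of the two audio models, assuming a 16 kHz clip saved to a placeholder file named utterance.wav (the real code captures microphone audio with SpeechRecognition first):

    # Sketch: transcribe an audio clip and classify its emotional tone.
    # "utterance.wav" is a placeholder for audio captured from the microphone.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor, pipeline

    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
    asr_model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

    speech, _ = librosa.load("utterance.wav", sr=16000)
    inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        logits = asr_model(inputs.input_values).logits
    transcript = processor.batch_decode(torch.argmax(logits, dim=-1))[0]

    emotion_clf = pipeline(
        "audio-classification",
        model="ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition",
    )
    audio_emotion = emotion_clf(speech)[0]["label"]   # raw 16 kHz array in, top label out
    print(transcript, audio_emotion)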
TEXT EMOTION RECOGNITION
Processes the transcribed text with a Hugging Face pipeline to classify its emotional tone (e.g., sadness, joy, fear).
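Conceptually, this stage is a single transformers pipeline call; the model name below is a common emotion classifier used here as an assumption and may differ from the one the repo actually loads:

    # Sketch: classify the emotional tone of the transcript.
    # The model name is an assumption; substitute the one configured in the repo.
    from transformers import pipeline

    text_clf = pipeline(
        "text-classification",
        model="j-hartmann/emotion-english-distilroberta-base",
    )
    print(text_clf("I can't believe how well that went!"))
    # e.g. [{'label': 'joy', 'score': 0.99}]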
LLM INTEGRATION VIA GROQ
Sends the user's transcribed text to a large language model for a short, context-appropriate response.
Currently configured for llama3-70b-8192 (adjust as needed).
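The Groq call is roughly as follows; the prompt wording is illustrative, while the client call and model name match what is described above:

    # Sketch: request a short, emotion-aware reply. The prompt text is illustrative.
    import os

    from groq import Groq

    client = Groq(api_key=os.environ["GROQ_API_KEY"])
    completion = client.chat.completions.create(
        model="llama3-70b-8192",
        messages=[
            {"role": "system", "content": "Reply with one or two short, supportive sentences."},
            {"role": "user", "content": "Transcript: 'I had a rough day.' Detected emotions: sad (face), angry (voice)."},
        ],
    )
    print(completion.choices[0].message.content)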
TEXT-TO-SPEECH OUTPUT
Speaks back the LLM’s short advice or message to the user using pyttsx3.
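The TTS step itself is only a few lines of pyttsx3 (the speaking rate is an arbitrary example):

    # Sketch: speak the LLM's reply out loud. The rate value is an example only.
    import pyttsx3

    engine = pyttsx3.init()
    engine.setProperty("rate", 170)
    engine.say("It sounds like a tough day. A short break might help.")
    engine.runAndWait()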
REQUIREMENTS
deepface==0.0.92
groq==0.18.0
librosa==0.10.2
opencv_contrib_python==4.10.0.84
opencv_python==4.11.0.86
pyttsx3==2.90
SpeechRecognition==3.12.0
torch==2.4.1
transformers==4.44.0
Note: You may need additional system-level dependencies (for example, ffmpeg for librosa, or a Windows/Linux TTS engine for pyttsx3).
INSTALLATION
Clone the repository:
    git clone https://github.com/yourusername/your-repo-name.git
    cd your-repo-name
Create a virtual environment (recommended):
    python -m venv venv
    source venv/bin/activate        (Linux/macOS)
    venv\Scripts\activate           (Windows)
Install the Python dependencies:
    pip install -r requirements.txt
Or install them individually:
    pip install deepface==0.0.92 groq==0.18.0 librosa==0.10.2 ...
Set up your Groq API key (if you plan to use the LLM):
    export GROQ_API_KEY="your_groq_api_key_here"        (Linux/macOS)
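On Windows, the equivalent is:
    set GROQ_API_KEY=your_groq_api_key_here             (Command Prompt)
    $env:GROQ_API_KEY="your_groq_api_key_here"          (PowerShell)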
USAGE
Run the main script:
    python AppProcess.py
This will:
Launch AppVideo_file.py for webcam-based emotion detection.
Launch AppAudio_file.py for speech recognition and audio emotion detection.
Keep reading resultVideo.txt and resultAudio.txt to gather the dominant emotions (see the sketch below).
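Conceptually, the coordinating process polls those files in a loop. The sketch below is illustrative only (the polling interval and "neutral" fallback are assumptions), not the actual contents of AppProcess.py:

    # Sketch of the coordination loop: poll the result files written by the
    # video and audio processes. File names follow the README; the interval
    # and the "neutral" fallback are assumptions.
    import time

    def read_emotion(path, default="neutral"):
        try:
            with open(path) as f:
                return f.read().strip() or default
        except FileNotFoundError:
            return default

    while True:
        video_emotion = read_emotion("resultVideo.txt")
        audio_emotion = read_emotion("resultAudio.txt")
        # ...combine the emotions, query the LLM, and speak the reply...
        time.sleep(1)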
Interaction:
The camera will analyze your facial expressions in real time.
Speak into your microphone. The system transcribes your speech, classifies your audio/text emotions, and stores the results in resultAudio.txt.
After each utterance, it requests a short response from the LLM and speaks it aloud via TTS.
Exiting:
Press Ctrl + C in the terminal to stop.
Or say "finish recording" to trigger a graceful shutdown (depending on the code implementation).
CONFIGURATION
Groq Model: In main.py, look for the get_lmm_response function. You can change the model name in client.chat.completions.create(...).
Text-to-Speech Voice: Near the pyttsx3.init() call, specify which voice you want by adjusting the loop that checks for voice.id.
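A typical voice-selection loop looks like this; the "english" substring check is an example, not the repo's exact condition:

    # Sketch: pick a TTS voice by inspecting the installed voices.
    # The "english" check is illustrative; match on the voice you prefer.
    import pyttsx3

    engine = pyttsx3.init()
    for voice in engine.getProperty("voices"):
        print(voice.id, voice.name)
        if "english" in voice.id.lower():
            engine.setProperty("voice", voice.id)
            break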
Thresholds and Timings: In AppAudio_file.py (or wherever your main code is), adjust recognizer.energy_threshold or recognizer.pause_threshold to suit your environment and speaking style.
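For reference, those SpeechRecognition settings look like the following; the numeric values are examples, not the repo's defaults:

    # Sketch: tune SpeechRecognition for your room and speaking style.
    # The numbers are examples only.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    recognizer.energy_threshold = 300          # raise in noisy environments
    recognizer.dynamic_energy_threshold = True
    recognizer.pause_threshold = 0.8           # seconds of silence that end a phrase

    with sr.Microphone(sample_rate=16000) as source:
        recognizer.adjust_for_ambient_noise(source, duration=1)
        audio = recognizer.listen(source)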