This Speaker Identification System identifies speakers by analyzing their voice embeddings. It registers speakers by recording their voices, computing embeddings with pyannote/embedding, and saving them in a YAML file. For identification, it compares new recordings against saved embeddings using cosine similarity.
- Speaker Registration
  - Record and compute embeddings using `pyannote/embedding`.
  - Save embeddings in a YAML file.
- Speaker Identification
  - Compute embeddings for new recordings.
  - Compare with saved embeddings and return closest matches.
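Identification reduces to a nearest-neighbour search over the saved embeddings using cosine similarity. A minimal sketch with NumPy (the speaker names, toy 3-dimensional vectors, and the 0.5 threshold are all illustrative, not values from this project):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(new_embedding, saved_embeddings, threshold=0.5):
    """Return the closest registered speaker, or None if below the threshold."""
    best_name, best_score = None, -1.0
    for name, emb in saved_embeddings.items():
        score = cosine_similarity(new_embedding, np.asarray(emb))
        if score > best_score:
            best_name, best_score = name, score
    return (best_name, best_score) if best_score >= threshold else (None, best_score)

# Toy data standing in for real pyannote embeddings
saved = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
print(identify(np.array([0.9, 0.1, 0.0]), saved))  # closest match: alice
```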
Use the pre-trained `pyannote/embedding` model from Hugging Face.
- Offline Use: Download and configure the model locally for faster processing.
- Online Use: Obtain a Hugging Face token to access the model via the cloud.
- Clone the Repository

  ```shell
  git clone https://github.com/Chiraz32/Speaker-Identification.git
  cd speaker-identification
  ```

- Install Dependencies

  ```shell
  pip install -r requirements.txt
  ```
- Configure the YAML File
  - Ensure `speakers_embeddings.yaml` is present in the `data` directory.
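The embeddings file is essentially a mapping from speaker names to embedding vectors. A hedged sketch of writing and reading it with PyYAML (the exact schema and values shown are an assumption for illustration, not the project's documented format):

```python
import yaml  # PyYAML

# Hypothetical schema: speaker name -> embedding as a list of floats
speakers = {
    "alice": [0.12, -0.34, 0.56],
    "bob": [0.78, 0.10, -0.22],
}

# Save embeddings after registration
with open("speakers_embeddings.yaml", "w") as f:
    yaml.safe_dump(speakers, f)

# Load them back for identification
with open("speakers_embeddings.yaml") as f:
    loaded = yaml.safe_load(f)

print(loaded["alice"])  # [0.12, -0.34, 0.56]
```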
- Start the Model

  ```shell
  python embedding_model/model_api.py
  ```

- Start the Main Application

  ```shell
  python main/main.py
  ```
The project includes Dockerfiles to build separate containers for the embedding model and the main application:

- Embedding Model Container: contains `model_api.py` for embedding calculations.
- Main Application Container: contains `main.py` for registration and identification.

To build and run the Docker containers:

```shell
docker build -t embedding-model -f embedding_model/Dockerfile .
docker build -t main-app -f main/Dockerfile .
docker run -p 8000:8000 embedding-model
docker run -p 5000:8080 main-app
```
```
├── data
│   └── speakers_embeddings.yaml
├── embedding_model
│   ├── model_api.py
│   ├── Dockerfile
│   └── requirements.txt
├── main
│   ├── main.py
│   ├── Dockerfile
│   └── requirements.txt
└── README.md
```
- Support multiple recordings per speaker for better accuracy.
- Implement a GUI for easier speaker registration and identification.
- Expand to handle noisy environments and multiple languages.
- Improve frontend to specify identification thresholds and compare distances.
- Pros: Requires less training data than deep learning models and provides clear results.
- Cons: Can be slow with a large number of speakers due to pairwise embedding comparisons.
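The linear-scan cost mentioned above can be cut down by stacking all saved embeddings into one matrix and scoring every speaker with a single matrix-vector product; a sketch with NumPy (the speaker count, embedding dimension, and random data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
names = [f"speaker_{i}" for i in range(1000)]

# 1000 saved embeddings of dimension 512, L2-normalized once at load time
saved = rng.normal(size=(1000, 512))
saved /= np.linalg.norm(saved, axis=1, keepdims=True)

# Query: a slightly noisy copy of speaker_42's embedding
query = saved[42] + 0.01 * rng.normal(size=512)
query /= np.linalg.norm(query)

# One matrix-vector product yields all cosine similarities at once
scores = saved @ query
best = int(np.argmax(scores))
print(names[best])  # speaker_42
```

Normalizing embeddings once at load time makes the dot product equal to cosine similarity, so each identification is a single vectorized pass instead of a Python loop over speakers.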

