This Speaker Identification System identifies speakers by analyzing their voice embeddings. It registers speakers by recording their voices, computing embeddings with pyannote/embedding, and saving them in a YAML file. For identification, it compares new recordings against saved embeddings using cosine similarity.
- Speaker Registration
  - Record and compute embeddings using `pyannote/embedding`.
  - Save embeddings in a YAML file.
- Speaker Identification
  - Compute embeddings for new recordings.
  - Compare with saved embeddings and return closest matches.
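Identification reduces to a nearest-neighbour search over the saved embeddings using cosine similarity. A minimal sketch with NumPy (the speaker names, toy 3-dimensional vectors, and the 0.5 threshold are all illustrative, not values from this project):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(new_embedding, saved_embeddings, threshold=0.5):
    """Return the closest registered speaker, or None if below the threshold."""
    best_name, best_score = None, -1.0
    for name, emb in saved_embeddings.items():
        score = cosine_similarity(new_embedding, np.asarray(emb))
        if score > best_score:
            best_name, best_score = name, score
    return (best_name, best_score) if best_score >= threshold else (None, best_score)

# Toy data standing in for real pyannote embeddings
saved = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
print(identify(np.array([0.9, 0.1, 0.0]), saved))  # closest match: alice
```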
Use the pre-trained `pyannote/embedding` model from Hugging Face.
- Offline Use: Download and configure the model locally for faster processing.
- Online Use: Obtain a Hugging Face token to access the model via the cloud.
- Clone the Repository

  ```shell
  git clone https://github.com/Chiraz32/Speaker-Identification.git
  cd speaker-identification
  ```

- Install Dependencies

  ```shell
  pip install -r requirements.txt
  ```
- Configure the YAML File
  - Ensure `speakers_embeddings.yaml` is present in the `data` directory.
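The embeddings file is essentially a mapping from speaker names to embedding vectors. A hedged sketch of writing and reading it with PyYAML (the exact schema and values shown are an assumption for illustration, not the project's documented format):

```python
import yaml  # PyYAML

# Hypothetical schema: speaker name -> embedding as a list of floats
speakers = {
    "alice": [0.12, -0.34, 0.56],
    "bob": [0.78, 0.10, -0.22],
}

# Save embeddings after registration
with open("speakers_embeddings.yaml", "w") as f:
    yaml.safe_dump(speakers, f)

# Load them back for identification
with open("speakers_embeddings.yaml") as f:
    loaded = yaml.safe_load(f)

print(loaded["alice"])  # [0.12, -0.34, 0.56]
```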
- Start the Model

  ```shell
  python embedding_model/model_api.py
  ```

- Start the Main Application

  ```shell
  python main/main.py
  ```
The project includes Dockerfiles to build separate containers for the embedding model and the main application:

- Embedding Model Container: contains `model_api.py` for embedding calculations.
- Main Application Container: contains `main.py` for registration and identification.

To build and run the Docker containers:

```shell
docker build -t embedding-model -f embedding_model/Dockerfile .
docker build -t main-app -f main/Dockerfile .
docker run -p 8000:8000 embedding-model
docker run -p 5000:8080 main-app
```
```
├── data
│   └── speakers_embeddings.yaml
├── embedding_model
│   ├── model_api.py
│   ├── Dockerfile
│   └── requirements.txt
├── main
│   ├── main.py
│   ├── Dockerfile
│   └── requirements.txt
└── README.md
```
- Support multiple recordings per speaker for better accuracy.
- Implement a GUI for easier speaker registration and identification.
- Expand to handle noisy environments and multiple languages.
- Improve frontend to specify identification thresholds and compare distances.
- Pros: Requires less training data than deep learning models and provides clear results.
- Cons: Can be slow with a large number of speakers due to pairwise embedding comparisons.
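The linear-scan cost mentioned above can be cut down by stacking all saved embeddings into one matrix and scoring every speaker with a single matrix-vector product; a sketch with NumPy (the speaker count, embedding dimension, and random data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
names = [f"speaker_{i}" for i in range(1000)]

# 1000 saved embeddings of dimension 512, L2-normalized once at load time
saved = rng.normal(size=(1000, 512))
saved /= np.linalg.norm(saved, axis=1, keepdims=True)

# Query: a slightly noisy copy of speaker_42's embedding
query = saved[42] + 0.01 * rng.normal(size=512)
query /= np.linalg.norm(query)

# One matrix-vector product yields all cosine similarities at once
scores = saved @ query
best = int(np.argmax(scores))
print(names[best])  # speaker_42
```

Normalizing embeddings once at load time makes the dot product equal to cosine similarity, so each identification is a single vectorized pass instead of a Python loop over speakers.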

