This project processes YouTube videos by extracting audio, performing noise reduction, and identifying distinct speakers using diarization techniques. The processed audio is segmented and organized for further analysis.
- YouTube Video Processing: Accepts a YouTube link, downloads the video, and extracts the audio.
- Audio Standardization: Converts the audio to WAV format, sets a mono channel, and normalizes the sample rate.
- Noise Reduction: Applies denoising techniques to improve audio quality.
- Speaker Diarization: Identifies individual speakers and generates timestamped labels.
- Visualization: Displays speaker transitions and overlaps graphically.
- Audio Segmentation: Splits the audio into 10-second speaker-specific segments.
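The segmentation step can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the `Segment` class, the `split_turns` helper, and the `(start, end, speaker)` turn format are all hypothetical, and the README does not say how turns shorter than 10 seconds are handled. The sketch assumes they are kept as shorter pieces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One speaker-specific slice of audio (hypothetical structure)."""
    speaker: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio


def split_turns(turns, max_len=10.0):
    """Split (start, end, speaker) turns into pieces of at most max_len seconds."""
    segments = []
    for start, end, speaker in turns:
        t = start
        while t < end:
            # The last piece of a turn may be shorter than max_len.
            piece_end = min(t + max_len, end)
            segments.append(Segment(speaker, t, piece_end))
            t = piece_end
    return segments


# Assumed diarization output: two speaker turns with made-up labels.
turns = [(0.0, 12.5, "SPEAKER_00"), (12.5, 18.0, "SPEAKER_01")]
for seg in split_turns(turns):
    print(seg)
```

Each resulting `Segment` gives the time range to cut from the standardized WAV file, so the splitting logic stays independent of whichever audio library performs the cut.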
Ensure you have Python 3.10.12 installed.
- Clone this repository:
git clone https://github.com/myselfbasil/speaker-diarization
cd speaker-diarization
- Create and activate a virtual environment (recommended):
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
- Install required dependencies:
pip install -r requirements.txt
- Run the Jupyter Notebook:
jupyter notebook main.ipynb
Provide a YouTube link and execute the notebook cells step by step to process the audio.
- Run the Python script:
Basic Usage
python diarization.py "https://youtube.com/watch?v=..."
Advanced Usage
python diarization.py "https://youtube.com/watch?v=..." \
  -n 3 \
  -o ./results \
  --window 0.4 \
  --period 0.2 \
  --workers 8 \
  --debug
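To process several videos with the same settings, the command above can be assembled from Python. The sketch below is an assumption-laden convenience wrapper, not part of the repository: `build_command` and `run` are hypothetical helpers, and their keyword names are guesses at what each flag controls. Only the flags themselves (`-n`, `-o`, `--window`, `--period`, `--workers`, `--debug`) come from the usage shown above.

```python
import subprocess
import sys


def build_command(url, speakers=None, out_dir=None, window=None,
                  period=None, workers=None, debug=False):
    """Build the argument list for diarization.py (hypothetical helper)."""
    cmd = [sys.executable, "diarization.py", url]
    # Flag names are taken from the README; keyword names are assumptions.
    options = [
        ("-n", speakers),
        ("-o", out_dir),
        ("--window", window),
        ("--period", period),
        ("--workers", workers),
    ]
    for flag, value in options:
        if value is not None:
            cmd += [flag, str(value)]
    if debug:
        cmd.append("--debug")
    return cmd


def run(url, **kwargs):
    """Run the script and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(url, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    cmd = build_command("https://youtube.com/watch?v=...", speakers=3,
                        out_dir="./results", window=0.4, period=0.2,
                        workers=8, debug=True)
    print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.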
- Podcast and video transcription
- Speaker analysis in discussions and interviews
- Content segmentation for research and media archiving