This tool is designed to provide a quick and concise summary of audio and video files. It supports summarizing content either from a local file or directly from YouTube. The tool uses Whisper for transcription and a local version of Llama3 (via Ollama) for generating summaries.
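At its core the flow is: transcribe with Whisper, then ask the local model for a summary. Below is a minimal sketch of that pipeline, assuming the Hugging Face `transformers` library and the `ollama` Python client (the script's actual code may be structured differently):

```python
from transformers import pipeline
import ollama  # assumes the ollama Python client; the script may call the API differently

# Transcribe the audio with Whisper...
asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v2")
transcript = asr("talk.mp3")["text"]  # "talk.mp3" is a placeholder file

# ...then summarize the transcript with the local model served by Ollama.
response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": f"Write a concise summary of this transcript:\n\n{transcript}"}],
)
print(response["message"]["content"])
```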
> **Tip:** You can change the model used for summarization. To do so, change the `OLLAMA_MODEL` variable and pull the corresponding model with `ollama`.
- YouTube Integration: Download and summarize content directly from YouTube.
- Local File Support: Summarize audio/video files available on your local disk.
- Transcription: Converts audio content to text using Whisper.
- Summarization: Generates a concise summary using Llama3 (Ollama).
- Transcript Only Option: Option to only transcribe the audio content without generating a summary.
- Device Optimization: Automatically uses the best available hardware (MPS for Mac, CUDA for NVIDIA GPUs, or CPU).
Before you start using this tool, you need to install the following dependencies:
- Python 3.12 (at least 3.12, but lower than 3.13)
- Ollama for LLM model management
- ffmpeg (required for audio processing)
- uv for package management
Clone the repository and install the required Python packages using uv:
```bash
git clone https://github.com/damienarnodo/audio-summary-with-local-LLM.git
cd audio-summary-with-local-LLM

# Create and activate a virtual environment with uv
uv sync
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```
Download and install Ollama to manage local LLMs. More details about the supported models can be found on the Ollama GitHub.
Download and use the Llama3 model:
```bash
ollama pull llama3

# Test the access:
ollama run llama3 "tell me a joke"
```
The tool can be executed with the following command line options:
- `--from-youtube`: To download and summarize a video from YouTube.
- `--from-local`: To load and summarize an audio or video file from the local disk.
- `--output`: Specify the output file path (default: `./summary.md`).
- `--transcript-only`: To only transcribe the audio content without generating a summary.
- `--language`: Select the language to be used for the transcription (default: `en`).
- Summarizing a YouTube video:

  ```bash
  uv run python src/summary.py --from-youtube <YouTube-Video-URL>
  ```

- Summarizing a local audio file:

  ```bash
  uv run python src/summary.py --from-local <path-to-audio-file>
  ```

- Transcribing a YouTube video without summarizing:

  ```bash
  uv run python src/summary.py --from-youtube <YouTube-Video-URL> --transcript-only
  ```

- Transcribing a local audio file without summarizing:

  ```bash
  uv run python src/summary.py --from-local <path-to-audio-file> --transcript-only
  ```

- Specifying a custom output file:

  ```bash
  uv run python src/summary.py --from-youtube <YouTube-Video-URL> --output my_summary.md
  ```
The summary is saved as a Markdown file (default: `summary.md` in the current working directory; override the path with `--output`). This file includes a title and a concise summary of the content. The transcript is saved to `tmp/transcript.txt`.
The tool automatically detects and uses the best available hardware, as sketched below:
- MPS (Metal Performance Shaders) for Apple Silicon Macs
- CUDA for NVIDIA GPUs
- Falls back to CPU when neither is available
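A minimal sketch of this selection logic, assuming the script probes the hardware through PyTorch (the actual code may differ):

```python
import torch

# Prefer Apple's Metal backend, then CUDA, then fall back to CPU.
if torch.backends.mps.is_available():
    device = "mps"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"
```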
This tool can process audio files of any length. For files longer than 30 seconds, the script automatically does the following (see the sketch after this list):
- Chunks the audio into manageable segments
- Processes each chunk separately
- Combines the results into a single transcript
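The Hugging Face ASR pipeline exposes exactly this pattern through its `chunk_length_s` parameter; here is a minimal sketch (whether the script uses this parameter or its own chunking loop is an implementation detail):

```python
from transformers import pipeline

# chunk_length_s splits the audio into ~30-second windows, transcribes
# each one, and merges the pieces back into a single transcript.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    chunk_length_s=30,
)
print(asr("long_recording.mp3")["text"])  # placeholder file name
```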
- YouTube Video Summarizer with OpenAI Whisper and GPT
- Ollama GitHub Repository
- Transformers by Hugging Face
- yt-dlp Documentation
If you encounter this error:

```text
yt_dlp.utils.DownloadError: ERROR: Postprocessing: ffprobe and ffmpeg not found. Please install or provide the path using --ffmpeg-location
```
Please refer to this post.
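In most cases this error simply means that ffmpeg (which also provides ffprobe) is not installed or not on your `PATH`; installing it usually resolves the issue, for example:

```bash
# macOS (Homebrew)
brew install ffmpeg

# Debian/Ubuntu
sudo apt install ffmpeg
```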
If you encounter this error:
```text
ValueError: Soundfile is either not in the correct format or is malformed. Ensure that the soundfile has a valid audio file extension (e.g. wav, flac or mp3) and is not corrupted.
```
Try converting your file with ffmpeg:
```bash
ffmpeg -i my_file.mp4 -movflags faststart my_file_fixed.mp4
```
If you're running on CPU and encounter memory issues during transcription, consider:
- Using a smaller Whisper model
- Processing shorter audio segments
- Ensuring you have sufficient RAM available
Transcription can be slow on CPU. For best performance:
- Use a machine with GPU or Apple Silicon (MPS)
- Keep audio files under 10 minutes when possible
- Close other resource-intensive applications
You can easily change the models used for transcription and summarization by modifying the variables at the top of the script:
```python
# Default models
OLLAMA_MODEL = "llama3"
WHISPER_MODEL = "openai/whisper-large-v2"
```
To use a different Whisper model for transcription:
1. Update the `WHISPER_MODEL` variable with one of these options:
   - `"openai/whisper-tiny"` (fastest, least accurate)
   - `"openai/whisper-base"` (faster, less accurate)
   - `"openai/whisper-small"` (balanced)
   - `"openai/whisper-medium"` (slower, more accurate)
   - `"openai/whisper-large-v2"` (slowest, most accurate)

2. Example:

   ```python
   WHISPER_MODEL = "openai/whisper-medium"  # A good balance between speed and accuracy
   ```

For CPU-only systems, using a smaller model like `whisper-base` is recommended for better performance.
To use a different model for summarization:
1. First, pull the desired model with Ollama:

   ```bash
   ollama pull mistral  # or any other supported model
   ```

2. Then update the `OLLAMA_MODEL` variable:

   ```python
   OLLAMA_MODEL = "mistral"  # or any other model you've pulled
   ```

3. Popular alternatives include:
   - `"llama3"` (default)
   - `"mistral"`
   - `"llama2"`
   - `"gemma:7b"`
   - `"phi"`
For a complete list of available models, visit the Ollama model library.
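To check which models are already available on your machine, `ollama list` prints everything you have pulled:

```bash
ollama list
```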