A powerful Python tool for extracting video information and transcripts from YouTube videos, playlists, channels, and channel playlists. Built on top of yt-dlp and youtube-transcript-api.
- Single Video Processing - Extract metadata and transcripts from individual YouTube videos
- Playlist Support - Process entire playlists with progress tracking
- Channel Videos - Download information from all videos on a channel
- Channel Playlists - Process all playlists from a channel
- Resume Capability - Automatically skip already processed videos
- Auto-Detection - Automatically detect URL type (video/playlist/channel)
- Rich Metadata - Extract title, description, upload date, duration, view count, and more
- Transcript Extraction - Get full video transcripts when available
- CSV Export - Save all data in easily accessible CSV format
Install from PyPI:

pip install yt-dlp-transcripts
# As a command-line tool (after pip install)
yt-dlp-transcripts -u "https://www.youtube.com/watch?v=VIDEO_ID" -o video.csv
To install from source for development:

git clone https://github.com/yourusername/yt-dlp-transcripts.git
cd yt-dlp-transcripts
poetry install
poetry shell
# With poetry (after poetry install and poetry shell)
python -m yt_dlp_transcripts -u "https://www.youtube.com/watch?v=VIDEO_ID" -o video.csv
Usage for each supported URL type:

yt-dlp-transcripts -u "https://www.youtube.com/watch?v=VIDEO_ID" -o output.csv
yt-dlp-transcripts -u "https://www.youtube.com/playlist?list=PLAYLIST_ID" -o output.csv
yt-dlp-transcripts -u "https://www.youtube.com/@channelname/videos" -o output.csv
yt-dlp-transcripts -u "https://www.youtube.com/@channelname/playlists" -o output.csv
| Option | Short | Description | Example |
|---|---|---|---|
| `--url` | `-u` | YouTube URL (auto-detects type) | `https://youtube.com/...` |
| `--output` | `-o` | Output CSV file path | `output.csv` |
| `--help` | | Show help message | (flag, no value) |
The tool exports data to CSV with the following fields:
- `video_id` - YouTube video ID
- `title` - Video title
- `url` - Video URL
- `description` - Video description
- `transcript` - Full video transcript (when available)
- `upload_date` - Upload date (YYYYMMDD format)
- `duration` - Video duration in seconds
- `view_count` - Number of views
- `channel` - Channel name
- `channel_id` - Channel ID
Playlist sources additionally include:

- `playlist_name` - Name of the source playlist
- `playlist_url` - URL of the source playlist

Channel sources additionally include:

- `channel_source_url` - URL of the channel page
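Because the output is plain CSV, it can be inspected with the standard library alone. A minimal sketch, assuming the file was written to `output.csv` (whatever path you passed to `-o`):

```python
import csv

# Print a one-line summary per exported video.
with open("output.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        has_transcript = bool(row.get("transcript"))
        print(f'{row["video_id"]}: {row["title"]} '
              f'({row["duration"]}s, transcript: {has_transcript})')
```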
More command-line examples:

# Analyze a conference talk playlist
yt-dlp-transcripts -u "https://www.youtube.com/playlist?list=PLconf2024" -o conference_talks.csv
# Extract all videos from an educational channel
yt-dlp-transcripts -u "https://www.youtube.com/@3blue1brown/videos" -o math_videos.csv
# Get transcripts from your competitor's channel
yt-dlp-transcripts -u "https://www.youtube.com/@competitor/videos" -o competitor_analysis.csv
# Archive your own channel's content
yt-dlp-transcripts -u "https://www.youtube.com/@yourchannel/videos" -o my_backup.csv
# Collect lecture series for analysis
yt-dlp-transcripts -u "https://www.youtube.com/playlist?list=PLlecture" -o lectures.csv
# Get transcripts from multiple related playlists
yt-dlp-transcripts -u "https://www.youtube.com/@university/playlists" -o all_courses.csv
The package can also be used directly from Python:

from yt_dlp_transcripts import (
get_video_info,
process_single_video,
process_playlist,
process_channel,
detect_url_type
)
# Get video information as a dictionary
video_data = get_video_info("https://www.youtube.com/watch?v=VIDEO_ID")
print(video_data['title'])
print(video_data['transcript'])
print(video_data['duration'])
# Process content and save to CSV
process_single_video("https://www.youtube.com/watch?v=VIDEO_ID", "output.csv")
process_playlist("https://www.youtube.com/playlist?list=PLAYLIST_ID", "output.csv")
process_channel("https://www.youtube.com/@channel/videos", "output.csv", mode='videos')
# Auto-detect URL type
url_type = detect_url_type("https://www.youtube.com/watch?v=VIDEO_ID") # Returns: 'video'
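The helpers above can be combined into a small dispatcher. This is only a sketch: it assumes `detect_url_type` returns the strings `'video'`, `'playlist'`, and `'channel'` (only `'video'` is confirmed above), so verify the actual return values in your installed version.

```python
from yt_dlp_transcripts import (
    detect_url_type,
    process_single_video,
    process_playlist,
    process_channel,
)

def process_any(url: str, output_csv: str) -> None:
    """Route a YouTube URL to the matching processor based on its detected type."""
    url_type = detect_url_type(url)
    if url_type == "video":
        process_single_video(url, output_csv)
    elif url_type == "playlist":  # assumed return value; check your version
        process_playlist(url, output_csv)
    elif url_type == "channel":  # assumed return value; check your version
        process_channel(url, output_csv, mode="videos")
    else:
        raise ValueError(f"Unsupported URL type: {url_type!r}")

process_any("https://www.youtube.com/watch?v=VIDEO_ID", "output.csv")
```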
The tool automatically tracks processed videos and skips them on subsequent runs. This allows you to:
- Interrupt and resume large downloads
- Update your dataset with only new videos
- Avoid redundant API calls
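Conceptually, the resume logic boils down to collecting the `video_id` values already present in the output CSV and skipping those videos. The snippet below is an illustrative sketch of that idea, not the tool's actual implementation:

```python
import csv
import os

def already_processed(output_csv: str) -> set[str]:
    """Return the set of video IDs already present in the output CSV, if any."""
    if not os.path.exists(output_csv):
        return set()
    with open(output_csv, newline="", encoding="utf-8") as f:
        return {row["video_id"] for row in csv.DictReader(f)}

done = already_processed("output.csv")
# Any video whose ID is in `done` can be skipped on the next run.
```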
When processing multiple videos, the tool shows:
- Current video number and total count
- Video title being processed
- Success/skip status for each video
The tool also handles errors gracefully:

- Missing transcripts are handled without aborting the run
- Processing continues even if an individual video fails (see the sketch below)
- Clear error messages are provided for troubleshooting
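If you drive the Python API yourself, the same pattern is easy to reproduce: wrap each video in a try/except so one failure does not abort the whole batch. A sketch (`video_urls` is your own list of URLs):

```python
from yt_dlp_transcripts import get_video_info

video_urls = [
    "https://www.youtube.com/watch?v=VIDEO_ID_1",
    "https://www.youtube.com/watch?v=VIDEO_ID_2",
]

results = []
for url in video_urls:
    try:
        results.append(get_video_info(url))
    except Exception as exc:  # e.g. missing transcript, private or deleted video
        print(f"Skipping {url}: {exc}")
```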
The tool respects YouTube's rate limits. If you encounter 429 errors:
- The tool will continue processing and get available metadata
- Transcripts may be unavailable during rate limiting
- Consider adding delays or processing in smaller batches (see the sketch below)
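When using the Python API, one simple way to add such delays is to sleep between videos. A sketch (the 5-second pause is an arbitrary starting point, not an official recommendation):

```python
import time

from yt_dlp_transcripts import process_single_video

video_urls = [
    "https://www.youtube.com/watch?v=VIDEO_ID_1",
    "https://www.youtube.com/watch?v=VIDEO_ID_2",
]

for url in video_urls:
    process_single_video(url, "output.csv")
    time.sleep(5)  # pause between videos to stay well under rate limits
```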
Requirements:

- Python 3.9+
- yt-dlp
- youtube-transcript-api
- click
Known limitations:

- Transcript Availability: Not all videos have transcripts available
- Rate Limiting: YouTube may rate limit requests with large datasets
- Private Videos: Cannot access private or age-restricted content without authentication
- API Changes: YouTube's API may change, affecting functionality
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Run the test suite with `pytest`.
This project is licensed under the MIT License - see the LICENSE file for details.
- Built on top of yt-dlp
- Transcript extraction via youtube-transcript-api
- CLI interface powered by click
If you encounter any issues or have questions:
- Open an issue on GitHub
- Check existing issues for solutions
- Provide detailed error messages and URLs (when possible) for debugging
Changelog:

- Initial release
- Support for videos, playlists, channels, and channel playlists
- Auto-detection of URL types
- Resume capability for interrupted downloads
- CSV export with comprehensive metadata