AI to create short vertical videos optimized for platforms such as TikTok, YouTube Shorts, and Instagram Reels.
A fully automated, stateful pipeline that generates short-form vertical videos for language education from a single text prompt. This agent uses the Google Gemini API for all creative tasks and meticulously logs every action in a SQLite database, ensuring full traceability and recoverability.
| Example 1 | Example 2 | Example 3 |
|---|---|---|
| final_video.mp4 | final_video.mp4 | final_video.mp4 |
- End-to-End Automation: Go from a single prompt to a final `.mp4` video with one command.
- Intelligent Content Planning: The AI detects source/target languages, generates a script, and creates custom prompts for a perfectly themed background image and music.
- Persistent & Auditable: Every run is a "project" with its plan and detailed log history stored in a central SQLite database.
- Stateful & Recoverable: Automatically tracks the status of each project. If a job fails, you can resume from the exact point of failure.
- Social-Media Ready: Generates video titles, descriptions, and hashtags in the audience's native language.
- Rate Limit Aware: Includes an API key rotator that gracefully handles free-tier API rate limits by switching keys automatically.
- High-Quality Video: Features improved text placement, dynamic animations, darkened backgrounds for legibility, and professional, language-specific typography.
- Granular Control: Regenerate the entire video or just specific assets (like the background or music) for any project.
The agent operates as a multi-stage pipeline, where each component has a single responsibility. The entire process is orchestrated by `main.py` and centrally tracked in a SQLite database.
```mermaid
sequenceDiagram
    participant User as User (CLI)
    participant Main as main.py
    participant DB as DatabaseManager
    participant Planner as ContentPlanner
    participant Assets as AssetGenerator
    participant Composer as VideoComposer
    participant Gemini as Google Gemini API

    User->>Main: python main.py --prompt "..."
    Main->>DB: create_preliminary_project()
    Main->>Planner: generate_plan(prompt)
    Planner->>Gemini: Generate JSON plan (via gemini-pro)
    Gemini-->>Planner: VideoPlan object
    Planner-->>Main: Returns VideoPlan
    Main->>DB: save_plan_and_finalize_project()

    loop Asset Generation & Composition
        Main->>DB: update_project_status("Generating Assets")
        Main->>Assets: generate_core_assets(plan)
        Assets->>Gemini: Generate TTS Audio & BG Image
        Gemini-->>Assets: .wav & .png files
        Main->>Composer: calculate_video_duration()
        Composer-->>Main: final_duration
        Main->>Assets: generate_music_asset(duration)
        Assets->>Gemini: Generate Music (.wav)
        Gemini-->>Assets: .wav file
        Main->>Composer: create_video(plan, duration)
        Composer-->>Main: final_video.mp4
    end

    Main->>DB: update_project_status("Completed")
    Main-->>User: Success! Shows final video path & metadata.
```
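Because each stage's progress is recorded, a failed run can continue where it stopped. The sketch below illustrates that idea with an ordered list of stages modeled loosely on the diagram above. The stage names, `remaining_stages`, and `run_pipeline` are hypothetical; they are not taken from `main.py`, whose statuses and control flow may differ.

```python
from typing import Callable, Optional

# Hypothetical stage names modeled on the sequence diagram; the statuses
# actually stored by main.py may be named differently.
STAGES = ["plan", "core_assets", "music", "compose"]


def remaining_stages(last_completed: Optional[str]) -> list[str]:
    """Return the stages still to run, given the last stage recorded as done."""
    if last_completed is None:
        return list(STAGES)
    return STAGES[STAGES.index(last_completed) + 1:]


def run_pipeline(
    handlers: dict[str, Callable[[], None]],
    last_completed: Optional[str],
    record: Callable[[str], None],
) -> None:
    """Run the remaining stages in order, recording each one that succeeds.

    If a handler raises, the exception propagates and the last recorded
    stage marks the resume point for the next run.
    """
    for stage in remaining_stages(last_completed):
        handlers[stage]()
        record(stage)


if __name__ == "__main__":
    done: list[str] = []

    def failing_music() -> None:
        raise RuntimeError("music generation failed")

    handlers: dict[str, Callable[[], None]] = {name: (lambda: None) for name in STAGES}
    handlers["music"] = failing_music

    try:
        run_pipeline(handlers, None, done.append)
    except RuntimeError as err:
        print("stopped:", err)
    print(done)  # ['plan', 'core_assets']

    # Resume from the last recorded stage once the problem is fixed.
    handlers["music"] = lambda: None
    run_pipeline(handlers, done[-1], done.append)
    print(done)  # ['plan', 'core_assets', 'music', 'compose']
```

Recording a stage only after its handler returns is what makes resuming safe: a stage that crashed halfway is never marked as done, so it is simply run again.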
The core logic is encapsulated within the `agent/` package:

- `main.py` (The Conductor): The main entry point. Parses command-line arguments (`--prompt`, `--resume`, `--regenerate-*`), initializes all managers, and orchestrates the project workflow from start to finish.
- `agent/database.py` (The State Manager): Manages all interactions with the `projects.sqlite` database. It creates, retrieves, and updates project records and statuses, making the entire pipeline stateful.
- `agent/logger.py` (The Auditor): A singleton logger that provides clean, high-level console output using `rich` while simultaneously writing verbose, structured logs (including AI prompts and errors) to the database for full auditability.
- `agent/api_manager.py` (The Diplomat): Manages a pool of Google Gemini API keys from your `.env` file. If one key hits a rate limit, it automatically and seamlessly switches to the next available key.
- `agent/planner.py` (The Creative Director): Takes the initial user prompt and uses the Gemini API to generate a comprehensive `VideoPlan`. This plan is a structured JSON object containing everything from the script and word pairs to social media copy and AI prompts for other assets.
- `agent/asset_generator.py` (The Production Crew): Executes the `VideoPlan` by calling the appropriate Gemini models to generate the background image, all text-to-speech audio files, and the background music track.
- `agent/composer.py` (The Editor): Uses `MoviePy` to assemble all the generated image and audio assets into a final, polished `.mp4` video, applying animations, text overlays, and audio mixing.
- `agent/config.py` (The Control Panel): A centralized file for all static configuration: model names, API delays, video dimensions, font paths, music volume, and more. This is the first place to look for customization.
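The planner's `VideoPlan` is described above as a structured JSON object. As a minimal sketch, such a response could be parsed into typed objects with only the standard library. Every field name below (`word_pairs`, `background_prompt`, `music_prompt`, and so on) is a hypothetical guess based on the component descriptions, not the actual schema used by `agent/planner.py`.

```python
import json
from dataclasses import dataclass, field


@dataclass
class WordPair:
    source: str  # word in the viewer's native language
    target: str  # the same word in the language being taught


@dataclass
class VideoPlan:
    # Hypothetical fields inferred from the README, not the real schema.
    source_language: str
    target_language: str
    script: str
    title: str
    description: str
    hashtags: list[str]
    background_prompt: str  # prompt for the background image model
    music_prompt: str       # prompt for the music model
    word_pairs: list[WordPair] = field(default_factory=list)

    @classmethod
    def from_json(cls, text: str) -> "VideoPlan":
        """Build a plan from a JSON object; unknown or missing keys raise TypeError."""
        data = json.loads(text)
        pairs = [WordPair(**pair) for pair in data.pop("word_pairs", [])]
        return cls(word_pairs=pairs, **data)


if __name__ == "__main__":
    raw = json.dumps({
        "source_language": "English",
        "target_language": "Japanese",
        "script": "Five words you will hear at a ramen shop.",
        "title": "5 Essential Japanese Words for the Ramen Shop!",
        "description": "Key words for your next ramen shop visit.",
        "hashtags": ["#LearnJapanese", "#RamenShop"],
        "background_prompt": "A cozy ramen shop counter at night",
        "music_prompt": "Calm lo-fi beat",
        "word_pairs": [{"source": "noodles", "target": "men"}],
    })
    plan = VideoPlan.from_json(raw)
    print(plan.target_language, plan.word_pairs[0].target)  # Japanese men
```

Failing loudly on unexpected keys is a deliberate choice here: a model response that drifts from the expected shape is caught at parse time rather than deep inside asset generation.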
- Clone the repository:

  ```bash
  git clone https://github.com/aaurelions/short-video-maker
  cd short-video-maker
  ```

- Create and activate a virtual environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up Google Gemini API keys:
  - Get API keys from Google AI Studio.
  - Create a `.env` file in the project root.
  - Add your keys, comma-separated. The system will rotate them if one hits a rate limit.

  ```
  GOOGLE_API_KEYS="YOUR_API_KEY_1,YOUR_API_KEY_2"
  ```

- Install FFmpeg and ImageMagick (for MoviePy):
  - FFmpeg: MoviePy Docs on FFmpeg
  - ImageMagick: ImageMagick Download Page

- Install fonts (recommended for best quality): On macOS/Linux, you can clone the Google Fonts repository.

  ```bash
  # Example for macOS
  cd ~/Library/Fonts/
  git clone https://github.com/google/fonts.git google-fonts
  ```

  Note: Font paths are configured in `agent/config.py` and may need to be adjusted for your OS.
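The key rotation described in the setup steps can be pictured as a small round-robin wrapper around each API call. This is a sketch under assumptions, not the code in `agent/api_manager.py`: it reads `GOOGLE_API_KEYS` from the process environment (so the `.env` file must already have been loaded), and `RateLimitError` is a placeholder for whatever exception the Gemini client actually raises.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Placeholder for the rate-limit exception raised by the real API client."""


def load_api_keys(env_var: str = "GOOGLE_API_KEYS") -> list[str]:
    """Split a comma-separated key list from the environment, skipping blanks."""
    raw = os.environ.get(env_var, "")
    keys = [key.strip() for key in raw.split(",") if key.strip()]
    if not keys:
        raise RuntimeError(f"no API keys found in {env_var}")
    return keys


class KeyRotator:
    """Cycle through API keys, moving on whenever one is rate-limited."""

    def __init__(self, keys: list[str]) -> None:
        if not keys:
            raise ValueError("at least one API key is required")
        self._keys = list(keys)
        self._index = 0

    @property
    def current_key(self) -> str:
        return self._keys[self._index]

    def call(self, request: Callable[[str], T]) -> T:
        """Run request(key), trying each key at most once per call.

        The rotator stays on whichever key last succeeded, so later calls
        start from a key that is not known to be exhausted.
        """
        for _ in range(len(self._keys)):
            try:
                return request(self.current_key)
            except RateLimitError:
                self._index = (self._index + 1) % len(self._keys)
        raise RateLimitError("every API key is currently rate-limited")


if __name__ == "__main__":
    os.environ["GOOGLE_API_KEYS"] = "key-a, key-b"
    rotator = KeyRotator(load_api_keys())

    def fake_request(key: str) -> str:
        if key == "key-a":
            raise RateLimitError("quota exceeded")
        return f"ok with {key}"

    print(rotator.call(fake_request))  # ok with key-b
```

Capping retries at one pass over the pool keeps a fully exhausted pool from looping forever; a real implementation might also wait before a second pass.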
To create a new video, pass a prompt:

```bash
python main.py --prompt "Create a video for English speakers to learn 5 essential Japanese words for a ramen shop"
```
If a project fails, you can resume it:

```bash
# Resume the very last project that failed
python main.py --resume

# Resume a specific project by name
python main.py --resume "japanese-ramen-shop-words-20231027103000"
```
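Resuming "the very last project that failed" implies a lookup by status and recency in `projects.sqlite`. Below is a minimal sketch of such a lookup with Python's built-in `sqlite3` module. The `projects` table layout, the `Failed` status string, and the helper names are assumptions for illustration, not the schema or API of `agent/database.py`.

```python
import sqlite3
from typing import Optional

# Hypothetical schema. "touched" is a counter bumped on every write, used
# instead of wall-clock time so ordering stays exact within the same second.
SCHEMA = """
CREATE TABLE IF NOT EXISTS projects (
    name    TEXT PRIMARY KEY,
    status  TEXT NOT NULL,
    touched INTEGER NOT NULL
)
"""
# Constant SQL fragment (no user input) that yields the next counter value.
NEXT_TOUCH = "(SELECT COALESCE(MAX(touched), 0) + 1 FROM projects)"


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def create_project(conn: sqlite3.Connection, name: str, status: str = "Created") -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            f"INSERT INTO projects (name, status, touched) VALUES (?, ?, {NEXT_TOUCH})",
            (name, status),
        )


def set_status(conn: sqlite3.Connection, name: str, status: str) -> None:
    with conn:
        conn.execute(
            f"UPDATE projects SET status = ?, touched = {NEXT_TOUCH} WHERE name = ?",
            (status, name),
        )


def latest_project(conn: sqlite3.Connection, status: Optional[str] = None) -> Optional[str]:
    """Most recently modified project, optionally restricted to one status."""
    if status is None:
        row = conn.execute(
            "SELECT name FROM projects ORDER BY touched DESC LIMIT 1"
        ).fetchone()
    else:
        row = conn.execute(
            "SELECT name FROM projects WHERE status = ? ORDER BY touched DESC LIMIT 1",
            (status,),
        ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = open_db()
    create_project(db, "ramen-words")
    create_project(db, "sushi-words")
    set_status(db, "ramen-words", "Failed")
    set_status(db, "sushi-words", "Completed")
    print(latest_project(db, "Failed"))  # ramen-words
    print(latest_project(db))            # sushi-words
```

The same `latest_project` query without a status filter also covers the "last modified project" default that the regeneration commands use.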
You can regenerate assets for any existing project without starting over. This is useful for tweaking visuals or audio, or for fixing a failed music track. If you don't provide a project name, the command targets the last modified project.

```bash
# Regenerate EVERYTHING for the last project
python main.py --regenerate

# Regenerate only the final video for a specific project
python main.py --regenerate-video "project-name-to-fix"

# Regenerate just the background image for the last project
python main.py --regenerate-background

# Regenerate all spoken word audio files
python main.py --regenerate-words

# Regenerate only the music track
python main.py --regenerate-music
```
| Short flag | Long flag |
|---|---|
| `-r` | `--regenerate` |
| `-rv` | `--regenerate-video` |
| `-rb` | `--regenerate-background` |
| `-ri` | `--regenerate-intro` |
| `-rm` | `--regenerate-music` |
| `-rw` | `--regenerate-words` |
| `-rw0` | `--regenerate-word-0` (and other specific word indices) |
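A flag that accepts an optional project name, falling back to the last modified project when none is given, maps naturally onto `argparse`'s `nargs="?"` combined with a `const` value. The sketch below shows one way to declare the flags in the table. The `LAST_PROJECT` sentinel, `MAX_WORD_FLAGS`, and `build_parser` are assumptions; the real `main.py` may parse its arguments differently.

```python
import argparse

LAST_PROJECT = "<last>"  # hypothetical sentinel: "no name given, use the latest project"
MAX_WORD_FLAGS = 5       # hypothetical number of per-word flags (-rw0 ... -rw4)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--prompt", help="prompt for a new project")

    # Every flag below optionally takes a project name. With nargs="?",
    # a bare flag stores `const` and an absent flag stores `default`.
    optional_name = {"nargs": "?", "const": LAST_PROJECT, "default": None, "metavar": "PROJECT"}
    parser.add_argument("--resume", **optional_name)
    parser.add_argument("-r", "--regenerate", **optional_name)
    for short, target in [
        ("v", "video"),
        ("b", "background"),
        ("i", "intro"),
        ("m", "music"),
        ("w", "words"),
    ]:
        parser.add_argument(f"-r{short}", f"--regenerate-{target}", **optional_name)
    for i in range(MAX_WORD_FLAGS):
        parser.add_argument(f"-rw{i}", f"--regenerate-word-{i}", **optional_name)
    return parser


if __name__ == "__main__":
    parser = build_parser()
    args = parser.parse_args(["--regenerate-video", "project-name-to-fix"])
    print(args.regenerate_video)  # project-name-to-fix
    args = parser.parse_args(["-rm"])
    print(args.regenerate_music)  # <last>
```

Using a distinct `const` rather than `None` lets the program tell "flag given without a name" apart from "flag not given at all".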
The easiest way to customize the output is by editing `agent/config.py`:

- Voices: Set `TTS_RANDOM_VOICE = False` and change `TTS_DEFAULT_VOICE` to use a consistent voice.
- Fonts: Modify the `FONT_MAPPINGS` dictionary to change fonts for different languages or scenes. You'll need to provide the correct path to the `.ttf` file on your system.
- Timings & Style: Adjust values like `CHALLENGE_DURATION_S`, `MUSIC_VOLUME`, or `BACKGROUND_DARKEN_OPACITY` to change the pacing and look of the video.
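For orientation, a config module of this kind might look roughly like the excerpt below. The setting names come from the list above, but every value, the nested shape of `FONT_MAPPINGS`, and both helper functions are hypothetical. `darken` assumes the background is dimmed by overlaying black at the configured opacity, which scales each color channel by `1 - opacity`.

```python
# Hypothetical excerpt in the style of agent/config.py; the real values,
# structure, and fallback rules may differ.
TTS_RANDOM_VOICE = False
TTS_DEFAULT_VOICE = "YOUR_VOICE_NAME"  # placeholder, not a real voice ID
CHALLENGE_DURATION_S = 3.0
MUSIC_VOLUME = 0.1
BACKGROUND_DARKEN_OPACITY = 0.5

# Language code -> scene -> path to a .ttf file (placeholder paths).
FONT_MAPPINGS = {
    "default": {
        "title": "/path/to/fonts/NotoSans-Bold.ttf",
        "word": "/path/to/fonts/NotoSans-Regular.ttf",
    },
    "ja": {
        "title": "/path/to/fonts/NotoSansJP-Bold.ttf",
        "word": "/path/to/fonts/NotoSansJP-Regular.ttf",
    },
}


def resolve_font(language: str, scene: str) -> str:
    """Return the font for a language and scene, falling back to "default"."""
    fonts = FONT_MAPPINGS.get(language, {})
    if scene in fonts:
        return fonts[scene]
    return FONT_MAPPINGS["default"][scene]


def darken(channel: int, opacity: float = BACKGROUND_DARKEN_OPACITY) -> int:
    """Value of one 0-255 color channel after a black overlay at `opacity`."""
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    return round(channel * (1.0 - opacity))


if __name__ == "__main__":
    print(resolve_font("ja", "title"))  # /path/to/fonts/NotoSansJP-Bold.ttf
    print(resolve_font("es", "word"))   # /path/to/fonts/NotoSans-Regular.ttf
    print(darken(200))                  # 100
```

A per-language table with a shared fallback means only scripts that need special glyphs (such as Japanese) require their own entry.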
A successful run will produce a clear summary in your terminal and a neatly organized project folder in `output/`.
Terminal Summary:

```
✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨
✅ SUCCESS Project 'japanese-ramen-shop-words-20231027103000' completed successfully!
Final video archived in: output/japanese-ramen-shop-words-20231027103000/
--------------------
✅ Title (English): 5 Essential Japanese Words for the Ramen Shop!
✅ Description (English): This video will teach you 5 key Japanese words you need to know when visiting a ramen shop. Perfect for your next trip to Japan!
✅ Hashtags: #LearnJapanese #JapaneseLesson #RamenShop #JapanTravel #日本語勉強 #ラーメン
✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨
```
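Project names such as `japanese-ramen-shop-words-20231027103000` look like a slug followed by a `YYYYMMDDHHMMSS` timestamp. Assuming that pattern, a name could be built as sketched below. Where the slug text comes from (the prompt, the plan, or elsewhere) is not documented, so `make_project_name` and its `title` input are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional


def make_project_name(title: str, now: Optional[datetime] = None) -> str:
    """Lower-case slug of `title` plus a second-resolution timestamp."""
    # Runs of anything other than a-z/0-9 become single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    # Non-Latin titles can slug to nothing, so fall back to a generic stem.
    slug = slug or "project"
    stamp = (now or datetime.now()).strftime("%Y%m%d%H%M%S")
    return f"{slug}-{stamp}"


if __name__ == "__main__":
    when = datetime(2023, 10, 27, 10, 30, 0)
    print(make_project_name("Japanese Ramen Shop Words", when))
    # japanese-ramen-shop-words-20231027103000
```

The timestamp suffix keeps names unique across repeated runs on the same topic, and it also sorts chronologically as plain text.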
Project Directory:

The `output/` directory contains everything:

```
output/
├── japanese-ramen-shop-words-20231027103000/
│   ├── background.png
│   ├── intro_audio.wav
│   ├── word_0.wav
│   ├── word_1.wav
│   ├── ...
│   ├── music.wav
│   └── final_video.mp4
└── projects.sqlite   <-- The central database for ALL projects
```
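The sequence diagram shows the composer calculating a duration before the music is generated. One plausible ingredient of that step is measuring the speech clips in a project folder like the one above, which Python's standard `wave` module can do. `total_speech_duration` and its `pause_s` padding are hypothetical stand-ins, not the actual `calculate_video_duration()` logic, and the demo writes silent clips only so that it runs anywhere.

```python
import tempfile
import wave
from pathlib import Path


def wav_duration(path: Path) -> float:
    """Length of a PCM .wav file in seconds (frame count / sample rate)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_speech_duration(project_dir: Path, pause_s: float = 0.0) -> float:
    """Intro clip plus every word_N.wav, with an optional pause after each word.

    pause_s is a hypothetical stand-in for per-word timing such as a
    challenge pause; music.wav is deliberately excluded.
    """
    total = wav_duration(project_dir / "intro_audio.wav")
    for clip in project_dir.glob("word_*.wav"):
        total += wav_duration(clip) + pause_s
    return total


def write_silence(path: Path, seconds: float, rate: int = 8000) -> None:
    """Write a mono 16-bit silent clip (demo helper only)."""
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        write_silence(project / "intro_audio.wav", 2.0)
        write_silence(project / "word_0.wav", 1.0)
        write_silence(project / "word_1.wav", 0.5)
        print(total_speech_duration(project, pause_s=1.0))  # 5.5
```

Measuring the speech first and generating music afterwards, as the diagram orders it, means the music request can ask for a track of the right length instead of trimming one afterwards.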