aaurelions/short-video-maker


🎬 AI to create short vertical videos optimized for platforms such as TikTok, YouTube Shorts, and Instagram Reels 🤖

A fully automated, stateful pipeline that generates short-form vertical videos for language education from a single text prompt. This agent uses the Google Gemini API for all creative tasks and meticulously logs every action in a SQLite database, ensuring full traceability and recoverability.

Example 1, Example 2, Example 3: embedded sample videos (final_video.mp4)

✨ Key Features

  • 🚀 End-to-End Automation: Go from a single prompt to a final .mp4 video with one command.
  • 🧠 Intelligent Content Planning: The AI detects source/target languages, generates a script, and creates custom prompts for a perfectly themed background image and music.
  • 🗄️ Persistent & Auditable: Every run is a "project" with its plan and detailed log history stored in a central SQLite database.
  • 🔄 Stateful & Recoverable: Automatically tracks the status of each project. If a job fails, you can resume from the exact point of failure.
  • 📱 Social-Media Ready: Generates video titles, descriptions, and hashtags in the audience's native language.
  • 🔑 Rate Limit Aware: Includes an API key rotator to gracefully handle free-tier API rate limits by switching keys automatically.
  • 🎨 High-Quality Video: Features improved text placement, dynamic animations, darkened backgrounds for legibility, and professional, language-specific typography.
  • 🔧 Granular Control: Regenerate the entire video or just specific assets (like the background or music) for any project.

🏛️ Architecture & Workflow

The agent operates as a multi-stage pipeline, where each component has a single responsibility. The entire process is orchestrated by main.py and centrally tracked in a SQLite database.
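
To illustrate what central tracking in SQLite can look like, here is a minimal sketch built on Python's standard sqlite3 module. The `ProjectStore` class, the table layout, and the column names are assumptions made for illustration; the project's real schema lives in agent/database.py and may differ. Only the status strings "Generating Assets" and "Completed" are taken from the workflow described in this README.

```python
import sqlite3
from datetime import datetime, timezone


class ProjectStore:
    """Hypothetical stand-in for the DatabaseManager described in this README."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        # Assumed schema: one row per project, keyed by project name.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS projects ("
            "name TEXT PRIMARY KEY, "
            "status TEXT NOT NULL, "
            "updated_at TEXT NOT NULL)"
        )

    def set_status(self, name, status):
        """Insert the project, or overwrite its status and timestamp."""
        now = datetime.now(timezone.utc).isoformat()
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO projects (name, status, updated_at) "
                "VALUES (?, ?, ?)",
                (name, status, now),
            )

    def get_status(self, name):
        """Return the stored status, or None for an unknown project."""
        row = self.conn.execute(
            "SELECT status FROM projects WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else None

    def unfinished(self):
        """Names of projects that never reached 'Completed', sorted by name."""
        rows = self.conn.execute(
            "SELECT name FROM projects WHERE status != 'Completed' ORDER BY name"
        )
        return [r[0] for r in rows]
```

Because every stage writes its status before it starts, a later run can query for unfinished projects and pick up where the previous one stopped.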

Workflow Diagram

sequenceDiagram
    participant User as 👤 User (CLI)
    participant Main as 🚀 main.py
    participant DB as 🗄️ DatabaseManager
    participant Planner as 🧠 ContentPlanner
    participant Assets as 🛠️ AssetGenerator
    participant Composer as 🎞️ VideoComposer
    participant Gemini as ✨ Google Gemini API

    User->>Main: python main.py --prompt "..."
    Main->>DB: create_preliminary_project()
    Main->>Planner: generate_plan(prompt)
    Planner->>Gemini: Generate JSON plan (via gemini-pro)
    Gemini-->>Planner: VideoPlan object
    Planner-->>Main: Returns VideoPlan
    Main->>DB: save_plan_and_finalize_project()

    loop Asset Generation & Composition
        Main->>DB: update_project_status("Generating Assets")
        Main->>Assets: generate_core_assets(plan)
        Assets->>Gemini: Generate TTS Audio & BG Image
        Gemini-->>Assets: .wav & .png files

        Main->>Composer: calculate_video_duration()
        Composer-->>Main: final_duration

        Main->>Assets: generate_music_asset(duration)
        Assets->>Gemini: Generate Music (.wav)
        Gemini-->>Assets: .wav file

        Main->>Composer: create_video(plan, duration)
        Composer-->>Main: final_video.mp4
    end

    Main->>DB: update_project_status("Completed")
    Main-->>User: ✅ Success! Shows final video path & metadata.

Component Breakdown

The core logic is encapsulated within the agent/ package, with main.py at the project root acting as the entry point:

  • 🚀 main.py (The Conductor): The main entry point. Parses command-line arguments (--prompt, --resume, --regenerate-*), initializes all managers, and orchestrates the project workflow from start to finish.
  • 🗄️ agent/database.py (The State Manager): Manages all interactions with the projects.sqlite database. It creates, retrieves, and updates project records and statuses, making the entire pipeline stateful.
  • 📝 agent/logger.py (The Auditor): A singleton logger that provides clean, high-level console output using rich while simultaneously writing verbose, structured logs (including AI prompts and errors) to the database for full auditability.
  • 🔑 agent/api_manager.py (The Diplomat): Manages a pool of Google Gemini API keys from your .env file. If one key hits a rate limit, it automatically and seamlessly switches to the next available key.
  • 🧠 agent/planner.py (The Creative Director): Takes the initial user prompt and uses the Gemini API to generate a comprehensive VideoPlan. This plan is a structured JSON object containing everything from the script and word pairs to social media copy and AI prompts for other assets.
  • 🛠️ agent/asset_generator.py (The Production Crew): Executes the VideoPlan by calling the appropriate Gemini models to generate the background image, all text-to-speech audio files, and the background music track.
  • 🎞️ agent/composer.py (The Editor): Uses MoviePy to assemble all the generated image and audio assets into a final, polished .mp4 video, applying animations, text overlays, and audio mixing.
  • 🎛️ agent/config.py (The Control Panel): A centralized file for all static configuration: model names, API delays, video dimensions, font paths, music volume, and more. This is the first place to look for customization.
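
The key rotation described for agent/api_manager.py can be sketched roughly as follows. This is not the project's implementation: `KeyRotator` and `RateLimitError` are hypothetical names, and the real module may detect rate limits and pick keys differently. The sketch only shows the general idea of trying the next key when the current one is exhausted.

```python
class RateLimitError(Exception):
    """Hypothetical error signalling that a key hit its rate limit."""


class KeyRotator:
    """Cycles through API keys, moving on when one is rate limited."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("at least one API key is required")
        self.keys = list(keys)
        self.index = 0  # position of the key currently in use

    @property
    def current(self):
        return self.keys[self.index]

    def call(self, request):
        """Run request(key), trying each key at most once per call."""
        for _ in range(len(self.keys)):
            try:
                return request(self.current)
            except RateLimitError:
                # Advance to the next key and wrap around at the end.
                self.index = (self.index + 1) % len(self.keys)
        raise RateLimitError("all API keys are rate limited")
```

A caller would wrap each API request in a function that takes the key as its only argument and pass it to `call`. The rotator remembers which key last worked, so later requests start from that key.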

🛠️ Setup

  1. Clone the repository:

    git clone https://github.com/aaurelions/short-video-maker
    cd short-video-maker
  2. Create and activate a virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Set up Google Gemini API Keys:

    • Get API keys from Google AI Studio.
    • Create a .env file in the project root.
    • Add your keys, comma-separated. The system will rotate them if one hits a rate limit.
    GOOGLE_API_KEYS="YOUR_API_KEY_1,YOUR_API_KEY_2"
  5. Install FFmpeg and ImageMagick (both required by MoviePy) using your system's package manager.

  6. Install Fonts (Recommended for best quality): On macOS/Linux, you can clone the Google Fonts repository.

    # Example for macOS
    cd ~/Library/Fonts/
    git clone https://github.com/google/fonts.git google-fonts

    Note: Font paths are configured in agent/config.py and may need to be adjusted for your OS.
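
Before the first run, it can help to confirm that the setup steps above took effect. The script below is a hedged sketch, not part of the repository: `parse_api_keys` and `check_setup` are hypothetical helpers. It reads the process environment, so the GOOGLE_API_KEYS variable must already be exported or loaded from .env by some other means. The binary names `ffmpeg`, `magick`, and `convert` are the usual ones, but your installation may differ.

```python
import os
import shutil


def parse_api_keys(raw):
    """Split a comma-separated key string, dropping blanks and whitespace."""
    return [k.strip() for k in raw.split(",") if k.strip()]


def check_setup(env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if not parse_api_keys(env.get("GOOGLE_API_KEYS", "")):
        problems.append("GOOGLE_API_KEYS is empty or not set")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # ImageMagick installs `magick` (v7) or `convert` (v6); accept either.
    if shutil.which("magick") is None and shutil.which("convert") is None:
        problems.append("ImageMagick not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("Setup issue:", problem)
```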


🚀 How to Run

Create a New Video

python main.py --prompt "Create a video for English speakers to learn 5 essential Japanese words for a ramen shop"

Resume a Failed Project

If a project fails, you can resume it.

# Resume the very last project that failed
python main.py --resume

# Resume a specific project by name
python main.py --resume "japanese-ramen-shop-words-20231027103000"

Regenerate Assets

You can regenerate assets for any existing project without starting over. This is useful for tweaking visuals, audio, or fixing a failed music track.

If you don't provide a project name, the command targets the most recently modified project.

# Regenerate EVERYTHING for the last project
python main.py --regenerate

# Regenerate only the final video for a specific project
python main.py --regenerate-video "project-name-to-fix"

# Regenerate just the background image for the last project
python main.py --regenerate-background

# Regenerate all spoken word audio files
python main.py --regenerate-words

# Regenerate only the music track
python main.py --regenerate-music

Full list of Regeneration Flags

  • -r, --regenerate
  • -rv, --regenerate-video
  • -rb, --regenerate-background
  • -ri, --regenerate-intro
  • -rm, --regenerate-music
  • -rw, --regenerate-words
  • -rw0, --regenerate-word-0 (and other specific word indices)
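
Flags that accept an optional project name, as the ones above do, can be modeled with argparse's `nargs="?"`. The sketch below is an assumption about how such a parser could be written, not a copy of main.py: the `LAST` sentinel and `build_parser` are hypothetical, and the per-index flags such as -rw0 are left out.

```python
import argparse

# Hypothetical sentinel meaning "flag given without a project name".
LAST = "__last__"


def build_parser():
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--prompt", help="text prompt for a new video")
    # nargs="?" makes the project name optional; const is stored when it is omitted.
    parser.add_argument("--resume", nargs="?", const=LAST, metavar="PROJECT")
    flags = [
        ("-r", "--regenerate"),
        ("-rv", "--regenerate-video"),
        ("-rb", "--regenerate-background"),
        ("-ri", "--regenerate-intro"),
        ("-rm", "--regenerate-music"),
        ("-rw", "--regenerate-words"),
    ]
    for short_flag, long_flag in flags:
        parser.add_argument(
            short_flag, long_flag, nargs="?", const=LAST, metavar="PROJECT"
        )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

With this layout, an attribute is `None` when its flag is absent, `LAST` when the flag is given alone, and the project name otherwise, which is enough to tell the three cases apart.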

🎨 Customization

The easiest way to customize the output is by editing agent/config.py:

  • Voices: Set TTS_RANDOM_VOICE = False and change TTS_DEFAULT_VOICE to use a consistent voice.
  • Fonts: Modify the FONT_MAPPINGS dictionary to change fonts for different languages or scenes. You'll need to provide the correct path to the .ttf file on your system.
  • Timings & Style: Adjust values like CHALLENGE_DURATION_S, MUSIC_VOLUME, or BACKGROUND_DARKEN_OPACITY to change the pacing and look of the video.
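
As a rough picture of what such edits might look like, here is a hypothetical excerpt in the style of agent/config.py. The setting names come from the list above, but every value, the dictionary keys, and the assumed units and ranges are placeholders for illustration. Check the actual file for the real defaults and structure.

```python
# Hypothetical excerpt; values are illustrative placeholders, not the
# project's shipped defaults.

TTS_RANDOM_VOICE = False          # always use TTS_DEFAULT_VOICE
TTS_DEFAULT_VOICE = "YOUR_VOICE"  # replace with a voice name the TTS model supports

# Assumed shape: language name -> path to a .ttf file on your system.
FONT_MAPPINGS = {
    "default": "/path/to/YourFont-Regular.ttf",
    "Japanese": "/path/to/YourJapaneseFont-Regular.ttf",
}

CHALLENGE_DURATION_S = 3.0        # seconds (unit assumed from the _S suffix)
MUSIC_VOLUME = 0.15               # assumed scale: 0.0 silent, 1.0 full volume
BACKGROUND_DARKEN_OPACITY = 0.5   # assumed scale: 0.0 untouched, 1.0 fully black
```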

📦 Output

A successful run will produce a clear summary in your terminal and a neatly organized project folder in output/.

Terminal Summary:

✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨
✅ SUCCESS Project 'japanese-ramen-shop-words-20231027103000' completed successfully!
🎥 Final video archived in: output/japanese-ramen-shop-words-20231027103000/
--------------------
✅ Title (English): 5 Essential Japanese Words for the Ramen Shop!
✅ Description (English): This video will teach you 5 key Japanese words you need to know when visiting a ramen shop. Perfect for your next trip to Japan!
✅ Hashtags: #LearnJapanese #JapaneseLesson #RamenShop #JapanTravel #日本語勉強 #ラーメン
✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨

Project Directory: The output/ directory contains everything:

output/
├── japanese-ramen-shop-words-20231027103000/
│   ├── background.png
│   ├── intro_audio.wav
│   ├── word_0.wav
│   ├── word_1.wav
│   ├── ...
│   ├── music.wav
│   └── final_video.mp4
└── projects.sqlite  <-- The central database for ALL projects
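
The layout above also makes it easy to check which assets a project folder still lacks, for example before choosing a regeneration flag. The helper below is a small sketch under the assumption that file names follow the tree exactly. `expected_assets` and `missing_assets` are hypothetical and not part of the repository.

```python
from pathlib import Path


def expected_assets(word_count):
    """File names a finished project folder should contain, per the tree above."""
    names = ["background.png", "intro_audio.wav"]
    names += [f"word_{i}.wav" for i in range(word_count)]
    names += ["music.wav", "final_video.mp4"]
    return names


def missing_assets(project_dir, word_count):
    """Return the expected file names that are not present in project_dir."""
    folder = Path(project_dir)
    return [n for n in expected_assets(word_count) if not (folder / n).is_file()]
```

For a five-word project, `missing_assets("output/<project-name>", 5)` returns an empty list once every asset and the final video exist.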
