A powerful Python backend service that automatically generates vertical video clips from podcast content using AI. Built with Modal for serverless deployment, WhisperX for transcription, and Google Gemini for intelligent moment identification.
- 🎤 AI-Powered Transcription - WhisperX with large-v2 model for accurate speech-to-text
- 🎯 Smart Clip Detection - Google Gemini identifies optimal Q&A moments
- 🎥 Face Tracking - Columbia face tracking for dynamic video framing
- 📱 Vertical Video Generation - Optimized 9:16 aspect ratio for social media
- 📝 Automatic Subtitles - Styled ASS subtitles with custom fonts
- ☁️ Cloud Storage - AWS S3 integration for video processing
- ⚡ Serverless Deployment - Modal for scalable GPU-powered processing
- Modal - Serverless GPU compute platform
- Python 3.12 - Programming language
- CUDA 12.4 - GPU acceleration
- FFmpeg - Video processing
- OpenCV - Computer vision
- WhisperX - Speech transcription and alignment
- Google Gemini 2.5 - Moment identification
- NumPy - Numerical computations
- PyTorch - Deep learning framework
- ffmpegcv - Video I/O with GPU acceleration
- pysubs2 - Subtitle generation
- Columbia Face Tracker - Face detection and tracking
- AWS S3 - Video storage and delivery
- Modal Volumes - Model caching
- Modal Secrets - Environment variable management
Before you begin, ensure you have:
- Modal Account - Sign up at modal.com
- AWS Account - For S3 storage
- Google AI Studio - For Gemini API access
- Python 3.8+ - For Modal CLI
- Modal CLI - Install with `pip install modal`
- Git - For cloning repositories
```bash
# Clone the repository
git clone <your-repo-url>
cd ai-podcast-clipper-backend

# Install Modal CLI
pip install modal

# Login to Modal
modal token new
```
Create Modal secrets for your environment variables:
```bash
# Create Modal secret with all required variables
modal secret create ai-podcast-clip-sass \
  AWS_ACCESS_KEY_ID=your_aws_access_key \
  AWS_SECRET_ACCESS_KEY=your_aws_secret_key \
  AWS_DEFAULT_REGION=your_aws_region \
  GEMINI_API_KEY=your_gemini_api_key \
  AUTH_TOKEN=your_custom_auth_token
```
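Inside the container, these secrets surface as environment variables. A fail-fast startup check (a sketch; the variable names are taken from the secret-creation command above) can catch a missing or empty secret before a long GPU job starts:

```python
import os

REQUIRED_VARS = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "GEMINI_API_KEY",
    "AUTH_TOKEN",
]

def check_secrets(env=os.environ):
    """Return the names of required variables that are missing or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `check_secrets()` at the top of the Modal function and raising if the list is non-empty turns a cryptic mid-pipeline failure into an immediate, readable error.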
Create a `requirements.txt` file:
```txt
boto3==1.34.0
opencv-python==4.8.1.78
ffmpegcv==0.3.0
numpy==1.24.3
fastapi==0.104.1
pydantic==2.5.0
whisperx==3.1.1
google-generativeai==0.3.2
pysubs2==1.6.1
tqdm==4.66.1
Pillow==10.1.0
```
Ensure your project has this structure:
```txt
ai-podcast-clipper-backend/
├── main.py              # Main Modal application
├── requirements.txt     # Python dependencies
├── asd/                 # Columbia face tracker
│   ├── Columbia_test.py
│   ├── weight/
│   │   └── finetuning_TalkSet.model
│   └── ...
└── README.md
```
```bash
# Deploy the application
modal deploy main.py

# Or run locally for testing
modal run main.py
```
1. Create S3 Bucket:

   ```bash
   aws s3 mb s3://ai-podcast-clipper-new
   ```

2. Set Bucket Policy (replace the account ARN and `ai-podcast-clipper-new` with your own values):

   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": { "AWS": "arn:aws:iam::YOUR-ACCOUNT:user/YOUR-USER" },
         "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
         "Resource": "arn:aws:s3:::ai-podcast-clipper-new/*"
       }
     ]
   }
   ```

3. Configure CORS (if needed for web access):

   ```json
   [
     {
       "AllowedHeaders": ["*"],
       "AllowedMethods": ["GET", "PUT", "POST"],
       "AllowedOrigins": ["*"],
       "ExposeHeaders": []
     }
   ]
   ```
1. Get API Key:
   - Visit Google AI Studio
   - Create a new API key
   - Add to Modal secrets

2. Test Connection:

   ```python
   import google.generativeai as genai

   genai.configure(api_key="your-api-key")
   model = genai.GenerativeModel("gemini-pro")
   ```
1. Download Model:
   - Place `finetuning_TalkSet.model` in `asd/weight/`
   - Ensure all Columbia dependencies are in the `asd/` directory

2. Font Installation: The Modal image automatically installs the Anton font for subtitles.
```
POST https://your-modal-app-url/process_video
```

Headers:

```json
{
  "Content-Type": "application/json",
  "Authorization": "Bearer your_auth_token"
}
```

Request body:

```json
{
  "s3_key": "path/to/your/video.mp4"
}
```
```bash
curl -X POST "https://your-modal-app-url/process_video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_auth_token" \
  -d '{"s3_key": "test1/example.mp4"}'
```
```python
import requests

url = "https://your-modal-app-url/process_video"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer your_auth_token",
}
payload = {"s3_key": "test1/example.mp4"}

response = requests.post(url, json=payload, headers=headers)
print(response.json())
```
- Downloads video from S3 using provided key
- Stores temporarily in Modal container
- Extracts audio using FFmpeg
- Transcribes with WhisperX large-v2 model
- Performs word-level alignment
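WhisperX's alignment step returns segments that each carry a `words` list with per-word `start`/`end` times; words the aligner cannot place may lack timestamps. A small helper to flatten that structure into a simple word list, sketched against the documented output shape:

```python
def flatten_words(aligned_result):
    """Collect (word, start, end) tuples from a WhisperX alignment result.

    Words that failed alignment can lack timestamps; they are skipped.
    """
    words = []
    for segment in aligned_result.get("segments", []):
        for w in segment.get("words", []):
            if "start" in w and "end" in w:
                words.append((w["word"], w["start"], w["end"]))
    return words
```

Downstream stages (subtitle timing, clip-boundary snapping) only need this flat list, not the full segment structure.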
- Sends transcript to Google Gemini
- AI identifies optimal Q&A segments
- Returns clip boundaries (30-60 seconds)
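If Gemini is prompted to reply with a JSON array of `{"start": …, "end": …}` objects (the actual prompt is not shown in this README, so that response shape is an assumption), a defensive parser can validate the reply and enforce the 30-60 second clip bound:

```python
import json

def parse_moments(response_text, min_len=30.0, max_len=60.0):
    """Parse an LLM reply into (start, end) clip boundaries in seconds.

    Assumes the model was asked to answer with a JSON array like
    [{"start": 12.5, "end": 58.0}, ...]; malformed entries and clips
    outside the allowed length are dropped rather than raising.
    """
    text = response_text.strip()
    # Models sometimes wrap JSON in a markdown code fence; strip it.
    if text.startswith("```"):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[4:]
    try:
        items = json.loads(text)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    clips = []
    for item in items:
        try:
            start, end = float(item["start"]), float(item["end"])
        except (KeyError, TypeError, ValueError):
            continue
        if min_len <= end - start <= max_len:
            clips.append((start, end))
    return clips
```

Treating the model's output as untrusted input keeps one malformed reply from failing the whole job.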
- Runs Columbia face tracker on each clip
- Generates face tracking data
- Scores facial expressions
- Creates 1080x1920 vertical format
- Dynamically crops based on face tracking
- Applies blur background for letterboxing
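One common way to get the blurred letterbox look in a single FFmpeg pass is a filter graph that scales a blurred copy of the frame to fill the 1080x1920 canvas and overlays the sharp original, centered. A sketch that builds such a filter string (the blur strength and centered placement are illustrative; in the real pipeline the face-tracking data would drive the crop):

```python
def vertical_filter(width=1080, height=1920):
    """Build an ffmpeg filter_complex string that letterboxes a frame
    onto a blurred, canvas-filling copy of itself."""
    return (
        # Background: fill the 9:16 canvas, crop overflow, then blur.
        f"[0:v]scale={width}:{height}:force_original_aspect_ratio=increase,"
        f"crop={width}:{height},boxblur=20:5[bg];"
        # Foreground: fit inside the canvas without cropping.
        f"[0:v]scale={width}:{height}:force_original_aspect_ratio=decrease[fg];"
        # Center the sharp copy over the blurred background.
        f"[bg][fg]overlay=(W-w)/2:(H-h)/2"
    )
```

The string would be passed as `-filter_complex` on an `ffmpeg` invocation, e.g. `ffmpeg -i clip.mp4 -filter_complex "<string>" out.mp4`.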
- Creates ASS subtitle file
- Applies custom styling (Anton font)
- Syncs with word-level timestamps
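The real pipeline builds this file with pysubs2; the shape of the output can be sketched with the standard library alone. Word timestamps are grouped into short Dialogue events (the style values and group size below are illustrative, not the project's actual styling):

```python
ASS_HEADER = """[Script Info]
PlayResX: 1080
PlayResY: 1920

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, Alignment
Style: Default,Anton,96,&H00FFFFFF,2

[Events]
Format: Layer, Start, End, Style, Text
"""

def ass_time(seconds):
    """Format seconds as an ASS timestamp (H:MM:SS.cc)."""
    cs = round(seconds * 100)
    return f"{cs // 360000}:{cs // 6000 % 60:02d}:{cs // 100 % 60:02d}.{cs % 100:02d}"

def build_ass(words, group=3):
    """words: list of (text, start, end); emits one Dialogue per word group."""
    lines = [ASS_HEADER]
    for i in range(0, len(words), group):
        chunk = words[i:i + group]
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(w[0] for w in chunk)
        lines.append(f"Dialogue: 0,{ass_time(start)},{ass_time(end)},Default,{text}")
    return "\n".join(lines)
```

Writing the result to a `.ass` file and burning it in with FFmpeg's `subtitles` filter is the usual final step.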
- Combines video with subtitles
- Uploads final clip to S3
- Cleans up temporary files
- Recommended: L40S GPU (48GB VRAM)
- Minimum: A100 (40GB VRAM)
- Processing Time: ~2-5 minutes per clip
- L40S: ~$1.50/hour
- Storage: Volume costs for model caching
- Network: Data transfer costs
- Model Caching: Models are cached in Modal volumes
- Batch Processing: Process multiple clips per session
- Efficient Cleanup: Temporary files are automatically removed
1. CUDA Out of Memory:

   ```python
   # Reduce batch size in WhisperX
   result = self.whisperx_model.transcribe(audio, batch_size=8)
   ```

2. S3 Access Denied:

   ```bash
   # Check AWS credentials and bucket permissions
   aws s3 ls s3://ai-podcast-clipper-new
   ```

3. Gemini API Errors:

   ```bash
   # Verify API key and quota
   export GEMINI_API_KEY=your_key
   ```

4. FFmpeg Issues:

   ```bash
   # Check video format compatibility
   ffmpeg -i input.mp4 -f null -
   ```
Enable verbose logging:
```python
import logging

logging.basicConfig(level=logging.DEBUG)
```

```bash
# View application logs
modal logs ai-podcast-clipper

# Stream real-time logs
modal logs ai-podcast-clipper --follow
```
- Modal automatically scales based on demand
- Configure `retries` and `timeout` for reliability
- Use `scaledown_window` to optimize costs
- Increase GPU memory for larger videos
- Adjust batch sizes for transcription
- Optimize video resolution for faster processing
- Secure Secrets: Use Modal secrets for all credentials
- Authentication: Implement strong bearer tokens
- Input Validation: Validate S3 keys and file types
- Network Security: Restrict API access if needed
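For the input-validation point, a sketch of an S3-key check before any download happens (the extension whitelist and error messages are illustrative):

```python
import posixpath

ALLOWED_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}  # illustrative whitelist

def validate_s3_key(key):
    """Reject keys that escape the expected layout or carry an unexpected
    file type. Returns the normalized key or raises ValueError."""
    if not key or key.startswith("/"):
        raise ValueError("key must be a relative S3 object key")
    normalized = posixpath.normpath(key)
    if normalized.startswith(".."):
        raise ValueError("path traversal is not allowed")
    ext = posixpath.splitext(normalized)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or 'none'}")
    return normalized
```

Running this at the top of the `process_video` endpoint means a bad request fails before any GPU time is spent.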
```bash
# Required secrets in Modal
AWS_ACCESS_KEY_ID=your_aws_key
AWS_SECRET_ACCESS_KEY=your_aws_secret
AWS_DEFAULT_REGION=us-east-1
GEMINI_API_KEY=your_gemini_key
AUTH_TOKEN=your_secure_token
```
```bash
# Test locally (requires GPU)
modal run main.py::main

# Deploy to Modal
modal deploy main.py
```
- `AiPodcastClipper`: Main Modal class
- `process_video`: FastAPI endpoint
- `transcribe_video`: WhisperX integration
- `identify_moments`: Gemini AI integration
- `process_clip`: Video processing pipeline
- `create_vertical_video`: Video format conversion
- `create_subtitles_with_ffmpeg`: Subtitle generation
1. Fork the repository
2. Create a feature branch: `git checkout -b feature/new-feature`
3. Test with Modal: `modal run main.py`
4. Commit changes: `git commit -m 'Add new feature'`
5. Push to the branch: `git push origin feature/new-feature`
6. Submit a pull request
- Check Modal documentation: docs.modal.com
- AWS S3 documentation: aws.amazon.com/s3
- Google Gemini API: ai.google.dev
Built with ❤️ using Modal, Python, and cutting-edge AI technologies.
A modern web application for AI-powered podcast clipping and management. Built with Next.js 15, TypeScript, and a robust tech stack for seamless audio content processing.
- 🎤 AI-powered podcast clipping
- 👤 User authentication and authorization
- 💳 Stripe payment integration
- ☁️ AWS S3 file storage
- 📊 Dashboard with analytics
- 🔄 Background job processing with Inngest
- 📱 Responsive design with Tailwind CSS
- 🎨 Modern UI components with Radix UI
- Next.js 15 - React framework with App Router
- React 19 - UI library
- TypeScript - Type safety
- Tailwind CSS 4 - Utility-first CSS framework
- Radix UI - Unstyled, accessible UI components
- Framer Motion - Animation library
- React Hook Form - Form handling
- Zod - Schema validation
- Prisma - Database ORM
- NextAuth.js 5 - Authentication
- bcryptjs - Password hashing
- AWS S3 - File storage
- Stripe - Payment processing
- Inngest - Background job processing
- ESLint - Code linting
- Prettier - Code formatting
- TypeScript - Static type checking
Before you begin, ensure you have the following installed:
- Node.js (v18 or higher)
- npm (v10.8.2 or higher)
- Database (PostgreSQL, MySQL, or SQLite)
```bash
git clone <your-repo-url>
cd ai-podclip-web
npm install
```
Copy the example environment file and configure your variables:
```bash
cp .env.example .env
```

Update the `.env` file with your configuration:
```bash
# Database
DATABASE_URL="your-database-connection-string"

# NextAuth.js
NEXTAUTH_SECRET="your-nextauth-secret"
NEXTAUTH_URL="http://localhost:3000"

# AWS S3
AWS_ACCESS_KEY_ID="your-aws-access-key"
AWS_SECRET_ACCESS_KEY="your-aws-secret-key"
AWS_REGION="your-aws-region"
AWS_S3_BUCKET_NAME="your-s3-bucket-name"

# Stripe
STRIPE_SECRET_KEY="your-stripe-secret-key"
STRIPE_PUBLISHABLE_KEY="your-stripe-publishable-key"
STRIPE_WEBHOOK_SECRET="your-stripe-webhook-secret"

# Inngest
INNGEST_EVENT_KEY="your-inngest-event-key"
INNGEST_SIGNING_KEY="your-inngest-signing-key"
```
```bash
# Generate Prisma client
npm run postinstall

# Run database migrations
npm run db:migrate

# (Optional) Push schema changes for development
npm run db:push

# (Optional) Open Prisma Studio to view your data
npm run db:studio
```
```bash
# Make the script executable
chmod +x start-database.sh

# Run the database startup script
./start-database.sh
```
Start the development server:
```bash
npm run dev
```
The application will be available at http://localhost:3000
```txt
src/
├── actions/        # Server actions
│   ├── auth.ts
│   ├── generation.ts
│   ├── s3.ts
│   └── stripe.ts
├── app/            # Next.js App Router
│   ├── api/        # API routes
│   ├── dashboard/  # Dashboard pages
│   ├── demo/       # Demo page
│   ├── login/      # Authentication pages
│   └── signup/
├── components/     # React components
│   ├── ui/         # Reusable UI components
│   └── ...         # Feature-specific components
├── inngest/        # Background job functions
├── lib/            # Utility functions
├── schemas/        # Zod validation schemas
├── server/         # Server configuration
└── styles/         # Global styles
```
```bash
npm run dev          # Start development server with Turbo
npm run build        # Build for production
npm run start        # Start production server
npm run preview      # Build and start production server

npm run db:generate  # Generate and run migrations
npm run db:migrate   # Deploy migrations
npm run db:push      # Push schema changes
npm run db:studio    # Open Prisma Studio

npm run lint         # Run ESLint
npm run lint:fix     # Fix ESLint issues
npm run typecheck    # Run TypeScript checks
npm run check        # Run lint and typecheck
npm run format:check # Check code formatting
npm run format:write # Format code with Prettier
```
This project uses NextAuth.js v5 for authentication. To set up authentication providers:
- Configure your authentication providers in `src/server/auth/config.ts`
- Update the database schema if needed
- Run database migrations: `npm run db:migrate`
For payment processing:
- Set up your Stripe account and get API keys
- Configure webhook endpoints in your Stripe dashboard
- Update the webhook handler in `src/app/api/webhooks/stripe/route.ts`
For file uploads and storage:
- Create an S3 bucket in your AWS account
- Set up appropriate IAM permissions
- Configure CORS settings for your bucket
- Update the S3 configuration in your environment variables
This project uses Inngest for background job processing:
- Set up your Inngest account
- Configure event keys and signing keys
- Deploy your functions to handle background tasks
```bash
npm run build
npm run start
```
Ensure all production environment variables are properly configured in your deployment platform.
Run migrations in production:
```bash
npm run db:migrate
```
1. Fork the repository
2. Create a feature branch: `git checkout -b feature/your-feature`
3. Commit your changes: `git commit -m 'Add some feature'`
4. Push to the branch: `git push origin feature/your-feature`
5. Open a pull request
If you encounter any issues:
- Check the existing issues in the repository
- Run `npm run check` to verify your setup
- Ensure all environment variables are properly configured
- Check the console for any error messages
Built with ❤️ using Next.js, TypeScript, and modern web technologies.