A comprehensive system for converting dance videos into pose time series data using MediaPipe's AI pose estimation, generating vector embeddings for poses and movement segments, and enabling motion analysis in high-dimensional space.
- Pose Extraction: Extract 2D and 3D pose landmarks from dance videos using MediaPipe
- Embedding Generation: Create vector embeddings for individual poses and 5-second movement segments
- Motion Analysis: Analyze motion patterns using dimensionality reduction and clustering
- Visualization: Interactive 3D visualizations using Plotly and Rerun
- Live Prediction: Framework for predicting future movements during live tracking
- CSV Export: Export pose data with timestamps for synchronized playback
- Dance Recall System: Real-time pose matching and video recall using live camera or video input
- OSC Streaming: Real-time pose data streaming via the Open Sound Control protocol
Want to test the system quickly? Start here!
```bash
# Clone and setup
git clone git@github.com:kinetecharts/dance-embedding.git
cd dance-embedding

# Create virtual environment with Python 3.9
uv venv --python 3.9
source .venv/bin/activate

# Install dependencies
uv pip install -e .

# Create data directories
mkdir -p data/video data/poses

# Add some dance videos to data/video/
# Then extract poses from all videos
python -m pose_extraction.main --input-dir data/video

# Build the LanceDB vector database for fast pose matching
python rebuild_database.py

# Start real-time pose matching with camera
python -m recall.main --mode camera --top-n 1 --match-interval 2.0 --playback-duration 3.0

# Analyze a specific video file
python -m recall.main --mode video --input data/video/your_video.mp4 --top-n 1 --match-interval 2.0
```
- Left Window: Live camera/video feed with red pose skeleton
- Right Window: Matched reference video frame with green pose dots
- Overlay Info: Match details (video name, timestamp, similarity score)
- Controls: Press 'q' to quit, 'p' to pause, 'r' to reset
- Use `--top-n 1` for fastest matching
- Use `--match-interval 2.0` or higher for better performance
- Ensure good lighting for camera mode
- Works best with 3-10 reference videos in the database
The system consists of four main components:
- Pose Extraction (`src/pose_extraction/`): Uses MediaPipe and Rerun to extract pose landmarks from videos
- Dimension Reduction (`src/dimension_reduction/`): Creates visualizations and interactive analysis
- Embedding Generation (planned): Will create vector embeddings using Transformer or LSTM models
- Dance Recall System (`src/recall/`): Real-time pose matching and video recall with live camera support
- Python 3.9 (required; other versions are not supported due to MediaPipe and UMAP dependencies)
- uv (recommended) or pip
1. Clone the repository:

   ```bash
   git clone git@github.com:kinetecharts/dance-embedding.git
   cd dance-embedding
   ```

2. Install Python 3.9 (if not already installed):

   - On macOS: `brew install python@3.9`
   - Or use pyenv:

     ```bash
     pyenv install 3.9.18
     pyenv local 3.9.18
     ```

3. Create and activate a virtual environment with Python 3.9:

   ```bash
   uv venv --python 3.9
   source .venv/bin/activate
   ```

4. Install using uv (recommended):

   ```bash
   # Install uv if not already installed
   curl -LsSf https://astral.sh/uv/install.sh | sh

   # Install dependencies
   uv pip install -e .
   ```

   Or install using pip:

   ```bash
   pip install -e .
   ```

5. Run the installation script (optional):

   ```bash
   python install.py
   ```
The Dance Recall System enables real-time pose matching and video recall, allowing you to find similar dance movements from a database of pre-recorded videos while performing live or analyzing video files.
- Real-time Pose Matching: Match live camera poses against a database of dance movements
- Video Input Support: Analyze pre-recorded videos for pose matching
- Side-by-Side Display: View live pose and matched reference pose simultaneously
- Multiple Video Support: Match against multiple dance videos in the database
- Configurable Matching: Adjust matching frequency, top-N results, and playback duration
- Performance Metrics: Real-time FPS and match statistics
1. Ensure you have pose data ready:

   ```bash
   # Extract poses from your dance videos first
   python -m pose_extraction.main --input-dir data/video
   ```

2. Build the LanceDB database for fast matching:

   ```bash
   # Create vector database for efficient pose matching
   python rebuild_database.py
   ```

3. Run with live camera:

   ```bash
   # Start real-time pose matching with camera
   python -m recall.main --mode camera --top-n 3 --match-interval 1.0 --playback-duration 3.0
   ```

4. Run with video file:

   ```bash
   # Analyze a specific video file
   python -m recall.main --mode video --input data/video/dance_video.mp4 --top-n 3 --match-interval 2.0 --playback-duration 3.0
   ```
When running the Dance Recall System, you'll see a window titled "Dance Recall System" with:
- Left Side: Live camera feed or input video with red pose skeleton overlay
- Right Side: Matched reference video frame with green pose dots overlay
- Overlay Information: Match details including video name, timestamp, and similarity score
```
python -m recall.main [OPTIONS]

Options:
  --mode {camera,video}       Input mode: camera or video file
  --input PATH                Input video file (required for video mode)
  --top-n INTEGER             Number of top matches to consider (default: 3)
  --match-interval FLOAT      Interval between matches in seconds (default: 2.0)
  --playback-duration FLOAT   Duration to display each match (default: 3.0)
  --pose-dir PATH             Directory containing pose CSV files (default: data/poses)
  --video-dir PATH            Directory containing video files (default: data/video)
```
Live Camera Mode:

```bash
# Basic camera mode with default settings
python -m recall.main --mode camera

# Camera mode with custom settings
python -m recall.main --mode camera --top-n 5 --match-interval 0.5 --playback-duration 5.0
```

Video File Mode:

```bash
# Analyze a specific video file
python -m recall.main --mode video --input data/video/Dai2.mov

# Analyze with custom settings
python -m recall.main --mode video --input data/video/dance.mp4 --top-n 3 --match-interval 1.0
```

Custom Data Directories:

```bash
# Use custom pose and video directories
python -m recall.main --mode camera --pose-dir /path/to/poses --video-dir /path/to/videos
```
While the system is running:
- Press 'q': Quit the application
- Press 'p': Pause/resume matching
- Press 'r': Reset match display
- Press '1-9': Select top-N matches (1-9)
- Optimize for Real-time: Use `--match-interval 1.0` or higher for better performance
- Reduce Top-N: Use `--top-n 3` instead of higher values for faster matching
- Camera Quality: Ensure good lighting and a clear camera view for better pose detection
- Database Size: The system works best with 3-10 reference videos in the database
No matches found:
- Ensure pose CSV files exist in `data/poses/`
- Check that video files are in `data/video/`
- Verify pose extraction was completed successfully
Poor performance:
- Reduce the `--top-n` value
- Increase `--match-interval`
- Close other applications to free up CPU/GPU resources
Camera not working:
- Ensure camera permissions are granted
- Try a different camera if available
- Check camera is not being used by another application
The Dance Recall System uses LanceDB for efficient vector similarity search. The database stores pose embeddings for fast matching.
Initial Setup:

```bash
# Extract poses from videos first
python -m pose_extraction.main --input-dir data/video

# Build LanceDB database
python rebuild_database.py
```

Rebuilding the Database:

```bash
# Rebuild database (clears existing data)
python rebuild_database.py
```
Custom Database Path:
from recall.pose_embedding import create_pose_database
# Create database with custom path
database = create_pose_database(
pose_dir="data/poses",
video_dir="data/video",
db_path="data/custom_database.lancedb"
)
The LanceDB database contains:
- 32-dimensional pose embeddings for efficient similarity search
- Video metadata (filename, timestamp, frame number)
- Pose landmarks and confidence scores
- Indexed vectors for fast L2 and cosine similarity search
- Location: `data/pose_database.lancedb/` (excluded from git)
- Size: ~10-50MB per 1000 poses (depends on video count)
- Format: LanceDB vector database with embedded metadata
- Search Speed: ~1-5ms per query (vs 100-500ms for CSV search)
- Memory Usage: ~100-500MB for typical dance video collections
- Scalability: Supports 10,000+ poses efficiently
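For ad-hoc inspection you can also query the database directly with the `lancedb` Python package. A minimal sketch, assuming the embeddings live in a single table with `video_name` and `timestamp` columns (the actual table and column names are not documented in this README):

```python
import lancedb
import numpy as np

# Connect to the database created by rebuild_database.py
db = lancedb.connect("data/pose_database.lancedb")
print(db.table_names())           # discover the actual table name
table = db.open_table("poses")    # "poses" is an assumed name

# Search with a 32-dimensional query embedding (random placeholder here)
query = np.random.rand(32).astype(np.float32)
results = table.search(query).metric("cosine").limit(3).to_pandas()
print(results[["video_name", "timestamp"]])  # column names assumed
```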
Database not found:

```bash
# Rebuild database
python rebuild_database.py
```

Poor search performance:

```bash
# Check database stats
python -c "from recall.pose_embedding import LanceDBPoseDatabase; db = LanceDBPoseDatabase(); print(db.get_database_stats())"
```

Database corruption:

```bash
# Remove and rebuild
rm -rf data/pose_database.lancedb
python rebuild_database.py
```
Get up and running in minutes with these simple steps:
1. Create the data directory structure:

   ```bash
   mkdir -p data/video data/poses data/analysis/dimension_reduction
   ```

2. Add a dance video file:

   ```bash
   # Copy your dance video to the data/video folder
   cp /path/to/your/dance_video.mp4 data/video/
   ```

3. Extract pose data from the video:

   ```bash
   # Process all videos in data/video (default)
   python -m pose_extraction.main

   # Or specify a video
   python -m pose_extraction.main --video data/video/dance_video.mp4
   ```

   This will create a CSV file with pose landmarks in `data/poses/` and an overlay video in `data/video_with_pose/` for review.

4. Run dimension reduction and create visualizations:

   ```bash
   # Generate CSV data only (fastest)
   python -m dimension_reduction.main --video data/video/dance_video.mp4 --pose-csv data/poses/dance_video.csv

   # Or create an interactive HTML visualization
   python -m dimension_reduction.main --video data/video/dance_video.mp4 --pose-csv data/poses/dance_video.csv --save-html
   ```

   This generates CSV files in `data/dimension_reduction/` for analysis.

5. Start the web application server:

   ```bash
   cd src/viewer/webapp
   python server.py
   ```

   Open your browser to http://127.0.0.1:50680/ to view interactive visualizations with synchronized video playback.
For automatic processing of new videos as they are added:
```bash
# Start the monitor script to watch for new videos
python monitor_videos.py
```

This script will:
- Watch the `data/video/` directory for new video files
- Automatically run pose extraction when a new video is detected
- Run dimension reduction for all methods (PCA, t-SNE, UMAP) on the extracted pose data
- Process videos in the background while you continue working (see the sketch below)
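The internals of `monitor_videos.py` are not shown in this README, but a minimal watcher covering the first step could be built on the `watchdog` package, as in this sketch (file extensions and handling are illustrative):

```python
import subprocess
import time
from pathlib import Path

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class NewVideoHandler(FileSystemEventHandler):
    """Run pose extraction whenever a new video lands in data/video/."""

    def on_created(self, event):
        path = Path(str(event.src_path))
        if path.suffix.lower() in {".mp4", ".mov"}:
            subprocess.run(["python", "-m", "pose_extraction.main",
                            "--video", str(path)])

observer = Observer()
observer.schedule(NewVideoHandler(), "data/video", recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()
```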
Note: The first time you run pose extraction (either manually or via monitor), it may take several minutes as MediaPipe downloads its AI models (~100MB). Subsequent runs will be much faster.
The Dance Recall System requires:
- Pose CSV Files: Extracted pose data in the `data/poses/` directory
- Video Files: Original video files in the `data/video/` directory
- File Naming: Pose CSV files should match video file names (e.g., `Dai2.csv` for `Dai2.mov`)
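Before launching the recall system, you can sanity-check this pairing with a short illustrative snippet (not part of the project):

```python
from pathlib import Path

pose_dir = Path("data/poses")
video_dir = Path("data/video")

# Pair each pose CSV with the video that shares its stem (e.g. Dai2.csv <-> Dai2.mov)
for csv_path in sorted(pose_dir.glob("*.csv")):
    videos = [v for v in video_dir.iterdir() if v.stem == csv_path.stem]
    if not videos:
        print(f"Missing video for {csv_path.name}")
```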
Custom Pose Matching:

```python
from recall.pose_matcher import PoseMatcher
from recall.config import RecallConfig

# Initialize matcher
config = RecallConfig()
matcher = PoseMatcher(config)

# Find matches for a pose
matches = matcher.find_matches(pose_data, top_n=3)
```

Video Player Integration:

```python
from recall.video_player import VideoPlayer
from recall.config import RecallConfig

# Initialize video player
config = RecallConfig()
player = VideoPlayer(config)

# Display matched pose
player.display_live_frame(frame, pose_data, match_info)
```
For development, install with additional dependencies:
```bash
uv pip install -e ".[dev]"
```
Extract poses from a single video:

```bash
python -m pose_extraction.main --video data/video/dance.mp4
```

Extract poses from all videos in a directory:

```bash
python -m pose_extraction.main --input-dir data/video
```

Use Rerun visualization:

```bash
python -m pose_extraction.main --video data/video/dance.mp4 --use-rerun
```
```python
from pose_extraction import PoseExtractionPipeline

# Initialize pipeline
pipeline = PoseExtractionPipeline(use_rerun=False)  # Set to True for visualization

# Run pose extraction pipeline
results = pipeline.run_full_pipeline("data/video/dance.mp4")
print(f"Pose data: {results['pose_csv_path']}")
```

```python
from pose_extraction import PoseExtractor

# Extract poses
extractor = PoseExtractor(use_rerun=True)
pose_data = extractor.extract_pose_from_video("data/video/dance.mp4")
```
```
motion_embedding/
├── src/pose_extraction/
│   ├── __init__.py              # Main package
│   ├── pose_extraction.py       # Pose extraction using MediaPipe
│   └── main.py                  # Pose extraction pipeline
├── src/dimension_reduction/
│   ├── main.py                  # Dimension reduction and visualization
│   ├── visualizer.py            # Visualization tools
│   └── reduction_methods.py     # Dimension reduction algorithms
├── src/recall/
│   ├── __init__.py              # Dance recall system package
│   ├── main.py                  # Main recall system entry point
│   ├── recall_system.py         # Core recall system logic
│   ├── pose_matcher.py          # Pose matching algorithms
│   ├── pose_normalizer.py       # Pose normalization utilities
│   ├── video_player.py          # Video playback and display
│   ├── data_structures.py       # Data classes and structures
│   └── config.py                # Configuration management
├── src/viewer/
│   └── webapp/                  # Web application for viewing results
├── data/
│   ├── video/                   # Input video files
│   ├── poses/                   # Extracted pose CSV files
│   ├── video_with_pose/         # Videos with pose overlays for review
│   └── dimension_reduction/     # Dimension reduction results
├── examples/
│   └── basic_usage.py           # Usage examples
├── tests/
│   └── test_imports.py          # Basic tests
├── documents/                   # Documentation
├── pyproject.toml               # Project configuration
├── install.py                   # Installation script
└── README.md                    # This file
```
The system exports pose data in CSV format with the following columns:
- `timestamp`: Frame timestamp in seconds
- `frame_number`: Frame index
- `{keypoint}_x`, `{keypoint}_y`: 2D coordinates for each keypoint
- `{keypoint}_z`: 3D coordinates (if available)
- `{keypoint}_confidence`: Confidence scores

Example:

```
timestamp,frame_number,nose_x,nose_y,nose_z,nose_confidence,...
0.0,0,320.5,240.2,0.1,0.95,...
0.033,1,321.1,239.8,0.12,0.94,...
```
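Because the schema is flat, the CSV loads directly into pandas for ad-hoc analysis. A small example using the column names documented above:

```python
import pandas as pd

df = pd.read_csv("data/poses/dance_video.csv")

# Reconstruct the nose trajectory over time
nose = df[["timestamp", "nose_x", "nose_y", "nose_z"]]
print(nose.head())

# Drop low-confidence detections (0.5 is an arbitrary threshold)
reliable = df[df["nose_confidence"] > 0.5]
print(f"{len(reliable)}/{len(df)} frames with confident nose detection")
```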
The system provides several visualization options:
- Rerun Visualization: Real-time 3D pose tracking during extraction
- Plotly Interactive: 3D embeddings, similarity matrices, and motion timelines
- Clustering Analysis: Color-coded clusters in embedding space
```bash
python -m pose_extraction.main --video data/video/dance.mp4 --use-rerun
```
- Rerun Visualization: Real-time 3D pose tracking during extraction
- Output Format: CSV with timestamps and confidence scores
- Keypoints: 33 MediaPipe pose landmarks
- UMAP: Uniform Manifold Approximation and Projection (default)
- t-SNE: t-Distributed Stochastic Neighbor Embedding
- PCA: Principal Component Analysis
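How the project's `dimension_reduction` module configures these methods is not shown here, but the underlying library calls on a pose CSV look roughly like this (a sketch using `scikit-learn` and `umap-learn`):

```python
import pandas as pd
import umap
from sklearn.decomposition import PCA

df = pd.read_csv("data/poses/dance_video.csv")

# Use only the coordinate columns as features (skip timestamps and confidences)
coord_cols = [c for c in df.columns if c.endswith(("_x", "_y", "_z"))]
X = df[coord_cols].to_numpy()

# Each frame becomes one point in 2D embedding space
pca_2d = PCA(n_components=2).fit_transform(X)
umap_2d = umap.UMAP(n_components=2).fit_transform(X)
print(pca_2d.shape, umap_2d.shape)
```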
The system implements a single-stream OSC system with body-relative coordinates for consistent scale and Z-filters for movement analysis:
- Body-Relative Scale: Use torso length as stable reference for consistent measurements
- Chest-Center Origin: All hand positions relative to chest center point
- Distance Independent: Same gesture produces same values at different distances from camera
- Person Independent: Works with different body sizes
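The exact reference landmarks are not spelled out in this README, but the transform as described (chest-center origin, torso-length scale) can be sketched as follows; treat the landmark choices as assumptions:

```python
import numpy as np

def body_relative(landmarks: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    """Illustrative body-relative transform over named xyz landmarks."""
    shoulder_mid = (landmarks["left_shoulder"] + landmarks["right_shoulder"]) / 2
    hip_mid = (landmarks["left_hip"] + landmarks["right_hip"]) / 2

    chest_center = (shoulder_mid + hip_mid) / 2             # origin
    torso_length = np.linalg.norm(shoulder_mid - hip_mid)   # scale reference

    # 1.0 now means "one torso length from the chest center"
    return {name: (pos - chest_center) / torso_length
            for name, pos in landmarks.items()}
```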
```json
{
  "osc_streaming": {
    "enabled": true,
    "stream_rate": 30.0,
    "streams": {
      "pose_data": {
        "enabled": true,
        "host": "127.0.0.1",
        "port": 6448,
        "address": "/pose/data",
        "z_filter": {
          "velocity_fast_rise": 0.8,
          "velocity_slow_decay": 0.95,
          "acceleration_fast_rise": 0.9,
          "acceleration_slow_decay": 0.98
        }
      }
    }
  }
}
```
Address: `/pose/data`

Data Array (21 values):

```
/pose/data [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
```
Value Breakdown:
- Values 1-3: Left hand X, Y, Z (body-relative, normalized by torso length)
- Values 4-6: Right hand X, Y, Z (body-relative, normalized by torso length)
- Values 7-9: Left foot X, Y, Z (body-relative, normalized by torso length)
- Values 10-12: Right foot X, Y, Z (body-relative, normalized by torso length)
- Values 13-14: Torso rotation Yaw, Pitch (degrees)
- Values 15-16: Head rotation Yaw, Pitch (relative to torso, degrees)
- Values 17-19: Torso position X, Y, Z (frame coordinates, 0.0-1.0)
- Value 20: Velocity magnitude (Z-filtered, fast rise, slow decay)
- Value 21: Acceleration magnitude (Z-filtered, fast rise, slow decay)
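The project's own streaming code is not reproduced in this README, but an equivalent sender takes a few lines with the `python-osc` package (an assumed dependency), using the host, port, and address from the configuration above:

```python
from pythonosc.udp_client import SimpleUDPClient

# Host and port from the osc_streaming configuration
client = SimpleUDPClient("127.0.0.1", 6448)

# 21 placeholder values in the documented order:
# hands (6), feet (6), torso/head rotation (4), torso position (3), velocity, acceleration
values = [0.0] * 21
client.send_message("/pose/data", values)
```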
Hand and Foot Positions (Values 1-12)
- Scale: Normalized by torso length (1.0 = one torso length)
- Origin: Chest center point
- Content: Hand and foot center positions only (no finger/toe details)
- Units: Body-relative coordinates
- Values 1-3: Left hand X, Y, Z
- Values 4-6: Right hand X, Y, Z
- Values 7-9: Left foot X, Y, Z
- Values 10-12: Right foot X, Y, Z
Rotation Data (Values 13-16)
- Torso Rotation:
  - Yaw: 0° when facing the camera, positive when turning right, negative when turning left
  - Pitch: 0° when level, positive when leaning forward, negative when leaning back
- Head Rotation: Relative to torso orientation
  - Yaw: 0° when aligned with the body, positive when turning right relative to the body, negative when turning left relative to the body
  - Pitch: 0° when level with the body, positive when nodding up, negative when nodding down
- Units: Degrees (-180° to +180°)
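One plausible way to derive these angles from 3D landmarks is sketched below; the reference vectors and sign conventions are assumptions, not the project's confirmed math:

```python
import numpy as np

def torso_yaw_pitch(l_sh, r_sh, l_hip, r_hip):
    """Estimate torso yaw/pitch in degrees from four xyz landmarks."""
    # Yaw: rotation of the shoulder line around the vertical axis.
    # Both shoulders at equal depth (facing the camera) gives 0 degrees.
    shoulder_vec = r_sh - l_sh
    yaw = np.degrees(np.arctan2(shoulder_vec[2], shoulder_vec[0]))

    # Pitch: lean of the spine (hip midpoint -> shoulder midpoint) out of
    # the vertical; MediaPipe's y axis grows downward, hence the sign flip.
    spine = (l_sh + r_sh) / 2 - (l_hip + r_hip) / 2
    pitch = np.degrees(np.arctan2(spine[2], -spine[1]))
    return yaw, pitch
```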
Torso Position (Values 17-19)
- Frame Coordinates: 0.0 to 1.0 relative to camera frame
- Purpose: Absolute positioning in the scene
Movement Analysis (Values 20-21)
- Velocity: Overall movement magnitude with Z-filter (hands + feet)
- Acceleration: Movement change rate with Z-filter (hands + feet)
- Z-Filter: Fast rise (0.8-0.9), slow decay (0.95-0.98)
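The README does not give the filter's exact formula, but "fast rise, slow decay" with the configured coefficients suggests an asymmetric smoother along these lines (an assumption, not the confirmed implementation):

```python
def z_filter(prev: float, new: float,
             fast_rise: float = 0.8, slow_decay: float = 0.95) -> float:
    """Track rising inputs quickly; let falling inputs bleed off slowly."""
    if new > prev:
        return prev + fast_rise * (new - prev)  # jump most of the way up
    return prev * slow_decay                    # decay gradually toward zero

# Per-frame usage: velocity = z_filter(velocity, raw_velocity_magnitude)
```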
Primary System: Body-Relative Coordinates
- Origin: Chest center (midpoint between shoulders and hips)
- Scale: Normalized by torso length for consistent measurements
- Units: Relative to body size (1.0 = one torso length)
- Benefits: Same gesture produces same values regardless of distance from camera
Legacy Support: Frame-Relative Coordinates
- X-axis (horizontal): 0.0 to 1.0 (left to right)
- Y-axis (vertical): 0.0 to 1.0 (top to bottom)
- Z-axis (depth): 0.0 to 1.0 (closer to camera = smaller values)
Example OSC Message:
Single Stream Format:
```
/pose/data [0.5, -0.3, 0.2, 0.8, -0.1, 0.4, -0.2, 0.6, 0.1, -0.1, 0.7, 0.0, 15.2, -5.8, -10.5, 8.2, 0.5, 0.4, 0.6, 0.15, 0.25]
```
Value Breakdown:
- Values 1-3: Left hand [0.5, -0.3, 0.2] = right, down, forward from chest
- Values 4-6: Right hand [0.8, -0.1, 0.4] = right, down, forward from chest
- Values 7-9: Left foot [-0.2, 0.6, 0.1] = left, up, forward from chest
- Values 10-12: Right foot [-0.1, 0.7, 0.0] = left, up, at chest level
- Values 13-14: Torso rotation [15.2, -5.8] = turning right, leaning forward
- Values 15-16: Head rotation [-10.5, 8.2] = turning left, nodding up (relative to torso)
- Values 17-19: Torso position [0.5, 0.4, 0.6] = center, upper, forward in frame
- Value 20: Velocity magnitude 0.15 (Z-filtered movement)
- Value 21: Acceleration magnitude 0.25 (Z-filtered acceleration)
Coordinate System:
- Hands and Feet (1-12): Body-relative, normalized by torso length
- Rotations (13-16): Degrees (-180° to +180°)
- Torso Position (17-19): Frame coordinates (0.0-1.0)
- Movement (20-21): Z-filtered magnitude values
Why Body-Relative Coordinates?
- Distance Independent: Same gesture gives same values at any distance
- Person Independent: Works with different body sizes
- Gesture Recognition: Consistent values for machine learning applications
- Performance Tracking: Stable measurements for movement analysis
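On the receiving side, a minimal `python-osc` listener can unpack the 21-value array (a sketch; host and port taken from the configuration above):

```python
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

def on_pose(address, *values):
    # Slice the 21-value array into the documented groups
    left_hand, right_hand = values[0:3], values[3:6]
    left_foot, right_foot = values[6:9], values[9:12]
    torso_yaw, torso_pitch = values[12:14]
    head_yaw, head_pitch = values[14:16]
    torso_pos = values[16:19]
    velocity, acceleration = values[19], values[20]
    print(f"velocity={velocity:.3f} acceleration={acceleration:.3f}")

dispatcher = Dispatcher()
dispatcher.map("/pose/data", on_pose)
BlockingOSCUDPServer(("127.0.0.1", 6448), dispatcher).serve_forever()
```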
- Python 3.9 (required)
- CPU: Intel i5 or equivalent (minimum)
- RAM: 8GB (minimum), 16GB (recommended)
- Storage: 1GB per minute of video (approximate)
- Rerun: Disable Rerun visualization for faster processing
- Batch Processing: Process multiple videos in parallel
- Memory Management: Use smaller video files for large datasets
Run the test suite:
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=pose_extraction

# Run specific test
python tests/test_imports.py
```
- Requirements: System requirements and goals
- Architecture: System design and components
- Implementation Plan: Development roadmap
- Technical Considerations: Technical details
- Pose Extraction: Pose extraction specifications
- Fork the repository
- Create a feature branch: `git checkout -b feature/new-feature`
- Make your changes
- Run tests: `pytest`
- Commit your changes: `git commit -am 'Add new feature'`
- Push to the branch: `git push origin feature/new-feature`
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- MediaPipe for pose estimation
- Rerun for visualization
- PyTorch for deep learning
- Plotly for interactive visualizations
For questions and support:
- Create an issue on GitHub
- Check the documentation in `documents/`
- Review the examples in `examples/`
Note: This is an alpha version. The API may change in future releases.