Control your computer using nothing but your hands.
This Python-powered project uses your webcam to detect hand landmarks in real time and map specific gestures to system actions like:
- Navigating presentations
- Moving a virtual pointer
- Adjusting system volume
- Taking instant screenshots
…and more (extensible)!
## Table of Contents

- Features
- System Architecture
- Gestures & Mappings
- Prerequisites
- Installation
- Running the App
- Usage Guide
- Configuration & Customization
- Performance Tips
- Testing
- Platform Support
- Troubleshooting
- Roadmap
- Contributing
- License
- Acknowledgements
## Features

- 🔍 **Real-Time Hand Landmark Tracking**: tracks all 21 MediaPipe hand keypoints with low per-frame latency.
- 🖥 **Presentation Navigation**: open palm + horizontal swipe moves to the next or previous slide.
- 🖱 **Pointer / Cursor Mode**: an isolated index finger controls a virtual cursor mapped to screen coordinates.
- 🔊 **Volume Control (Fist Depth)**: the distance of a closed fist from the camera scales the volume level.
- 📸 **Instant Screenshot**: a peace ✌️ gesture triggers a saved screenshot.
- 🧠 **Gesture State Overlay**: an on-screen HUD shows the active mode, frame rate, and detected gesture.
- 🧩 **Easily Extensible**: add custom gestures and map them to keystrokes or system actions.
- 🛡 **Non-Intrusive**: requires no special hardware, just a standard webcam.
## System Architecture

```text
Webcam Feed
     │
     ▼
OpenCV Frame Capture
     │
     ▼
MediaPipe Hand Detector
     │ (21 landmarks)
     ▼
Gesture Analyzer (logic & heuristics)
     │
     ├─► Action Mapper (PyAutoGUI / OS APIs)
     │
     └─► Visual Feedback (HUD overlay)
     │
     ▼
Display Window (real-time output)
```
Core pipeline phases:
- Frame acquisition
- Landmark detection
- Gesture classification (finger states, distances, angles)
- Action execution (keyboard/mouse/system)
- Feedback rendering
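For orientation, here is a minimal sketch of that pipeline using the classic MediaPipe Hands solution API; the window title and Esc-to-exit choice are illustrative, and the classification/action steps are left as placeholders:

```python
# Minimal capture → detect → render loop (sketch, not the project's code).
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)  # frame acquisition
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.7) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.flip(frame, 1)  # mirror the feed for natural interaction
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB
        results = hands.process(rgb)  # landmark detection
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                mp_draw.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
                # gesture classification + action execution would go here
        cv2.imshow("Gesture Control", frame)  # feedback rendering
        if cv2.waitKey(1) & 0xFF == 27:  # Esc exits
            break
cap.release()
cv2.destroyAllWindows()
```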
## Gestures & Mappings

| Gesture | Description | Default Action |
|---|---|---|
| Open Palm (5 fingers) + Horizontal Swipe | Directional motion past a threshold | Slide change (← / →) |
| Single Index Finger | Fingertip position tracked | Cursor / laser pointer |
| Closed Fist | Z-depth change toward/away from the camera | Volume down / volume up |
| Peace ✌️ (Index + Middle) | Two spread fingers | Capture screenshot |
| Thumb Up (placeholder) | Optional | Could map to Play/Pause |
| OK Sign (placeholder) | Optional | Could confirm dialogs |
You can expand gestures by adding rules for finger combinations, angles between landmarks, or temporal sequences.
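As a concrete starting point, a finger-combination rule can compare fingertip and PIP-joint coordinates. The helper names below are hypothetical; the landmark indices follow the MediaPipe Hands numbering (4 = thumb tip, 8 = index tip, 12 = middle, 16 = ring, 20 = pinky):

```python
# Hypothetical finger-state helper; tip - 2 is the finger's PIP joint.
TIP_IDS = [4, 8, 12, 16, 20]

def fingers_up(hand_landmarks):
    """Return [thumb, index, middle, ring, pinky] extension booleans."""
    lm = hand_landmarks.landmark
    states = []
    # Thumb: compare x, since it extends sideways (right hand, mirrored feed).
    states.append(lm[4].x < lm[3].x)
    # Other fingers: a tip above its PIP joint (smaller y) means extended.
    for tip in TIP_IDS[1:]:
        states.append(lm[tip].y < lm[tip - 2].y)
    return states

def is_peace(hand_landmarks):
    """Example rule: index + middle up, ring + pinky down."""
    _, index, middle, ring, pinky = fingers_up(hand_landmarks)
    return index and middle and not ring and not pinky
```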
## Prerequisites

- Python 3.9+ (3.10+ recommended)
- Webcam
- A desktop environment (not headless)
- Permissions to control system audio (for volume features)
## Installation

Clone the repository:

```bash
git clone https://github.com/Amanbundela75/Gesture-Control-System.git
cd Gesture-Control-System
```

(Adjust the URL if your repository name differs.)

Create and activate a virtual environment:

```bash
# Windows
python -m venv venv
venv\Scripts\activate

# macOS / Linux
python3 -m venv venv
source venv/bin/activate
```

Install dependencies:

```bash
pip install -r requirements.txt
```

If you don't have a requirements file yet, generate one after installing the packages manually:

```bash
pip install opencv-python mediapipe pyautogui numpy
pip freeze > requirements.txt
```
## Running the App

```bash
python gesture_control.py
```

Optional arguments (if implemented):

```text
--camera 0      # Select a different camera index
--no-volume     # Disable volume control logic
--debug         # Show raw landmark coordinates
--smoothing 4   # Adjust the cursor smoothing factor
```
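If you wire these flags up yourself, a sketch with Python's argparse could look like this; the names mirror the options above, but nothing here is guaranteed to exist in gesture_control.py:

```python
# Hypothetical CLI parsing; flag names mirror the optional arguments above.
import argparse

parser = argparse.ArgumentParser(description="Gesture Control System")
parser.add_argument("--camera", type=int, default=0,
                    help="camera index to open")
parser.add_argument("--no-volume", action="store_true",
                    help="disable volume control logic")
parser.add_argument("--debug", action="store_true",
                    help="show raw landmark coordinates")
parser.add_argument("--smoothing", type=int, default=4,
                    help="cursor smoothing factor")
args = parser.parse_args()  # e.g. python gesture_control.py --camera 1
```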
## Usage Guide

- Launch the script; a window displaying the camera feed will open.
- Keep your hand in a well-lit area (avoid overexposure).
- Perform gestures:
  - Slide navigation: open palm + swipe left/right
  - Cursor mode: show only the index finger
  - Volume control: make a fist and move it closer to/farther from the camera
  - Screenshot: show the peace sign ✌️ (saved as `screenshot_YYYYMMDD_HHMMSS.png`)
- Press Esc (or Ctrl + C in the terminal) to exit.
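For reference, here is a minimal sketch of how the screenshot action could be implemented with PyAutoGUI; the helper name and directory are illustrative, while the filename pattern matches the one above:

```python
# Hypothetical screenshot helper; directory and function name are not
# part of the project's actual API.
import os
from datetime import datetime

import pyautogui

def take_screenshot(directory="screenshots"):
    os.makedirs(directory, exist_ok=True)
    name = datetime.now().strftime("screenshot_%Y%m%d_%H%M%S.png")
    path = os.path.join(directory, name)
    pyautogui.screenshot(path)  # captures the full screen to that file
    return path
```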
## Configuration & Customization

You can customize behavior by editing, for example:

```python
# Example gesture thresholds (hypothetical snippet)
SWIPE_MIN_DISTANCE = 120   # Pixels across the frame
CURSOR_SMOOTHING = 0.25    # Lower = more precise, higher = smoother
FIST_DEPTH_MIN = 0.05      # Normalized z-range
FIST_DEPTH_MAX = 0.25
SCREENSHOT_DIR = "screenshots"
```
Add a new gesture mapping (conceptual outline):

```python
if is_thumb_up(landmarks):
    perform_media_play_pause()
```
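One possible (hypothetical) implementation of those two helpers, assuming MediaPipe landmark indexing and PyAutoGUI's media-key support:

```python
# Sketch only: thumb raised, all other fingers curled. Media-key support
# in PyAutoGUI varies by OS.
import pyautogui

def is_thumb_up(landmarks):
    lm = landmarks.landmark
    thumb_raised = lm[4].y < lm[2].y  # thumb tip above the thumb MCP joint
    others_curled = all(lm[tip].y > lm[tip - 2].y for tip in (8, 12, 16, 20))
    return thumb_raised and others_curled

def perform_media_play_pause():
    pyautogui.press("playpause")  # 'playpause' is a valid PyAutoGUI key name
```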
Ideas for expansion:
- Multi-hand coordination
- Gesture sequences (e.g., fist → open → index = mode toggle)
- UI overlay toggles
- Config file (JSON / YAML) for dynamic gesture/action mapping
## Performance Tips

- Reduce the frame size (e.g., 640x480) for smoother FPS.
- Use a dedicated light source to minimize detection jitter.
- Apply exponential smoothing to cursor coordinates, `smoothed = alpha * new + (1 - alpha) * previous` (see the sketch after this list).
- Avoid running in Remote Desktop sessions; some OS APIs block input control there.
- On laptops, disable auto-exposure if flickering occurs (via OpenCV camera properties).
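A minimal sketch of that smoothing step; the class name is illustrative. An alpha close to 0 favors the previous position (smoother), close to 1 the new one (more responsive):

```python
# Exponential smoothing for cursor coordinates (sketch).
class CursorSmoother:
    def __init__(self, alpha=0.25):
        self.alpha = alpha
        self.prev = None

    def update(self, x, y):
        if self.prev is None:
            self.prev = (x, y)  # initialize on the first sample
        px, py = self.prev
        self.prev = (self.alpha * x + (1 - self.alpha) * px,
                     self.alpha * y + (1 - self.alpha) * py)
        return self.prev
```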
## Testing

| Test Case | Expected Result |
|---|---|
| Hand partially out of frame | No false gesture triggers |
| Two gestures in quick succession | Only one action executed |
| Low-light conditions | Landmark quality degrades gracefully |
| Rapid swipes | Slide changes at most once per swipe |
| Fist very close to the camera | Volume clamped at the maximum threshold |
You can automate parts of the gesture logic in unit tests by feeding in synthetic landmark arrays, as sketched below.
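For example, a test could build fake landmark objects with SimpleNamespace and feed them to a finger-state helper such as the `fingers_up()` sketched earlier (both the helper and the import path are assumptions, not the project's actual API):

```python
# Unit-test sketch with synthetic landmarks; SimpleNamespace stands in
# for MediaPipe's landmark objects.
from types import SimpleNamespace

from gesture_control import fingers_up  # hypothetical import path

def make_hand(points):
    """Build a fake hand from 21 (x, y, z) tuples."""
    return SimpleNamespace(
        landmark=[SimpleNamespace(x=x, y=y, z=z) for x, y, z in points])

def test_fist_has_no_fingers_up():
    points = [(0.5, 0.5, 0.0)] * 21
    for tip in (8, 12, 16, 20):
        points[tip] = (0.5, 0.8, 0.0)      # fingertip low in the frame
        points[tip - 2] = (0.5, 0.6, 0.0)  # PIP joint above it
    hand = make_hand(points)
    assert fingers_up(hand)[1:] == [False, False, False, False]
```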
## Platform Support

| Feature | Windows | macOS | Linux |
|---|---|---|---|
| Cursor Control | ✅ | ✅ | ✅ |
| Volume Control | ✅ (PyAutoGUI / ctypes) | ⚠ (may require AppleScript) | ⚠ (depends on pactl / amixer) |
| Screenshots | ✅ | ✅ | ✅ |
You may implement OS-specific volume controllers:

- macOS:

  ```bash
  osascript -e "set volume output volume X"
  ```

- Linux:

  ```bash
  pactl set-sink-volume @DEFAULT_SINK@ 50%
  ```
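A sketch of how those commands could be wrapped from Python via subprocess; the function name and the 0-100 clamping are illustrative, not part of the project:

```python
# Hypothetical OS-specific volume setter wrapping the shell commands above.
import platform
import subprocess

def set_volume(percent):
    percent = max(0, min(100, int(percent)))  # clamp to a sane range
    system = platform.system()
    if system == "Darwin":  # macOS
        subprocess.run(
            ["osascript", "-e", f"set volume output volume {percent}"],
            check=True)
    elif system == "Linux":  # PulseAudio
        subprocess.run(
            ["pactl", "set-sink-volume", "@DEFAULT_SINK@", f"{percent}%"],
            check=True)
    else:
        raise NotImplementedError(f"No volume hook for {system}")
```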
## Troubleshooting

| Issue | Cause | Fix |
|---|---|---|
| Window opens but no gestures detected | Wrong camera index | Try `--camera 1` |
| High latency | Large frame size | Downscale frames before processing |
| Cursor jitter | Minor landmark noise | Increase the smoothing factor |
| Volume not changing | OS not supported | Disable the volume feature or add an OS hook |
| Screenshot not saved | No write permission | Run from a writable directory |
Enable debug overlays or log prints for diagnosing coordinate drift.
## Roadmap

- Configurable gesture-to-action JSON
- Multi-hand simultaneous actions
- GUI settings panel
- Temporal gesture recognition (LSTM / transformer)
- Dark mode optimized segmentation
- Plugin architecture for add-on actions
- In-app calibration wizard
- Export analytics (gesture usage frequency)
Feel free to propose additions via Issues / Discussions.
## Contributing

Contributions are welcome!
- Fork the repository
- Create a feature branch: `git checkout -b feat/your-feature`
- Commit your changes: `git commit -m "Add feature: description"`
- Push the branch: `git push origin feat/your-feature`
- Open a Pull Request (clearly describe the motivation & approach)
Suggested quality checks:

- Run `flake8` or `ruff` (if configured)
- Test under at least one alternate lighting condition
- Document new gestures in this README
## License

This project is licensed under the MIT License.
See the LICENSE file for details.
## Acknowledgements

- MediaPipe Hands for robust hand landmark detection.
- OpenCV community for foundational computer vision tooling.
- Inspiration from various HCI (Human-Computer Interaction) prototypes.
Have an idea or found a bug? Open an Issue or start a Discussion.
If this project helps you, consider starring ⭐ the repository!
Happy gesturing! ✨