A computer vision project that uses YOLO (You Only Look Once) models to detect and recognize mahjong tiles in real-world photographs. The project covers five model sizes (nano, small, medium, large, extra-large), each suited to a different use case; the extra-large variant has not been trained yet.
Dataset: https://www.kaggle.com/datasets/shinz114514/mahjong-hand-photos-taken-with-mobile-camera/data
This project implements mahjong tile recognition using YOLOv11, capable of:
- Detecting mahjong tiles in real-world photographs
- Recognizing different tile types and suits
- Processing images with various lighting conditions and backgrounds
- Providing both PyTorch (.pt) and ONNX model formats for deployment
├── models/                         # Trained models organized by size
│   ├── nano/                       # YOLOv11n models (fastest, lowest accuracy)
│   ├── small/                      # YOLOv11s models (balanced speed/accuracy)
│   ├── medium/                     # YOLOv11m models (good accuracy)
│   ├── large/                      # YOLOv11l models (high accuracy)
│   ├── extra_large/                # YOLOv11x models (highest accuracy)
│   └── *.onnx                      # ONNX format models for deployment
├── scripts/                        # Utility scripts
│   └── convert_yolo_to_onnx.py     # Convert PyTorch models to ONNX
├── notebooks/                      # Jupyter notebooks for training and analysis
│   ├── data_labeling/              # Data annotation and labeling notebooks
│   ├── data_processing/            # Data preprocessing notebooks
│   ├── yolo.ipynb                  # YOLO training notebook
│   └── yolo_predict.ipynb          # Prediction and evaluation notebook
├── results/                        # Training and evaluation results
│   ├── training/                   # Training logs, metrics, and model checkpoints
│   ├── validation/                 # Validation results
│   └── predictions/                # Prediction outputs and visualizations
├── data/                           # Dataset organization
│   ├── raw/                        # Original images
│   ├── processed/                  # Preprocessed images
│   └── annotations/                # Label files
├── docs/                           # Documentation
├── examples/                       # Usage examples
└── README.md                       # This file
Model Size | Base Model | Trained Model | ONNX Model | Training Details | Speed | Accuracy | Use Case |
---|---|---|---|---|---|---|---|
Nano | yolo11n.pt | mahjong-yolon-best.pt | mahjong-yolon-best.onnx | yolon6 variant | ⚡⚡⚡⚡⚡ | ⭐⭐⭐ | Mobile/Edge devices |
Small | yolo11s.pt | mahjong-yolos-best.pt | mahjong-yolos-best.onnx | yolos2 variant | ⚡⚡⚡⚡ | ⭐⭐⭐⭐ | Real-time applications |
Medium | yolo11m.pt | mahjong-yolom-best.pt | mahjong-yolom-best.onnx | 94 epochs | ⚡⚡⚡ | ⭐⭐⭐⭐⭐ | Balanced performance |
Large | yolo11l.pt | mahjong-yolol-best.pt | - | 51 epochs | ⚡⚡ | ⭐⭐⭐⭐⭐⭐ | High accuracy needs |
Extra Large | yolo11x.pt | - | - | Not trained yet | ⚡ | ⭐⭐⭐⭐⭐⭐⭐ | Maximum accuracy |
- Nano (YOLOv11n): Fastest inference, optimized for mobile deployment
- Small (YOLOv11s): Good balance of speed and accuracy for real-time applications
- Medium (YOLOv11m): Recommended for most use cases, best accuracy/speed trade-off
- Large (YOLOv11l): High accuracy for production applications
- Extra Large (YOLOv11x): Maximum accuracy when speed is not critical
pip install ultralytics opencv-python matplotlib torch torchvision onnxruntime
pip install jupyter notebook albumentations numpy
from ultralytics import YOLO
# Load a trained model
model = YOLO('models/medium/mahjong-yolom-best.pt')
# Run inference on an image
results = model.predict('path/to/mahjong/image.jpg')
# Display results
results[0].show()
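Beyond displaying the image, the detections can be read programmatically. The sketch below is an illustration, not part of the project: `detections_from_result` and `tiles_left_to_right` are hypothetical helper names, and the code assumes the Ultralytics `Results` object exposes `boxes.xyxy`, `boxes.cls`, `boxes.conf`, and `names`. Ordering by horizontal box center also assumes the hand was photographed as a single row of tiles.

```python
def detections_from_result(result):
    """Turn one Ultralytics Results object into a list of plain dicts."""
    boxes = result.boxes
    detections = []
    for xyxy, cls_id, conf in zip(
        boxes.xyxy.tolist(), boxes.cls.tolist(), boxes.conf.tolist()
    ):
        detections.append({
            "label": result.names[int(cls_id)],  # class id -> tile name
            "confidence": float(conf),
            "box": tuple(xyxy),                  # (x1, y1, x2, y2) in pixels
        })
    return detections


def tiles_left_to_right(detections, min_confidence=0.5):
    """Return tile labels ordered by the horizontal center of each box."""
    kept = [d for d in detections if d["confidence"] >= min_confidence]
    kept.sort(key=lambda d: (d["box"][0] + d["box"][2]) / 2)
    return [d["label"] for d in kept]
```

With the `results` list from the snippet above, `tiles_left_to_right(detections_from_result(results[0]))` would give the hand as a list of labels. The `min_confidence` default of 0.5 is an arbitrary illustrative value.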
import onnxruntime as ort
import cv2
import numpy as np
# Load ONNX model
session = ort.InferenceSession('models/mahjong-yolom-best.onnx')
# Preprocess image: OpenCV loads BGR, while the model expects RGB in NCHW layout
img = cv2.imread('path/to/image.jpg')
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img_resized = cv2.resize(img_rgb, (640, 640))
img_normalized = img_resized.astype(np.float32) / 255.0
img_transposed = np.transpose(img_normalized, (2, 0, 1))
img_batch = np.expand_dims(img_transposed, axis=0)
# Run inference
outputs = session.run(None, {'images': img_batch})
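The raw ONNX output still has to be decoded into boxes. The following is a minimal sketch under stated assumptions, not the project's own post-processing: it assumes `outputs[0]` has the layout `(1, 4 + nc, N)` with box columns `(cx, cy, w, h)` and no separate objectness score, which is typical of Ultralytics detection exports but should be verified with `session.get_outputs()[0].shape`. The NMS here is class-agnostic for brevity, and the function names and threshold defaults are illustrative.

```python
import numpy as np


def nms(boxes, scores, iou_threshold):
    """Greedy non-maximum suppression; returns indices of boxes to keep."""
    order = scores.argsort()[::-1]  # highest score first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    kept = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        kept.append(int(best))
        # Intersection of the best box with every remaining box.
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter + 1e-9)
        order = rest[iou <= iou_threshold]  # drop heavy overlaps
    return np.array(kept, dtype=int)


def decode_yolo_output(output, conf_threshold=0.25, iou_threshold=0.45):
    """Decode a (1, 4 + nc, N) tensor into (boxes_xyxy, confidences, class_ids)."""
    preds = np.squeeze(output, axis=0).T  # (N, 4 + nc)
    scores = preds[:, 4:]
    class_ids = scores.argmax(axis=1)
    confidences = scores.max(axis=1)

    keep = confidences >= conf_threshold
    preds, class_ids, confidences = preds[keep], class_ids[keep], confidences[keep]

    # Convert center format (cx, cy, w, h) to corners (x1, y1, x2, y2).
    cx, cy, w, h = preds[:, 0], preds[:, 1], preds[:, 2], preds[:, 3]
    boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)

    selected = nms(boxes, confidences, iou_threshold)
    return boxes[selected], confidences[selected], class_ids[selected]


def scale_to_original(boxes, orig_width, orig_height, input_size=640):
    """Map boxes from the square resized input back to original pixels."""
    scale = np.array([orig_width, orig_height, orig_width, orig_height]) / input_size
    return boxes * scale
```

A call such as `boxes, confs, ids = decode_yolo_output(outputs[0])` followed by `scale_to_original(boxes, img.shape[1], img.shape[0])` would return boxes in the coordinates of the loaded image. The per-axis scaling matches the plain `cv2.resize` used above; a letterboxed input would need padding removed instead.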
Convert PyTorch models to ONNX format:
python scripts/convert_yolo_to_onnx.py models/medium/mahjong-yolom-best.pt
Batch conversion:
python scripts/convert_yolo_to_onnx.py models/ --batch
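The contents of `convert_yolo_to_onnx.py` are not shown here, so the following is only a sketch of how such a conversion could be written with the Ultralytics `export` method. The function names are hypothetical, and the assumption that batch mode means "every `.pt` file under the given directory" is a guess at the script's behavior.

```python
from pathlib import Path


def find_pt_models(root):
    """Return all .pt files under root, sorted for a stable order."""
    return sorted(Path(root).rglob("*.pt"))


def export_to_onnx(pt_path, imgsz=640):
    """Export one checkpoint; Ultralytics writes the .onnx file next to it."""
    # Imported lazily so model discovery works without ultralytics installed.
    from ultralytics import YOLO

    return YOLO(str(pt_path)).export(format="onnx", imgsz=imgsz)


def convert(target, batch=False):
    """Convert a single checkpoint, or every checkpoint under a directory."""
    paths = find_pt_models(target) if batch else [Path(target)]
    return [export_to_onnx(path) for path in paths]
```

Note that a recursive search like this would also pick up base checkpoints such as `yolo11n.pt` if they sit in the same folders, so a real script may want to filter by filename.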
- Organize your dataset in YOLO format:
dataset/
├── images/
│   ├── train/
│   ├── val/
│   └── test/
└── labels/
    ├── train/
    ├── val/
    └── test/
- Create a data configuration file (`data.yaml`):
train: path/to/train/images
val: path/to/val/images
test: path/to/test/images
nc: 34  # number of classes (mahjong tile types)
names: ['1m', '2m', '3m', ..., 'red', 'green', 'white']
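Before training, it can help to sanity-check the label files. In YOLO detection format, each line of a label file is `class_id x_center y_center width height`, with coordinates normalized to the range 0 to 1. The checker below is a hypothetical helper written for illustration; it assumes detection labels only and takes the class count of 34 from the configuration above.

```python
def validate_label_line(line, num_classes=34):
    """Return a list of problems found in one YOLO label line (empty if valid)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        coords = [float(value) for value in parts[1:]]
    except ValueError:
        return ["non-numeric field"]

    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} outside 0..{num_classes - 1}")
    if any(not 0.0 <= value <= 1.0 for value in coords):
        problems.append("coordinates must be normalized to [0, 1]")
    if coords[2] <= 0 or coords[3] <= 0:
        problems.append("width and height must be positive")
    return problems


def validate_label_file(path, num_classes=34):
    """Map 1-based line numbers to the problems found in one label file."""
    report = {}
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if line.strip():  # skip blank lines
                problems = validate_label_line(line, num_classes)
                if problems:
                    report[number] = problems
    return report
```

Running `validate_label_file` over every file in `labels/train/` and `labels/val/` would surface out-of-range class ids before they cause a training error.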
from ultralytics import YOLO
# Train nano model
model = YOLO('models/nano/yolo11n.pt')
model.train(data='data.yaml', epochs=500, batch=24, name='mahjong-yolon')
# Train small model
model = YOLO('models/small/yolo11s.pt')
model.train(data='data.yaml', epochs=500, batch=16, name='mahjong-yolos')
# Train medium model
model = YOLO('models/medium/yolo11m.pt')
model.train(data='data.yaml', epochs=500, batch=12, name='mahjong-yolom')
# Train large model
model = YOLO('models/large/yolo11l.pt')
model.train(data='data.yaml', epochs=500, batch=10, name='mahjong-yolol')
# Validate trained model
model = YOLO('models/medium/mahjong-yolom-best.pt')
metrics = model.val()
print(f"mAP50: {metrics.box.map50}")
print(f"mAP50-95: {metrics.box.map}")
Training results include:
- Precision/Recall curves
- F1 score curves
- Confusion matrices
- Training loss graphs
- Validation metrics
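As a reminder of how the F1 value relates to the precision and recall figures, here is a minimal sketch. `f1_score` is a hypothetical helper, and the suggestion to feed it `metrics.box.mp` and `metrics.box.mr` assumes the Ultralytics metrics object exposes mean precision and mean recall under those names.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, `f1_score(metrics.box.mp, metrics.box.mr)` would give a single summary number for the validation run shown earlier.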
The model recognizes the following mahjong tile types:
- 1m through 9m (characters suit, "man")
- 1p through 9p (circles suit, "pin")
- 1s through 9s (bamboo suit, "sou")
- East, South, West, North
- Red Dragon, Green Dragon, White Dragon
- Flower tiles (if applicable)
- Season tiles (if applicable)
To add new tile types:
- Update the data configuration file with the new classes
- Retrain the model with the expanded dataset
- Update the class names in prediction scripts
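When classes are added, appending them to the end of the name list keeps existing class ids, and therefore existing label files, valid. The sketch below illustrates that idea on a `data.yaml`-style dictionary; `extend_classes` is a hypothetical helper, and the class names in the usage note are placeholders rather than names used by this project.

```python
def extend_classes(config, new_names):
    """Return a copy of a data.yaml-style dict with new classes appended.

    Existing names keep their indices, so existing label files stay valid.
    """
    names = list(config["names"])
    for name in new_names:
        if name in names:
            raise ValueError(f"class {name!r} already exists")
        names.append(name)

    updated = dict(config)
    updated["names"] = names
    updated["nc"] = len(names)  # keep the class count in sync with the list
    return updated
```

For instance, calling `extend_classes(config, ["flower1", "flower2"])` on a three-class config would return a five-class config with the original three names still at indices 0 to 2.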
Key training parameters to adjust:
- `batch`: Batch size (adjust based on GPU memory)
- `lr0`: Initial learning rate
- `epochs`: Training epochs
- `patience`: Early stopping patience
- `conf`: Confidence threshold for predictions
- `iou`: IoU threshold for NMS
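To show where each of these parameters is passed, here is a sketch that separates training-time from prediction-time arguments. All numeric values other than `epochs=500` and `batch=12` (taken from the medium-model training call above) are illustrative and not settings tuned for this project; `train_model` and `predict_image` are hypothetical wrapper names.

```python
# Arguments passed to model.train()
TRAIN_ARGS = {
    "data": "data.yaml",
    "epochs": 500,    # upper bound; early stopping may end training sooner
    "batch": 12,      # lower this if the GPU runs out of memory
    "lr0": 0.01,      # illustrative initial learning rate
    "patience": 50,   # illustrative: stop after 50 epochs without improvement
}

# Arguments passed to model.predict()
PREDICT_ARGS = {
    "conf": 0.25,     # illustrative: drop detections below this confidence
    "iou": 0.5,       # illustrative: overlap above which NMS suppresses a box
}


def train_model(weights, name):
    """Train from the given base weights using TRAIN_ARGS."""
    from ultralytics import YOLO  # imported lazily; requires ultralytics

    model = YOLO(weights)
    return model.train(name=name, **TRAIN_ARGS)


def predict_image(weights, image_path):
    """Run inference with the thresholds in PREDICT_ARGS."""
    from ultralytics import YOLO

    return YOLO(weights).predict(image_path, **PREDICT_ARGS)
```

Keeping the two groups apart makes it clearer that `conf` and `iou` affect how predictions are filtered, not how the model is trained.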
To optimize for speed:
- Use nano or small models
- Convert to ONNX format
- Use TensorRT for NVIDIA GPUs
- Optimize input image size
To optimize for accuracy:
- Use medium, large, or extra-large models
- Increase training epochs
- Use data augmentation
- Ensemble multiple models
For production deployment:
- Use ONNX models for cross-platform compatibility
- Implement batch processing for multiple images
- Use GPU acceleration when available
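Batch processing of multiple images can be sketched as follows. This is an illustration under the assumption that `model.predict` accepts a list of image paths and returns one result per image, as the Ultralytics `YOLO` class does; `chunked` and `predict_in_batches` are hypothetical helper names, and the default group size of 8 is arbitrary.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def predict_in_batches(model, image_paths, batch_size=8):
    """Run model.predict on groups of images and collect all results in order."""
    results = []
    for group in chunked(list(image_paths), batch_size):
        results.extend(model.predict(group))
    return results
```

Processing in fixed-size groups keeps memory use bounded when a folder contains many photographs.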
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests and documentation
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Zhen Zhang - zhenz@vt.edu
- Yiyun Huang - yiyunh@vt.edu
- Ultralytics for the YOLO implementation
- The computer vision community for datasets and techniques
- Contributors to the mahjong recognition research
For questions and support:
- Open an issue on GitHub
- Check the documentation in the `docs/` folder
- Review the example notebooks in `notebooks/`
Built with ❤️ for the mahjong and computer vision communities