Tools for hand gesture recognition

gitkatrin edited this page Sep 15, 2020 · 19 revisions

1. Hidden Markov Model (HMM) (IEEE article)

  • method for recognizing human actions
  • characterized by its learning capability and time-scale invariability
  • recognition rates for sports scenes are higher than 90%
  • the recognition rate improved when data from two subjects were mixed for learning
  • deals with 2D images but can be extended to 3D objects by using aspect graphs to assign symbols
  • applicable to multi-modal time-sequential pattern recognition (sensor fusion problems)
  • the number of states and the structure of the HMM must be predefined (unlike the FSM approach below); well-aligned data segments are required

1.1 Process/Algorithm

  1. one set of time-sequential images is transformed into an image feature vector sequence
  2. the sequence is converted into a symbol sequence by vector quantization
  3. during learning, the HMM parameters are estimated for each human action category (one model per category)
  4. during recognition, the model that best matches the observed sequence is chosen (the combined probability of membership is calculated and compared with a threshold to decide whether the data is accepted or rejected)
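The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation from the article: the codebook, the two toy models ("rise" and "fall"), their parameters, and the threshold `min_log_p` are all made-up assumptions, and the learning step (estimating the parameters from training data) is skipped by setting them by hand.

```python
import math

def quantize(vectors, codebook):
    """Vector quantization: map each feature vector to its nearest codebook index."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: dist2(v, codebook[k]))
            for v in vectors]

def log_likelihood(symbols, start, trans, emit):
    """Scaled forward algorithm: log P(symbols | HMM)."""
    n = len(start)
    alpha = None
    log_p = 0.0
    for sym in symbols:
        if alpha is None:
            alpha = [start[i] * emit[i][sym] for i in range(n)]
        else:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][sym]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_p

def classify(symbols, models, min_log_p):
    """Pick the best-scoring category; reject (None) if it falls below the threshold."""
    scores = {name: log_likelihood(symbols, *params) for name, params in models.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= min_log_p else None), scores

# Hypothetical data: a 2-entry codebook and two hand-set left-to-right HMMs.
codebook = [(0.0, 0.0), (1.0, 1.0)]
left_to_right = [[0.6, 0.4], [0.0, 1.0]]
models = {
    # name: (start probabilities, transition matrix, emission matrix)
    "rise": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
    "fall": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
}

features = [(0.1, 0.0), (0.2, 0.1), (0.9, 1.0), (1.0, 0.8)]
symbols = quantize(features, codebook)
label, scores = classify(symbols, models, min_log_p=-5.0)
print(symbols, label)
```

Working in log space with per-step scaling avoids numerical underflow on long sequences; a stricter threshold turns a weak best match into a rejection.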

1.2 Usage

  • finding shoplifters in department stores
  • detecting dangerous behavior in kindergartens
  • recognizing words by analyzing sound and lip-height data

J. Yamato, J. Ohya and K. Ishii, "Recognizing human action in time-sequential images using hidden Markov model," Proceedings 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Champaign, IL, USA, 1992, pp. 379-385, doi: 10.1109/CVPR.1992.223161.

2. Condensation Algorithm (SpringerLink article)

  • runs in near real-time (notwithstanding the use of stochastic methods)
  • result: highly robust tracking of agile motion

2.1 Process/Algorithm

  1. use "factored sampling", in which the probability distribution of possible interpretations is represented by a randomly generated set of samples
  2. use learned dynamical models together with visual observations to propagate the random set over time (stochastic methods)
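One iteration of this scheme can be sketched as select, predict, measure. The example below is a simplified one-dimensional illustration under stated assumptions: the constant-drift motion model, the Gaussian observation likelihood, and all parameter values (`drift`, `process_noise`, `obs_noise`) are hypothetical choices, not the models used in the article, and the observations are synthetic and noise-free.

```python
import math
import random

def condensation_step(samples, weights, observation, rng,
                      drift=1.0, process_noise=0.5, obs_noise=1.0):
    """One select / predict / measure iteration over a weighted sample set."""
    n = len(samples)
    # Select: draw samples with probability proportional to their weights.
    chosen = rng.choices(samples, weights=weights, k=n)
    # Predict: apply the (assumed) dynamical model plus random diffusion.
    predicted = [s + drift + rng.gauss(0.0, process_noise) for s in chosen]
    # Measure: weight each sample by the (assumed Gaussian) observation likelihood.
    raw = [math.exp(-0.5 * ((observation - s) / obs_noise) ** 2) for s in predicted]
    total = sum(raw)
    if total == 0.0:
        new_weights = [1.0 / n] * n  # no sample explains the observation
    else:
        new_weights = [w / total for w in raw]
    return predicted, new_weights

def estimate(samples, weights):
    """Weighted mean of the sample set."""
    return sum(s * w for s, w in zip(samples, weights))

rng = random.Random(0)
n_samples = 500
samples = [rng.uniform(-10.0, 10.0) for _ in range(n_samples)]
weights = [1.0 / n_samples] * n_samples

# Synthetic target that moves one unit per frame.
for t in range(1, 21):
    samples, weights = condensation_step(samples, weights, float(t), rng)

est = estimate(samples, weights)
print(round(est, 2))
```

Because the distribution is carried by the sample set rather than a single Gaussian, the same loop can keep several competing hypotheses alive at once.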

2.2 Usage

  • vision-based robot localization of mobile robots
  • recognizing human gestures in image sequences
  • face recognition in video sequences

Isard, M., Blake, A. CONDENSATION—Conditional Density Propagation for Visual Tracking. International Journal of Computer Vision 29, 5–28 (1998). https://doi.org/10.1023/A:1008078328650

3. Finite State Machines (FSMs) (IEEE article)

  • state-based approach to gesture learning and recognition
  • 2D images
  • real-time online performance thanks to the computational efficiency of the recognizer
  • segments and aligns the training data and produces the gesture model at the same time (unlike the HMM approach above)
  • reduces the computational complexity (compared to HMM) by associating each state with a threshold that is learned from the data; recognition is based on the data at the current point in time and the context information stored in the FSM
  • the authors were working on an unsupervised model construction method to make training simpler and better

3.1 Process/Algorithm

  • each state sequence is an FSM recognizer for a gesture
  1. first stage:
    • spatial and temporal information of the data are decoupled
    • the algorithm learns the distribution of the data via dynamic k-means clustering (spatial information only)
    • result: supports data segmentation and alignment
  2. second stage:
    • temporal information is learned from the aligned data segments
    • spatial data is updated
  3. resulting state sequence:
    • represents the gesture
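To make the recognition side concrete, here is a simplified sketch of an FSM recognizer in which each state carries a spatial threshold and duration bounds. The `State` fields, the `recognize` function, and the hand-set values are hypothetical; in the article the thresholds are learned from data, and the transition rules may differ from this illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class State:
    center: tuple     # spatial centroid of the state
    radius: float     # spatial threshold (hand-set here, learned in the article)
    min_frames: int   # minimum time the data must stay in this state
    max_frames: int   # maximum time the data may stay in this state

def inside(point, state):
    return math.dist(point, state.center) <= state.radius

def recognize(points, states):
    """Return True if the points pass through all states in order."""
    idx, stay = 0, 0
    for p in points:
        cur = states[idx]
        nxt = states[idx + 1] if idx + 1 < len(states) else None
        if nxt is not None and stay >= cur.min_frames and inside(p, nxt):
            idx, stay = idx + 1, 1        # advance to the next state
        elif inside(p, cur):
            stay += 1                     # remain in the current state
            if stay > cur.max_frames:
                return False              # lingered too long
        else:
            return False                  # point fits neither state
    return idx == len(states) - 1 and stay >= states[-1].min_frames

# Hypothetical "L-shaped" gesture: start, move right, then move up.
gesture = [
    State(center=(0.0, 0.0), radius=1.0, min_frames=2, max_frames=5),
    State(center=(3.0, 0.0), radius=1.0, min_frames=1, max_frames=5),
    State(center=(3.0, 3.0), radius=1.0, min_frames=2, max_frames=5),
]

track = [(0.0, 0.0), (0.2, 0.1), (2.8, 0.0), (3.0, 0.2), (3.0, 2.9), (3.1, 3.0)]
print(recognize(track, gesture))
```

Each incoming point is compared only against the current and the next state, which is why recognition stays cheap compared with evaluating a full HMM.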

3.2 Usage

  • recognizing hands (an experimental system that plays a game of "Simon Says" with the user)
  • recognizing face (an experimental system that plays a game of "Simon Says" with the user)
  • recognizing language by grammar

Pengyu Hong, M. Turk and T. S. Huang, "Gesture modeling and recognition using finite state machines," Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580), Grenoble, France, 2000, pp. 410-415, doi: 10.1109/AFGR.2000.840667.

4. Motion Trajectories (SpringerLink article)

  • for extracting and classifying two-dimensional motions
  • using a time-delay neural network

4.1 Process/Algorithm

  1. performing a multiscale segmentation to generate homogeneous regions in each frame
  2. regions between consecutive frames are matched to obtain two-view correspondences
  3. defining pixel matches by computing affine transformations from each pair of corresponding regions
  4. concatenating pixel matches over consecutive image pairs to obtain pixel-level motion trajectories across the image sequence
  5. motion patterns are learned from the extracted trajectories using a time-delay neural network
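Steps 3 and 4 can be illustrated by chaining affine transformations for a single pixel. This is a sketch under simplifying assumptions: segmentation and region matching are skipped, the affine parameters are made up, and the helper names `apply_affine`, `chain_trajectory`, and `delay_windows` are hypothetical. The last helper only shows how a trajectory could be cut into fixed-width windows of consecutive time steps, the kind of input a time-delay network operates on; the network itself is not shown.

```python
def apply_affine(affine, point):
    """Map a pixel through a 2x3 affine transform ((a, b, tx), (c, d, ty))."""
    (a, b, tx), (c, d, ty) = affine
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)

def chain_trajectory(start, affines):
    """Concatenate per-frame-pair pixel matches into one trajectory.

    affines[k] is assumed to be the transform of the region containing
    the pixel between frame k and frame k + 1.
    """
    trajectory = [start]
    for affine in affines:
        trajectory.append(apply_affine(affine, trajectory[-1]))
    return trajectory

def delay_windows(trajectory, width):
    """Sliding windows of `width` consecutive trajectory points."""
    return [trajectory[i:i + width] for i in range(len(trajectory) - width + 1)]

# Made-up transforms: shift right by 2, shift up by 3, rotate by 90 degrees.
affines = [
    ((1, 0, 2), (0, 1, 0)),
    ((1, 0, 0), (0, 1, 3)),
    ((0, -1, 0), (1, 0, 0)),
]

trajectory = chain_trajectory((1, 1), affines)
windows = delay_windows(trajectory, width=2)
print(trajectory)
```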

4.2 Usage

  • recognizing hand gestures

Yang MH., Ahuja N. (2001) Recognizing Hand Gestures Using Motion Trajectories. In: Face Detection and Gesture Recognition for Human-Computer Interaction. The International Series in Video Computing, vol 1. Springer, Boston, MA