
AntoGer/Multimodal-Virtual-Coach


Multimodal-interaction

Multimodal-interaction exam project

This project develops a virtual personal trainer that uses two cameras to track different kinds of exercise and supports the user through guided workouts with audio and video hints and corrections.

The project focuses on multimodality, so it combines several technologies from Google, such as MediaPipe for pose estimation and Google Speech Recognition.

In particular, we use two cameras to capture video from different angles, so that the user's position can be estimated in each frame in real time and the joint angles can be computed with high precision. To estimate the landmarks and derive the angles, we run two MediaPipe pose estimation models, one per camera. The frontal camera checks that the legs are the right distance apart and that the backbone is aligned. The side camera checks whether the body is leaning forward or backward and calculates the hip and knee angles.
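As an illustration of the angle calculation, here is a minimal sketch of how the angle at a joint could be computed from three 2D landmarks (for example hip, knee and ankle for the knee angle). The function name `joint_angle` and the plain `(x, y)` tuples are assumptions made for this example; the project's own code may compute the angles differently.

```python
import math


def joint_angle(a, b, c):
    """Return the angle at vertex b (in degrees, 0 to 180) formed by points a-b-c.

    Each point is an (x, y) tuple, e.g. normalised image coordinates
    taken from a pose estimation landmark (assumed input format).
    """
    # Direction of each limb segment as seen from the joint b.
    angle_to_a = math.atan2(a[1] - b[1], a[0] - b[0])
    angle_to_c = math.atan2(c[1] - b[1], c[0] - b[0])
    degrees = abs(math.degrees(angle_to_c - angle_to_a))
    # Always report the inner angle of the joint.
    return 360.0 - degrees if degrees > 180.0 else degrees


# Hypothetical landmark positions for hip, knee and ankle.
hip, knee, ankle = (0.0, 1.0), (0.0, 0.0), (1.0, 0.0)
print(joint_angle(hip, knee, ankle))
```

A fully extended leg gives a value close to 180 degrees, and the value shrinks as the knee bends.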

Alt text

Image from: www.breakfreephysiotherapy.ca/blog/low-back-pain-from-squats

Running the main.py script launches the app and displays the menu, where you can select a single exercise or start a workout:

Alt text

Once you select an exercise, the voice synthesiser asks you for some details: how many repetitions, how many sets, and the recovery time. You answer by voice. This feature still needs improvement, and its responsiveness depends heavily on your bandwidth, since Google Speech Recognition records the audio and sends it to a server for analysis. The exercise then starts, and you should see landmarks appear on your body.

Squat: when the exercise starts, the user can see landmarks and angles drawn on their body. The counter is in the top right corner, and only good executions are counted. The top centre shows the timer, which starts at the exact moment the user begins the first squat, and the feedback the app gives during the exercise: good repetition, too fast, too slow, go a bit lower, or backbone not aligned.
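The counting and feedback logic described for the squat can be sketched as a small state machine fed with one knee angle per frame. Everything below is an assumption made for illustration: the class name `SquatCounter`, the threshold values, and the order in which the checks are applied are not taken from the project's code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds, chosen only for this example.
STAND_ANGLE = 160.0  # knee angle above this counts as standing
DEPTH_ANGLE = 100.0  # knee angle must drop to this or below for full depth
MIN_SECONDS = 1.0    # a faster repetition is "too fast"
MAX_SECONDS = 5.0    # a slower repetition is "too slow"


@dataclass
class SquatCounter:
    count: int = 0
    _start: Optional[float] = None  # time the current repetition began
    _min_angle: float = 180.0       # deepest knee angle in this repetition
    _aligned: bool = True           # backbone aligned for the whole repetition

    def update(self, knee_angle, back_aligned, t):
        """Process one frame; return feedback when a repetition ends, else None."""
        if self._start is None:
            if knee_angle < STAND_ANGLE:  # the user starts going down
                self._start = t
                self._min_angle = knee_angle
                self._aligned = back_aligned
            return None
        if knee_angle < STAND_ANGLE:      # still inside the repetition
            self._min_angle = min(self._min_angle, knee_angle)
            self._aligned = self._aligned and back_aligned
            return None
        # Back to standing: the repetition is over, so evaluate it.
        duration = t - self._start
        self._start = None
        if not self._aligned:
            return "your backbone is not aligned"
        if self._min_angle > DEPTH_ANGLE:
            return "go a bit lower"
        if duration < MIN_SECONDS:
            return "too fast"
        if duration > MAX_SECONDS:
            return "too slow"
        self.count += 1  # only good executions are counted
        return "good repetition"
```

In a real loop, `update` would be called once per video frame with the knee angle from the side camera, the alignment check from the frontal camera, and the frame timestamp in seconds.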

Alt text

Wallsit: the wallsit is a static exercise with no repetitions; instead, the athlete has to hold a precise position for a certain amount of time. The timer starts when the user is in the right position, and at the end the app shows how many seconds the user held a good position out of the total seconds of workout they selected.
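The final wallsit score (good seconds out of total seconds) can be sketched as follows. The function name `wallsit_score`, the sample format and the accepted angle range are assumptions for this example, not values from the project; the samples are assumed to begin at the moment the timer starts.

```python
def wallsit_score(samples, min_angle=80.0, max_angle=100.0):
    """Return (good_seconds, total_seconds) for a wallsit session.

    samples: list of (timestamp_seconds, knee_angle) pairs in time order.
    min_angle and max_angle are hypothetical bounds for a correct position.
    """
    good = 0.0
    # Each interval between two samples is judged by the angle at its start.
    for (t0, angle), (t1, _) in zip(samples, samples[1:]):
        if min_angle <= angle <= max_angle:
            good += t1 - t0
    total = samples[-1][0] - samples[0][0] if samples else 0.0
    return good, total


# One sample per second; the user sinks too low during the third second.
session = [(0.0, 90.0), (1.0, 90.0), (2.0, 70.0), (3.0, 90.0), (4.0, 90.0)]
good, total = wallsit_score(session)
print(f"{good:.0f} good seconds out of {total:.0f}")
```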

Alt text

In this screenshot, the user is holding the right position, and green feedback in the top left corner shows that everything is OK.

Alt text

In this other example, the user assumes a wrong position: he is going too low, so the angle is too small. The system alerts the user by showing red feedback, and these seconds of execution are labelled as wrong.

Workout: the workout is a combination of the two exercises above, in which the user can do both multiple times in a row without selecting them each time.
