An AI tool that assists visually impaired people. It takes a voice prompt and an image, and generates an audio description of the image that takes the user's prompt into account.
Our project aims to provide an assistive technology tool for visually impaired people by describing the scene they are in. The project integrates image description, speech-to-text, and voice synthesis models. Given an image and a voice prompt, the tool generates a description that takes the user's prompt into account and outputs it as audio. The goal is to use deep learning techniques to improve the quality of daily life and increase the autonomy of visually impaired people.
At the moment, our tool works as follows:
- The webcam starts, and you take a picture by pressing the SPACE bar
- You then speak your prompt, which contains your request for the description
- The tool generates a description
- The description is provided as audio
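
The steps above can be sketched as a small pipeline that chains the stages in order. This is only an illustration under stated assumptions, not the project's actual code: the class `AssistantPipeline`, its field names, and `build_demo_pipeline` are hypothetical, and the stages are stand-in functions so the sketch runs without a webcam, microphone, or any model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type aliases: raw bytes stand in for real image and audio data.
Image = bytes
Audio = bytes


@dataclass
class AssistantPipeline:
    """Chains the stages from the list above (all names are assumptions)."""
    capture_image: Callable[[], Image]      # e.g. webcam frame taken on SPACE
    record_prompt: Callable[[], Audio]      # spoken request from the user
    transcribe: Callable[[Audio], str]      # speech-to-text model
    describe: Callable[[Image, str], str]   # image description guided by the prompt
    synthesize: Callable[[str], Audio]      # voice synthesis model

    def run(self) -> Audio:
        image = self.capture_image()
        prompt_text = self.transcribe(self.record_prompt())
        description = self.describe(image, prompt_text)
        return self.synthesize(description)


def build_demo_pipeline() -> AssistantPipeline:
    """Wire the pipeline with placeholder stages instead of real models."""
    return AssistantPipeline(
        capture_image=lambda: b"fake-image-bytes",
        record_prompt=lambda: b"what is in front of me?",
        transcribe=lambda audio: audio.decode("utf-8"),
        describe=lambda image, prompt: (
            f"Answer to '{prompt}': image of {len(image)} bytes"
        ),
        synthesize=lambda text: text.encode("utf-8"),
    )


if __name__ == "__main__":
    audio = build_demo_pipeline().run()
    print(audio.decode("utf-8"))
```

Keeping each stage as a plain callable means a real webcam capture, speech-to-text model, description model, or voice synthesizer could be swapped in one at a time without changing `run`.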