This project is a collaborative graduation design by Ruijie Ma (myself), Zongqi He, and Yuan Zhang. It explores basic multimodal interaction using camera-based color recognition and sound-to-image generation. The implementation relies on open-source tools and AI models. The design and implementation are still being improved, and feedback from teachers and fellow students is welcome.
- Color Recognition: Uses a webcam to detect specified colors in real-time, serving as input or triggers for subsequent processes.
- Sound-to-Image Generation: Adopts the training approach from the Soundscape-to-Image project to convert sound signals into images. The actual image generation uses the GPTImage1 model API.
- Inpainting: The GPTImage1 API supports inpainting (partial redraw) based on a mask and a prompt, enabling region-specific editing of the generated visuals. See https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1
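The color recognition feature listed above can be illustrated with a small sketch. The project itself runs in Unity, so this Python version is only an illustration of one common approach (hue thresholding in HSV space), not the project's actual code. The hue ranges, the saturation and brightness cutoffs, and the function names `classify_pixel` and `dominant_color` are all assumptions chosen for the example.

```python
import colorsys

# Hypothetical hue ranges in degrees; tune these for your camera and lighting.
HUE_RANGES = {
    "red": [(0, 15), (345, 360)],
    "yellow": [(45, 70)],
    "green": [(90, 150)],
    "blue": [(200, 260)],
}


def classify_pixel(r, g, b, min_saturation=0.4, min_value=0.3):
    """Return a color name for an 8-bit RGB pixel, or None if nothing matches."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Ignore washed-out or dark pixels, whose hue is unreliable.
    if s < min_saturation or v < min_value:
        return None
    hue_deg = h * 360.0
    for name, ranges in HUE_RANGES.items():
        for low, high in ranges:
            if low <= hue_deg < high:
                return name
    return None


def dominant_color(pixels, min_fraction=0.2):
    """Return the most frequent classified color in a frame.

    Returns None unless that color covers at least min_fraction of all pixels.
    """
    counts = {}
    total = 0
    for r, g, b in pixels:
        total += 1
        name = classify_pixel(r, g, b)
        if name is not None:
            counts[name] = counts.get(name, 0) + 1
    if not counts:
        return None
    name, count = max(counts.items(), key=lambda item: item[1])
    return name if count / total >= min_fraction else None
```

Requiring a minimum fraction of matching pixels is one simple way to keep a few stray pixels from acting as a trigger; a real webcam pipeline would likely also downsample frames before classifying them.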
- Clone This Repository
  `git clone https://github.com/RuijieThranduil/Sensing-double.git`
- Clone the Sound-to-Image Dependency Project
  Please first clone the Soundscape-to-Image repository and follow its documentation to train the model or prepare the necessary weights.
  `git clone https://github.com/GISense/Soundscape-to-Image.git`
- Integrate into Unity Project
  Add the relevant code and model weights from Soundscape-to-Image into your Unity project's directories (such as `Assets`). This can serve as the template module for sound-to-image generation.
- Configure GPTImage1 Image Generation API
  Configure the GPTImage1 model according to the API documentation for converting sound signals into images: https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1
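As a rough illustration of the mask-plus-prompt inpainting call mentioned earlier, the sketch below uses the official `openai` Python package. How the Unity project actually sends its requests is not described here, so treat this as an assumption-laden example: the helper names `build_edit_request`, `decode_image`, and `inpaint`, the file paths, and the default size are all hypothetical. According to the linked documentation, the mask should be a PNG with an alpha channel and the same dimensions as the input image.

```python
import base64


def build_edit_request(prompt, size="1024x1024"):
    """Assemble keyword arguments for an image edit call (no network access)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {"model": "gpt-image-1", "prompt": prompt, "size": size}


def decode_image(b64_data):
    """Decode the base64 image payload returned by the API into raw bytes."""
    return base64.b64decode(b64_data)


def inpaint(image_path, mask_path, prompt, out_path="result.png"):
    """Redraw the masked region of an image and save the result.

    Requires `pip install openai` and the OPENAI_API_KEY environment variable.
    """
    from openai import OpenAI  # imported lazily so the helpers work without it

    client = OpenAI()
    with open(image_path, "rb") as image, open(mask_path, "rb") as mask:
        result = client.images.edit(
            image=image, mask=mask, **build_edit_request(prompt)
        )
    with open(out_path, "wb") as f:
        f.write(decode_image(result.data[0].b64_json))
    return out_path
```

A call such as `inpaint("frame.png", "mask.png", "a misty forest at dawn")` would then write the edited image to `result.png`, assuming both input files exist and the API key is set.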
In the GameManagement script, you can customize the colors recognized by the camera and the prompt associated with each color. The configuration is designed to be extended later.
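The color-to-prompt configuration can be pictured as a simple lookup table. The GameManagement script is part of the Unity project, so the Python sketch below only illustrates the idea; the `ColorPrompt` type, the sample colors and prompts, the `nearest_prompt` function, and the distance threshold are all hypothetical and not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColorPrompt:
    name: str
    rgb: tuple   # reference color as 8-bit (r, g, b)
    prompt: str  # text prompt sent to the image model for this color


# Hypothetical example entries; replace with your own colors and prompts.
COLOR_PROMPTS = [
    ColorPrompt("red", (220, 40, 40), "a warm sunset over a city skyline"),
    ColorPrompt("green", (40, 180, 70), "a dense forest after rain"),
    ColorPrompt("blue", (40, 80, 220), "a calm sea under a clear sky"),
]


def nearest_prompt(rgb, table=COLOR_PROMPTS, max_distance=100.0):
    """Return the prompt of the closest configured color.

    Returns None when no entry lies within max_distance (Euclidean, RGB space).
    """
    best = None
    best_dist = max_distance
    for entry in table:
        dist = sum((a - b) ** 2 for a, b in zip(rgb, entry.rgb)) ** 0.5
        if dist <= best_dist:
            best, best_dist = entry, dist
    return best.prompt if best is not None else None
```

Adding a new color then only means appending another entry to the table, for example `ColorPrompt("yellow", (240, 220, 40), "a field of sunflowers")`.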
After launching the project, simply follow the step-by-step instructions on the UI to complete configuration, recognition, and image generation.
- Unity 2021 or above
- Soundscape-to-Image
- GPTImage1 Image Generation API
Sensing-double/
├── Assets/
├── GameManagement/ # Color recognition and prompt configuration
├── Soundscape-to-Image/ # Needs to be cloned separately
├── README.md
└── ...
Special thanks to the Technical University of Munich (TUM), Architecture Information Chair, Nick Foester, Ivan, and Professor Frank Petzold for their support and guidance on this project.
Thanks also to the open-source community and related projects for their technical support, as well as to teachers for their guidance. Suggestions and feedback are welcome.
Please see the LICENSE file in this repository.