This project implements a Multimodal Large Language Model (LLM) Road Safety Platform using Streamlit and Google's Gemini 1.5 Flash AI model. The platform is designed to analyze road safety scenarios using advanced AI, providing insights and recommendations based on both textual and visual inputs.
The Multimodal LLM Road Safety Platform is deployed and accessible online at:
https://llmroadsafetyplatform.streamlit.app/
You can use this link to open the application directly, without any local setup.
The scope of this project includes:
- Development of a user-friendly web interface using Streamlit.
- Integration with Google's Gemini 1.5 Flash AI model.
- Implementation of image processing capabilities to simulate various environmental conditions.
- Analysis of road safety scenarios through text and image inputs.
- Generation of AI-powered responses and recommendations for road safety improvements.
- Bulk analysis of multiple images with customizable settings.
| UWA ID | Name | GitHub Username |
|---|---|---|
| 23832048 | Gnaneshwar Reddy Bana | gnaneshwarbana |
| 23959947 | Kanishk Kanishk | kanishk-uwa |
| 23870387 | Pedro Wang | CoderPdr |
| 22941307 | Sarath Pathari | AlteredOracle |
| 23743373 | Yuxin Gu | SoleilGU |
| 23633858 | Yuanfu Cao | Cyf1160819266 |
- Text and image input for analysis
- Integration with Gemini 1.5 Flash and Gemini 1.5 Pro models
- Image distortion options:
  - Blur
  - Brightness
  - Contrast
  - Sharpness
  - Color (with saturation and hue shift)
  - Rain effect
  - Overlay (with custom image upload)
  - Warp (with customizable wave and bulge effects)
- Adjustable distortion intensity for each effect
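Most of the distortion options above map directly onto PIL's `ImageFilter` and `ImageEnhance` modules. The sketch below is a minimal illustration of that chain; the function name and parameter ranges are ours, not the app's exact API:

```python
from PIL import Image, ImageEnhance, ImageFilter

def apply_distortions(img: Image.Image, blur: float = 0.0,
                      brightness: float = 1.0, contrast: float = 1.0,
                      sharpness: float = 1.0, saturation: float = 1.0) -> Image.Image:
    """Apply a chain of simple distortions. For the enhancers, 1.0 means 'no change'."""
    if blur > 0:
        img = img.filter(ImageFilter.GaussianBlur(radius=blur))
    img = ImageEnhance.Brightness(img).enhance(brightness)
    img = ImageEnhance.Contrast(img).enhance(contrast)
    img = ImageEnhance.Sharpness(img).enhance(sharpness)
    img = ImageEnhance.Color(img).enhance(saturation)  # color/saturation control
    return img

# Example: a heavily blurred, slightly darkened test image
original = Image.new("RGB", (64, 64), "red")
distorted = apply_distortions(original, blur=2.0, brightness=0.8)
```

Effects like rain, overlay, and warp require pixel-level compositing and are more involved than these one-line enhancer calls.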
- Batch processing of multiple images
- Bulk analysis with centralized or individual image settings
- Support for folder path input for bulk analysis
- Customizable system instructions for AI
- Predefined and custom prompts for analysis
- AI-generated responses and recommendations for road safety scenarios
- Structured CSV output for analysis results
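The structured CSV output described above could be assembled with pandas roughly as follows. The column names and row values here are illustrative placeholders, not the app's exact schema; in the real app the `response` field comes from the Gemini model:

```python
import pandas as pd

# Hypothetical per-image analysis results (placeholder data).
results = [
    {"image": "intersection_01.jpg", "distortion": "blur", "intensity": 2.0,
     "response": "Visibility reduced; recommend clearer signage."},
    {"image": "crossing_02.jpg", "distortion": "rain", "intensity": 0.5,
     "response": "Pedestrian crossing markings partially obscured."},
]

df = pd.DataFrame(results)
df.to_csv("bulk_analysis_results.csv", index=False)
```

One row per analyzed image keeps the report easy to filter and sort in any spreadsheet tool.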
1. Clone the repository:

   ```bash
   git clone https://github.com/AlteredOracle/CITS5206.git
   cd CITS5206
   ```

2. Set up a virtual environment (optional but recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows use `venv\Scripts\activate`
   ```

3. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Run the Streamlit app:

   ```bash
   streamlit run src/app.py
   ```

5. Open a web browser and navigate to the URL provided by Streamlit (usually http://localhost:8501).
To run the tests for this project, execute the following command from the root directory of the project:

```bash
pytest tests/test_all.py
```

This command will run all the tests defined in the `test_all.py` file, which includes unit tests for various components of the application.
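A unit test in that file might follow the pattern below. Both the helper and the assertion are hypothetical, shown only to illustrate the pytest style of test the suite uses:

```python
from PIL import Image, ImageEnhance

def adjust_brightness(img: Image.Image, factor: float) -> Image.Image:
    """Hypothetical helper under test: scale image brightness by `factor`."""
    return ImageEnhance.Brightness(img).enhance(factor)

def test_adjust_brightness_preserves_size():
    img = Image.new("RGB", (32, 32), "gray")
    out = adjust_brightness(img, 0.5)
    assert out.size == img.size

# pytest discovers test_* functions automatically; we call it here for illustration.
test_adjust_brightness_preserves_size()
```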
- Enter your Gemini API key in the provided field when you start the app.
- Choose between single image analysis or bulk analysis mode.
- For single image analysis:
  - Enter your text prompt or select a predefined one.
  - Upload an image related to the scenario.
  - Select and adjust image distortions if desired.
  - Click "Analyse" to get the AI-generated response.
- For bulk analysis:
  - Choose to upload multiple files or specify a folder path.
  - Set centralized distortion settings or customize for each image.
  - Run the bulk analysis to process all images and generate a CSV report.
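The bulk-analysis flow above amounts to walking a folder, analyzing each image, and writing one CSV row per image. A stdlib-only sketch of that loop follows; the `analyse` function is a stub standing in for the app's call to the Gemini model, and all names are illustrative:

```python
import csv
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def analyse(image_path: Path) -> str:
    # Placeholder: the real app sends the image and prompt to Gemini here.
    return f"analysis of {image_path.name}"

def bulk_analyse(folder: str, out_csv: str) -> int:
    """Process every image in `folder` and write a CSV report; return row count."""
    rows = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in IMAGE_EXTS:
            rows.append({"image": path.name, "response": analyse(path)})
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["image", "response"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Filtering by extension means stray files (notes, existing reports) in the target folder are skipped rather than breaking the run.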
To test the application, you can use the following sample image:
This image can be downloaded and used as input for the single image analysis or as part of a bulk analysis test.
The application stack for this project is illustrated in the following diagram:
This diagram outlines the key components and technologies used in our Multimodal LLM Road Safety Platform, including:
- Frontend: Streamlit
- Backend: Python
- AI Model: Google Gemini 1.5 Flash
- Image Processing: PIL (Python Imaging Library)
- Data Handling: Pandas, NumPy
The structure showcases how user inputs are processed through our application, leveraging various libraries and the Gemini AI model to analyze road safety scenarios and provide insights.
For design mockups and visual representations of the Multimodal LLM Road Safety Platform, please refer to the following link: