This project implements a Multimodal Large Language Model (LLM) Road Safety Platform using Streamlit and Google's Gemini 1.5 Flash AI model. The platform is designed to analyze road safety scenarios using advanced AI, providing insights and recommendations based on both textual and visual inputs.
The Multimodal LLM Road Safety Platform is deployed and accessible online at:
https://llmroadsafetyplatform.streamlit.app/
You can use this link to open the application directly, without any local setup.
The scope of this project includes:
- Development of a user-friendly web interface using Streamlit.
- Integration with Google's Gemini 1.5 Flash AI model.
- Implementation of image processing capabilities to simulate various environmental conditions.
- Analysis of road safety scenarios through text and image inputs.
- Generation of AI-powered responses and recommendations for road safety improvements.
- Bulk analysis of multiple images with customizable settings.
| UWA ID | Name | GitHub Username |
|---|---|---|
| 23832048 | Gnaneshwar Reddy Bana | gnaneshwarbana |
| 23959947 | Kanishk Kanishk | kanishk-uwa |
| 23870387 | Pedro Wang | CoderPdr |
| 22941307 | Sarath Pathari | AlteredOracle |
| 23743373 | Yuxin Gu | SoleilGU |
| 23633858 | Yuanfu Cao | Cyf1160819266 |
- Text and image input for analysis
- Integration with Gemini 1.5 Flash and Gemini 1.5 Pro models
- Image distortion options:
  - Blur
  - Brightness
  - Contrast
  - Sharpness
  - Color (with saturation and hue shift)
  - Rain effect
  - Overlay (with custom image upload)
  - Warp (with customizable wave and bulge effects)
- Adjustable distortion intensity for each effect
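Most of the distortion options above map directly onto PIL's `ImageFilter` and `ImageEnhance` modules. The sketch below is a minimal illustration of that chain; the function name and parameter ranges are ours, not the app's exact API:

```python
from PIL import Image, ImageEnhance, ImageFilter

def apply_distortions(img: Image.Image, blur: float = 0.0,
                      brightness: float = 1.0, contrast: float = 1.0,
                      sharpness: float = 1.0, saturation: float = 1.0) -> Image.Image:
    """Apply a chain of simple distortions. For the enhancers, 1.0 means 'no change'."""
    if blur > 0:
        img = img.filter(ImageFilter.GaussianBlur(radius=blur))
    img = ImageEnhance.Brightness(img).enhance(brightness)
    img = ImageEnhance.Contrast(img).enhance(contrast)
    img = ImageEnhance.Sharpness(img).enhance(sharpness)
    img = ImageEnhance.Color(img).enhance(saturation)  # color/saturation control
    return img

# Example: a heavily blurred, slightly darkened test image
original = Image.new("RGB", (64, 64), "red")
distorted = apply_distortions(original, blur=2.0, brightness=0.8)
```

Effects like rain, overlay, and warp require pixel-level compositing and are more involved than these one-line enhancer calls.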
- Batch processing of multiple images
- Bulk analysis with centralized or individual image settings
- Support for folder path input for bulk analysis
- Customizable system instructions for AI
- Predefined and custom prompts for analysis
- AI-generated responses and recommendations for road safety scenarios
- Structured CSV output for analysis results
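The structured CSV output described above could be assembled with pandas roughly as follows. The column names and row values here are illustrative placeholders, not the app's exact schema; in the real app the `response` field comes from the Gemini model:

```python
import pandas as pd

# Hypothetical per-image analysis results (placeholder data).
results = [
    {"image": "intersection_01.jpg", "distortion": "blur", "intensity": 2.0,
     "response": "Visibility reduced; recommend clearer signage."},
    {"image": "crossing_02.jpg", "distortion": "rain", "intensity": 0.5,
     "response": "Pedestrian crossing markings partially obscured."},
]

df = pd.DataFrame(results)
df.to_csv("bulk_analysis_results.csv", index=False)
```

One row per analyzed image keeps the report easy to filter and sort in any spreadsheet tool.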
1. Clone the repository:

   ```bash
   git clone https://github.com/AlteredOracle/CITS5206.git
   cd CITS5206
   ```

2. Set up a virtual environment (optional but recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows use `venv\Scripts\activate`
   ```

3. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Run the Streamlit app:

   ```bash
   streamlit run src/app.py
   ```

5. Open a web browser and navigate to the URL provided by Streamlit (usually http://localhost:8501).
To run the tests for this project, execute the following command from the root directory of the project:

```bash
pytest tests/test_all.py
```

This command will run all the tests defined in the `test_all.py` file, which includes unit tests for various components of the application.
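A unit test in that file might follow the pattern below. Both the helper and the assertion are hypothetical, shown only to illustrate the pytest style of test the suite uses:

```python
from PIL import Image, ImageEnhance

def adjust_brightness(img: Image.Image, factor: float) -> Image.Image:
    """Hypothetical helper under test: scale image brightness by `factor`."""
    return ImageEnhance.Brightness(img).enhance(factor)

def test_adjust_brightness_preserves_size():
    img = Image.new("RGB", (32, 32), "gray")
    out = adjust_brightness(img, 0.5)
    assert out.size == img.size

# pytest discovers test_* functions automatically; we call it here for illustration.
test_adjust_brightness_preserves_size()
```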
- Enter your Gemini API key in the provided field when you start the app.
- Choose between single image analysis or bulk analysis mode.
- For single image analysis:
  - Enter your text prompt or select a predefined one.
  - Upload an image related to the scenario.
  - Select and adjust image distortions if desired.
  - Click "Analyse" to get the AI-generated response.
- For bulk analysis:
  - Choose to upload multiple files or specify a folder path.
  - Set centralized distortion settings or customize for each image.
  - Run the bulk analysis to process all images and generate a CSV report.
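The bulk-analysis flow above amounts to walking a folder, analyzing each image, and writing one CSV row per image. A stdlib-only sketch of that loop follows; the `analyse` function is a stub standing in for the app's call to the Gemini model, and all names are illustrative:

```python
import csv
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def analyse(image_path: Path) -> str:
    # Placeholder: the real app sends the image and prompt to Gemini here.
    return f"analysis of {image_path.name}"

def bulk_analyse(folder: str, out_csv: str) -> int:
    """Process every image in `folder` and write a CSV report; return row count."""
    rows = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in IMAGE_EXTS:
            rows.append({"image": path.name, "response": analyse(path)})
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["image", "response"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Filtering by extension means stray files (notes, existing reports) in the target folder are skipped rather than breaking the run.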
To test the application, you can use the following sample image:
This image can be downloaded and used as input for the single image analysis or as part of a bulk analysis test.
The application stack for this project is illustrated in the following diagram:
This diagram outlines the key components and technologies used in our Multimodal LLM Road Safety Platform, including:
- Frontend: Streamlit
- Backend: Python
- AI Model: Google Gemini 1.5 Flash
- Image Processing: PIL (Python Imaging Library)
- Data Handling: Pandas, NumPy
The structure showcases how user inputs are processed through our application, leveraging various libraries and the Gemini AI model to analyze road safety scenarios and provide insights.
For design mockups and visual representations of the Multimodal LLM Road Safety Platform, please refer to the following link: