This project is a web application that allows users to search for images using text or audio queries. The application uses ChromaDB for image indexing and retrieval, and the BLIP model for generating image descriptions. Users can input their queries via text or microphone, and the application will return the most relevant images along with audio descriptions.
- Text Query: Users can input a text query to search for images.
- Audio Query: Users can use their microphone to input an audio query, which will be transcribed to text and used for the search.
- Image Descriptions: The application generates audio descriptions for each image result using the BLIP model.
- Real-time Audio Streaming: Uses `streamlit-webrtc` for real-time audio streaming from the microphone.
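The caption step can be sketched with the Hugging Face `transformers` BLIP API. This is a minimal illustration, not the app's exact code; the checkpoint name below is the standard public BLIP captioning model, which this project may or may not use.

```python
# Hedged sketch of caption generation with BLIP via Hugging Face transformers.
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

MODEL_NAME = "Salesforce/blip-image-captioning-base"  # assumption: standard public checkpoint

def describe(image_path: str) -> str:
    """Return a short text description of an image (downloads weights on first call)."""
    processor = BlipProcessor.from_pretrained(MODEL_NAME)
    model = BlipForConditionalGeneration.from_pretrained(MODEL_NAME)
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```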
- Clone the repository:

  ```bash
  git clone https://github.com/Malalane/ImageQuery.git
  cd ImageQuery
  ```
- Create a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  ```
- Install the dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Set up environment variables: In your terminal, export the name of your ChromaDB vector database:

  ```bash
  export vectordb=your_vectordatabase_name
  ```
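The application is assumed to pick this variable up via `os.environ`, roughly like this (the fallback name here is purely illustrative):

```python
import os

# Assumption: the app reads the ChromaDB database name from the `vectordb`
# environment variable set during installation; the default is illustrative.
vectordb_name = os.environ.get("vectordb", "my_vectordb_name")
```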
To download images for the dataset, follow these steps:
- Go to the Kaggle competition page: Detect AI vs Human Generated Images.
- Download the dataset: Follow the instructions on the Kaggle page to download the dataset to your local machine.
- Extract the dataset: Extract the downloaded dataset to a directory of your choice.

Alternatively, you can download the dataset with the Kaggle API:
- Install the Kaggle API:

  ```bash
  pip install kaggle
  ```
- Set up Kaggle API credentials:
  - Go to your Kaggle account and create a new API token.
  - Download the `kaggle.json` file and place it in the `~/.kaggle/` directory (create the directory if it doesn't exist).
- Download the dataset using the Kaggle API:

  ```bash
  kaggle competitions download -c detect-ai-vs-human-generated-images
  ```
- Extract the dataset:

  ```bash
  unzip detect-ai-vs-human-generated-images.zip -d /path/to/dataset_folder
  ```
To create a vector database from the downloaded images, use the `create_vectordb.py` script. It takes two arguments: the path to the dataset folder and the name of the vector database.
- Run the script:

  ```bash
  python create_vectordb.py /path/to/dataset_folder my_vectordb_name
  ```

  Replace `/path/to/dataset_folder` with the actual path to your dataset folder and `my_vectordb_name` with the desired name for your ChromaDB vector database.
- Run the Streamlit application:

  ```bash
  streamlit run streamlit_app.py
  ```
- Open your web browser and go to `http://localhost:8501`.
- Enter your query:
  - Text Query: Enter your query text in the input box.
  - Audio Query: Click on the microphone button to start recording your query.
- View Results: The application will display the most relevant images along with audio descriptions.
- Install Rosetta on macOS: On Apple silicon Macs, you may need to install Rosetta for audio to work in Streamlit:

  ```bash
  softwareupdate --install-rosetta
  ```
```
project/
│
├── vectordatabase/
│
├── create_vectordb.py
├── streamlit_app.py
├── requirements.txt
└── README.md
```
```
streamlit
torch
torchvision
Pillow
numpy
chromadb
SpeechRecognition
gtts
streamlit-webrtc
transformers
keybert
```
This project is licensed under the MIT License. See the LICENSE file for details.