Image Recommender System

Welcome to the Image Recommender System, a robust platform for generating image recommendations based on visual similarity. In short, you can:

- Generate image metadata and load it into a SQL database.
- Extract image features such as neural network embeddings, RGB color histograms, and perceptual hashes (average hashes and difference hashes, referred to as ahashes and dhashes).
- Input an image and retrieve the top-k closest images from your database using similarity metrics like cosine similarity, Euclidean distance, and Hamming distance.
- Optionally, reduce dimensions and visualize the image distribution using TensorBoard or custom plots.

Example similarity search outputs: ResNet18 embeddings, RGB color histograms, and average hashes.

How It Works

The Image Recommender System consists of a modular pipeline with the following key components:

Metadata Generation & Database Storage
- Image Metadata Extraction: The system scans your dataset to extract metadata (e.g., image IDs, file paths) and prepares it for database insertion.
- SQL Database Integration: The extracted metadata is loaded into a SQL database for efficient querying and management.
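A minimal sketch of this stage, assuming an SQLite database and a hypothetical `images` table with `image_id` and `file_path` columns. The project's actual database engine, schema, and set of accepted file extensions may differ; `scan_images` and `load_metadata` are illustrative names, not the project's functions.

```python
import sqlite3
from pathlib import Path

# Hypothetical set of file extensions treated as images
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def scan_images(root):
    """Yield (image_id, file_path) pairs for every image file under root."""
    paths = sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    for image_id, path in enumerate(paths):
        yield image_id, str(path)


def load_metadata(db_path, root):
    """Create the hypothetical images table and insert one row per image found."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS images ("
            "image_id INTEGER PRIMARY KEY, file_path TEXT NOT NULL)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO images (image_id, file_path) VALUES (?, ?)",
            scan_images(root),
        )
    return conn
```

For example, `load_metadata("images.db", "/path/to/your/images")` returns an open connection to a database with one row per image file; remember to close it when done.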
Feature Extraction
- Neural Network Embeddings:
  Uses pre-trained models (e.g., ResNet18) to extract high-dimensional feature vectors that capture semantic content.
- RGB Color Histograms:
  Computes color histograms that represent the overall color distribution of each image.
- Hash-Based Features:
  Generates compact perceptual hashes (ahashes and dhashes) that capture coarse brightness structure (ahash) and horizontal brightness gradients (dhash).
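To make the histogram and hash features concrete, here is a dependency-free sketch that operates on already-decoded pixel data. It assumes images have been loaded and downscaled beforehand (for example with Pillow); the joint 3D binning with 8 bins per channel and the 8x8 / 8x9 hash grids are common defaults, not confirmed project settings. Neural network embeddings are omitted because they require a deep learning framework and pre-trained weights.

```python
def rgb_histogram(pixels, bins=8):
    """Normalized joint RGB histogram with bins**3 cells that sum to 1.

    pixels: iterable of (r, g, b) tuples with channel values in 0..255.
    """
    hist = [0] * (bins ** 3)
    width = 256 / bins  # intensity range covered by one bin per channel
    total = 0
    for r, g, b in pixels:
        ri, gi, bi = int(r // width), int(g // width), int(b // width)
        hist[(ri * bins + gi) * bins + bi] += 1
        total += 1
    return [count / total for count in hist] if total else hist


def average_hash(gray):
    """ahash: one bit per pixel, 1 if the pixel is brighter than the mean.

    gray: 2D list of grayscale values, typically an 8x8 downscaled image (64 bits).
    """
    flat = [v for row in gray for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def difference_hash(gray):
    """dhash: one bit per horizontal neighbor pair, 1 if brightness increases to the right.

    gray: 2D list of grayscale values, typically 8 rows x 9 columns (64 bits).
    """
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if right > left else 0)
    return bits
```

Both hashes are returned as plain integers, so two hashes can later be compared bit by bit with a Hamming distance.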
Similarity Computation & Retrieval
- Distance Metrics:
  The system supports various metrics, including cosine similarity, Euclidean distance, Hamming distance, and Manhattan distance.
- Top-K Retrieval:
  Based on the selected similarity metric and mode, the system retrieves the top-k most similar images from the database.
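The metrics above are standard and can be sketched in plain Python. This is an illustrative version, not the project's similarites.py: it assumes feature vectors are lists of floats, hashes are stored as integers, and cosine similarity is converted to a distance (1 minus the similarity) so that every metric ranks smaller values as more similar. `top_k_similar` is a hypothetical helper name.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity, so that smaller always means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def euclidean_distance(a, b):
    return math.dist(a, b)


def manhattan_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming_distance(a, b):
    """Number of differing bits between two integer-encoded hashes."""
    return bin(a ^ b).count("1")


METRICS = {
    "cosine": cosine_distance,
    "euclidean": euclidean_distance,
    "manhattan": manhattan_distance,
    "hamming": hamming_distance,
}


def top_k_similar(query, candidates, metric="cosine", top_k=5):
    """Return the top_k (distance, image_id) pairs closest to query, nearest first.

    candidates: iterable of (image_id, feature) pairs.
    """
    distance = METRICS[metric]
    return heapq.nsmallest(
        top_k, ((distance(query, feat), image_id) for image_id, feat in candidates)
    )
```

`heapq.nsmallest` avoids sorting the whole candidate list when only a handful of results are needed.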
Visualization & Exploration
- Jupyter Notebook Interface:
  An interactive notebook (show_similarites.ipynb) allows you to input images, run similarity queries, and display the results.
- TensorBoard & Dimensionality Reduction:
  You can also reduce the feature dimensions (e.g., using UMAP) and visualize the embeddings with TensorBoard for an intuitive exploration of the dataset.

For more details, please refer to the documentation (Big_Data_Image_Recommender_Doku.pdf in the repository).

Installation

To get started, follow these steps:

Clone the repository:

git clone https://github.com/honnigmelone/image_recommender.git

Navigate to the project directory:

cd image_recommender/

Install the dependencies:

pip install -r requirements.txt

How to use

After setting up the repository, the pipeline runs in several stages:

Configure Your Dataset

Open the configuration file (config.py) and update the path to YOUR image dataset.
Other paths typically require no changes.

Run the generator to create the database. Run the following commands from the repository root:

python src/generator.py

Run the main pipeline to extract image features (this step may take a while, depending on the size of your dataset):

python src/main.py

View Similarity Results
Open and run the Jupyter Notebook show_similarites.ipynb:

Add the paths of your input images to the provided list.
The notebook calls the calculate_similarites() function from similarites.py, which uses the following parameters:
- input_images: A list of image paths.
- cursor: A database cursor for efficient data retrieval.
- mode: Similarity mode (choose from: embeddings, rgb, ahashes, dhashes).
- metric: Similarity metric (options include cosine, euclidean, hamming, manhattan).
- top_k: Number of similar images to retrieve (default is 5).
- verbose: Set to True to display execution times (default is False).
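The parameter list does not say how the cursor is consumed. As one possible approach (an assumption, not necessarily what calculate_similarites() does), a DB-API cursor can be read in batches with `fetchmany()` while a bounded heap keeps only the best `top_k` matches, so memory use stays constant regardless of database size. The `features` table, its columns, the JSON encoding of vectors, and the name `stream_top_k` are all hypothetical; the `?` placeholder is SQLite's parameter style.

```python
import heapq
import json
import math
import time


def stream_top_k(cursor, query, mode="embeddings", top_k=5,
                 distance=math.dist, batch_size=1000, verbose=False):
    """Return the top_k (distance, image_id) pairs closest to query, sorted ascending.

    Reads a hypothetical `features` table with columns (image_id, mode, vector),
    where vector is a JSON-encoded list of floats, in batches of batch_size rows.
    """
    if top_k <= 0:
        return []
    start = time.perf_counter()
    cursor.execute("SELECT image_id, vector FROM features WHERE mode = ?", (mode,))
    # Max-heap of the best matches so far, stored as (-distance, image_id)
    best = []
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for image_id, vector_json in rows:
            d = distance(query, json.loads(vector_json))
            if len(best) < top_k:
                heapq.heappush(best, (-d, image_id))
            elif -d > best[0][0]:  # closer than the worst match kept so far
                heapq.heapreplace(best, (-d, image_id))
    if verbose:
        print(f"Scanned features in {time.perf_counter() - start:.3f} s")
    return sorted((-neg_d, image_id) for neg_d, image_id in best)
```

Here `distance` defaults to Euclidean distance (`math.dist`); any function that takes two vectors and returns a smaller-is-closer number can be passed instead.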

Visualisation

You can visualize the images either in TensorBoard or with dimensionality reduction (UMAP).

Tensorboard

Prepare Visualization Data
Generate the metadata file, sprite image, and checkpoint data. You can specify the mode (e.g., "embeddings" or "rgb_hists") in the script's main function:

python src/tensorboard_preparation.py
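tensorboard_preparation.py handles this step. To show what two of these files typically contain, here is a minimal sketch under stated assumptions: the projector metadata file is tab-separated with one row per embedding, in the same order as the vectors (with a header row when there is more than one column), and a sprite image is commonly a square grid of equally sized thumbnails in row-major order. The column names and function names are hypothetical, and the checkpoint data is not covered.

```python
import csv
import math


def write_metadata_tsv(path, rows, header=("image_id", "file_name")):
    """Write projector metadata: one row per embedding, in the same order as the vectors.

    With more than one column, the first line must be a header row.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


def sprite_layout(n_images, thumb_size):
    """Edge length and top-left (x, y) offset of each thumbnail in a square sprite.

    Thumbnails are placed in row-major order: left to right, then top to bottom.
    """
    side = math.ceil(math.sqrt(n_images))  # thumbnails per row (and per column)
    offsets = [((i % side) * thumb_size, (i // side) * thumb_size)
               for i in range(n_images)]
    return side * thumb_size, offsets
```

The offsets tell an imaging library where to paste each resized thumbnail onto a blank canvas of the returned edge length.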

Launch TensorBoard
Run the following command to start TensorBoard on your localhost:

tensorboard --logdir logs/

Example visualisation using color similarities with UMAP (UMAP only uses the first 5000 entries).

UMAP

Just open the Jupyter notebook umap_analysis.ipynb and play with the functions :) You can specify the mode and top_k, which here represents the cluster values. Please be aware that UMAP needs a lot of memory to run on large datasets. Depending on your use case, you can either reduce dimensions first and cluster afterwards, or the other way around!
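The notebook's own clustering method is not described here. To illustrate the "reduce first, cluster afterwards" order, the sketch below swaps in plain k-means (Lloyd's algorithm) and runs it on points assumed to be already reduced, for example 2D UMAP output. The function name `kmeans` and the explicit initial centroids are illustrative choices, not the notebook's API.

```python
import math


def kmeans(points, initial_centroids, n_iter=20):
    """Plain Lloyd's k-means: assign points to the nearest centroid, then recompute centroids.

    points: list of equal-length float lists (e.g. UMAP-reduced features).
    Returns (labels, centroids).
    """
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid by Euclidean distance
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(old)  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: assignments no longer move any centroid
        centroids = new_centroids
    return labels, centroids
```

Passing explicit initial centroids keeps the result deterministic; in practice they are often sampled from the data or chosen with k-means++.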

Need Help?

If you have any questions, encounter issues, or need assistance, please open an issue in the GitHub repository. We’re here to help and value your feedback!

License

This project is licensed under the MIT License.

