A big data project using a deep learning method to recommend images based on the similarity of their characteristics
Since images are large vectors with 3 colour channels, computing similarity measures such as Euclidean or cosine distance directly on raw pixels requires substantial computational resources. Hence, we need to reduce each image's complexity while retaining its information, so that similarity can still be measured on the simplified representation.
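
To make the cost concrete, assuming images are resized to 224 x 224 (a common MobileNet input size, used here only for illustration), a flattened RGB image has 224 x 224 x 3 = 150,528 values, while a colour histogram with 8 bins per channel has 8 x 8 x 8 = 512. The minimal sketch below defines the two distance measures mentioned above with NumPy; both take time proportional to the vector length, so shrinking the vectors shrinks the cost of every comparison. The function names are hypothetical.

```python
import numpy as np

def euclidean_distance(a, b):
    """Straight-line (L2) distance between two feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point in the same direction.

    Assumes neither vector is all zeros (the norm would be 0).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Length of a flattened 224x224 RGB image versus an 8x8x8 colour histogram.
raw_length = 224 * 224 * 3        # 150528 values
histogram_length = 8 * 8 * 8      # 512 values
print(raw_length, histogram_length)
```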
- Preprocess each image with OpenCV to compute a colour histogram, which summarises the distribution of pixel colours across the image.
- Use a pretrained model such as MobileNet (transfer learning) to extract image embeddings.
- Save the extracted information, including each image's path, in an SQLite database, with a .csv file as a backup.
- Once all images are preprocessed, the ImageRecommender function takes an input image and returns the 5 most similar images from the database, measuring similarity with Euclidean distance.
- Use a dimensionality reduction technique such as t-SNE or UMAP to visualize the high-dimensional data and inspect clustering patterns across all images.
- Images are unstructured, complex data, so the steps above are needed to reduce their complexity before they can be compared efficiently.
- Specific techniques such as generators, batch processing and a database reduce the memory and computation required, making the whole process efficient.
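
The colour-histogram step can be sketched as follows. This is a NumPy-only illustration, assuming images are loaded as H x W x 3 `uint8` arrays (as `cv2.imread` returns) and 8 bins per channel, an illustrative choice rather than the project's confirmed setting. In OpenCV, `cv2.calcHist([image], [0, 1, 2], None, [8, 8, 8], [0, 256, 0, 256, 0, 256])` produces the equivalent 3-D bin counts; `colour_histogram` is a hypothetical helper name.

```python
import numpy as np

def colour_histogram(image, bins=8):
    """Return a normalised, flattened 3-D colour histogram of an H x W x 3 image.

    Each pixel is counted once in the (channel 0, channel 1, channel 2) bin it
    falls into, so the whole image is summarised by bins**3 numbers regardless
    of its resolution.
    """
    pixels = np.asarray(image).reshape(-1, 3)
    hist, _ = np.histogramdd(
        pixels,
        bins=(bins, bins, bins),
        range=((0, 256), (0, 256), (0, 256)),
    )
    hist = hist.flatten()
    # Normalise to proportions so images of different sizes are comparable.
    return hist / hist.sum()
```

Normalising by the total pixel count means two images with the same colour distribution but different resolutions produce the same histogram, so Euclidean distance compares colour content rather than image size.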
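
Storing the extracted features alongside each image path might look like the sketch below, using only Python's standard `sqlite3`, `json` and `csv` modules. The table name `images`, its columns, and the choice to serialise vectors as JSON text are assumptions for illustration, not the project's actual schema.

```python
import csv
import json
import sqlite3

def save_features(conn, records, csv_path=None):
    """Store (image_path, feature_vector) pairs in SQLite, optionally mirrored to CSV.

    Hypothetical schema: a table `images` with the path as primary key and the
    feature vector serialised as JSON text.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "path TEXT PRIMARY KEY, features TEXT NOT NULL)"
    )
    rows = [(path, json.dumps([float(x) for x in vec])) for path, vec in records]
    # executemany inserts all rows in one call; commit makes them durable.
    conn.executemany(
        "INSERT OR REPLACE INTO images (path, features) VALUES (?, ?)", rows
    )
    conn.commit()
    if csv_path is not None:
        # Plain-text backup containing the same rows.
        with open(csv_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["path", "features"])
            writer.writerows(rows)

def load_features(conn):
    """Return a dict mapping image path -> feature vector (list of floats)."""
    cur = conn.execute("SELECT path, features FROM images")
    return {path: json.loads(features) for path, features in cur}
```

Usage: `conn = sqlite3.connect("features.db")` followed by `save_features(conn, records, "features.csv")`, where the file names are placeholders.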
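
The core search behind an image recommender of this kind can be sketched as a brute-force nearest-neighbour lookup: compute the Euclidean distance from the query's features to every stored vector and keep the 5 smallest. This is not the project's ImageRecommender function; `recommend_similar` is a hypothetical name, and it assumes the features are already loaded as a dict mapping image path to an equal-length vector.

```python
import heapq
import math

def recommend_similar(query_features, database, k=5):
    """Return up to k (image_path, distance) pairs closest to the query.

    `database` maps image path -> feature vector. Distance is Euclidean,
    so a smaller value means a more similar image.
    """
    scored = (
        (math.dist(query_features, vec), path) for path, vec in database.items()
    )
    # nsmallest keeps only k items at a time instead of sorting everything.
    return [(path, dist) for dist, path in heapq.nsmallest(k, scored)]
```

This scans every stored image, so each query costs time proportional to the number of images times the feature length; smaller feature vectors therefore make each query proportionally cheaper.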
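
A generator is one way to implement the batch processing mentioned above: it hands out image paths a fixed number at a time, so the caller only has to load and process one batch of images before moving on. The helper name `batched_paths` and the default batch size of 32 are illustrative assumptions.

```python
from itertools import islice

def batched_paths(paths, batch_size=32):
    """Yield successive lists of at most batch_size paths.

    Because this is a generator, batches are produced lazily: the caller can
    load, featurise and store one batch before the next one is requested.
    """
    it = iter(paths)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

For example, `for batch in batched_paths(all_paths): ...` processes the image set in chunks of 32 paths, keeping peak memory bounded by the batch size rather than the dataset size.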