vision-encoder

Here are 9 public repositories matching this topic...

Multilabel-GeoSceneNet is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for multi-label image classification. It is designed to recognize and label multiple geographic or environmental elements in a single image using the SiglipForImageClassification architecture.

  • Updated Apr 23, 2025
  • Python
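
As a rough illustration of what multi-label decoding means here, the sketch below turns per-label logits into a set of predicted labels by applying an independent sigmoid to each logit and keeping those above a threshold. It assumes the classifier emits one raw logit per label, which is common for multi-label heads but is not confirmed by the repository description. The label names, logit values, and the 0.5 threshold are hypothetical and are not taken from Multilabel-GeoSceneNet.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(logits, labels, threshold=0.5):
    """Return {label: score} for every label whose sigmoid score meets the threshold.

    Each label is scored independently, so zero, one, or several labels
    can be returned for a single image.
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    scores = {label: sigmoid(logit) for label, logit in zip(labels, logits)}
    return {label: score for label, score in scores.items() if score >= threshold}


# Hypothetical label set and logits, for illustration only.
example_labels = ["water", "forest", "buildings", "desert"]
example_logits = [2.0, 0.3, -1.5, -3.0]

predicted = decode_multilabel(example_logits, example_labels)
print(sorted(predicted))
```

Because the scores are not normalized against each other, raising or lowering the threshold trades precision against recall per label.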

Multilabel-Portrait-SigLIP2 is a vision-language model fine-tuned from google/siglip2-base-patch16-224 using the SiglipForImageClassification architecture. It classifies portrait-style images into one of the following visual portrait categories:

  • Updated Apr 16, 2025
  • Python

shoe-type-detection is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-512 for multi-class image classification. It is trained to detect different types of shoes such as Ballet Flats, Boat Shoes, Brogues, Clogs, and Sneakers. The model uses the SiglipForImageClassification architecture.

  • Updated Jun 7, 2025
  • Python

PussyCat-vs-Doggie-SigLIP2 is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for single-label image classification. It classifies each image as either a cat or a dog using the SiglipForImageClassification architecture.

  • Updated Apr 19, 2025
  • Python
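
For contrast with the multi-label case, a single-label classifier picks exactly one class per image. The sketch below shows one common way to do that: softmax over the logits, then argmax. It assumes the model returns one logit per class; the logit values are made up, and the label order is a hypothetical choice, not the repository's actual id-to-label mapping.

```python
import math


def softmax(logits):
    """Convert logits to probabilities that sum to 1 (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_single_label(logits, labels):
    """Return the single most probable label and its probability."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical label order and logits, for illustration only.
label, prob = decode_single_label([0.2, 1.7], ["cat", "dog"])
print(label, round(prob, 3))
```

The same decoding applies to any mutually exclusive label set, including the binary and multi-class models listed on this page.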

Flood-Image-Detection is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-512 for binary image classification. It is trained to detect whether an image shows a flooded or a non-flooded scene. The model uses the SiglipForImageClassification architecture.

  • Updated May 27, 2025
  • Python
