vision-encoder
Here are 9 public repositories matching this topic...
Multilabel-GeoSceneNet is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for multi-label image classification. It is designed to recognize and label multiple geographic or environmental elements in a single image using the SiglipForImageClassification architecture.
Updated Apr 23, 2025 - Python
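Most of the repositories listed here share the same inference pattern: a SigLIP2 backbone with a SiglipForImageClassification head. Below is a minimal sketch of that pattern using Hugging Face Transformers; the checkpoint id, image path, and the 0.5 threshold are illustrative placeholders, not values taken from any of these projects.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, SiglipForImageClassification

# Placeholder checkpoint id; substitute the fine-tuned model you want to use.
model_id = "your-username/your-siglip2-classifier"
processor = AutoImageProcessor.from_pretrained(model_id)
model = SiglipForImageClassification.from_pretrained(model_id)

image = Image.open("example.jpg").convert("RGB")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label heads are usually read with a sigmoid and a threshold;
# single-label heads use a softmax over the classes instead.
probs = torch.sigmoid(logits)[0]
predicted = [model.config.id2label[i] for i, p in enumerate(probs) if p > 0.5]
print(predicted)
```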
Fashion-Product-Usage is a vision-language model fine-tuned from google/siglip2-base-patch16-224 using the SiglipForImageClassification architecture. It classifies fashion product images based on their intended usage context.
Updated Apr 18, 2025 - Python
Multilabel-Portrait-SigLIP2 is a vision-language model fine-tuned from google/siglip2-base-patch16-224 using the SiglipForImageClassification architecture. It classifies portrait-style images into one of several visual portrait categories.
Updated Apr 16, 2025 - Python
Coral-Health is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for single-label image classification. It classifies coral reef images into two health conditions using the SiglipForImageClassification architecture.
Updated Apr 28, 2025 - Python
shoe-type-detection is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-512 for multi-class image classification. It is trained to detect different types of shoes such as Ballet Flats, Boat Shoes, Brogues, Clogs, and Sneakers. The model uses the SiglipForImageClassification architecture.
Updated Jun 7, 2025 - Python
PussyCat-vs-Doggie-SigLIP2 is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for single-label image classification. It classifies images as either a cat or a dog using the SiglipForImageClassification architecture.
Updated Apr 19, 2025 - Python
Flood-Image-Detection is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-512 for binary image classification. It is trained to detect whether an image shows a flooded or non-flooded scene. The model uses the SiglipForImageClassification architecture.
Updated May 27, 2025 - Python
Leverage SigLIP 2's capabilities using LitServe.
Updated Feb 28, 2025 - Python
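For the LitServe entry above, the sketch below shows how a SigLIP2 classifier could be served, assuming the standard LitServe LitAPI interface (setup, decode_request, predict, encode_response). The checkpoint id and the JSON request format are illustrative assumptions, not taken from the repository.

```python
import litserve as ls
import torch
from PIL import Image
from transformers import AutoImageProcessor, SiglipForImageClassification


class SiglipClassifierAPI(ls.LitAPI):
    def setup(self, device):
        # Placeholder checkpoint id; substitute a real fine-tuned model.
        model_id = "your-username/your-siglip2-classifier"
        self.processor = AutoImageProcessor.from_pretrained(model_id)
        self.model = SiglipForImageClassification.from_pretrained(model_id).to(device)
        self.device = device

    def decode_request(self, request):
        # Assumed request format: {"image_path": "example.jpg"}
        image = Image.open(request["image_path"]).convert("RGB")
        return self.processor(images=image, return_tensors="pt").to(self.device)

    def predict(self, inputs):
        with torch.no_grad():
            return self.model(**inputs).logits

    def encode_response(self, logits):
        # Single-label readout; multi-label models would use a sigmoid instead.
        probs = torch.softmax(logits, dim=-1)[0]
        top = int(probs.argmax())
        return {"label": self.model.config.id2label[top], "score": float(probs[top])}


if __name__ == "__main__":
    server = ls.LitServer(SiglipClassifierAPI(), accelerator="auto")
    server.run(port=8000)
```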