TextureSAM is a texture-aware variant of the Segment Anything Model (SAM), fine-tuned on the ADE20K dataset with texture augmentations. While SAM-2.1 relies on semantic cues, TextureSAM enhances segmentation in texture-dominant scenarios by introducing textures from the Describable Textures Dataset (DTD). Two versions are trained: one that preserves semantic structure and another that fully replaces objects with textures. Evaluated on RWTD (a natural texture-based segmentation dataset) and STMD (a synthetic texture-transition dataset), TextureSAM outperforms SAM-2.1 in capturing texture-based boundaries.
This code is forked from and heavily based on Meta's SAM2 repository.
1 Usage
2 Datasets
2.1 Kaust256 - Real-World Textures Dataset (RWTD)
2.2 ADE20K Dataset
2.3 Synthetic Textured Masks Dataset (STMD)
3 Checkpoints
4 Usage Instructions
5 Training
6 Inference
7 Evaluation
8 🔥 TextureSAM vs. SAM-2: Segmentation Performance
- **STMD Results**
- **RWTD Results**
- **ADE20K Results**
9 Conclusions
Using this code, you can:
- Fine-tune the SAM2 model on the ADE20K dataset and other custom texture datasets.
- Run inference using the fine-tuned SAM2 model, TextureSAM, to perform texture-based segmentation on the provided datasets.
- Evaluate the model on the RWTD, STMD, and ADE20K datasets.
The datasets used for the evaluation and inference of TextureSAM are available at the following link: TextureSAM Datasets and checkpoints.
The following datasets are included:
The RWTD dataset is a collection of real-world images specifically curated to evaluate segmentation models based on texture cues rather than object boundaries. Unlike conventional segmentation datasets that focus on semantic elements, RWTD emphasizes regions where segmentation is defined by texture changes. This dataset provides a realistic test case for models like TextureSAM, which are designed to mitigate the shape bias observed in SAM-2.
- Kaust256 contains 256 images.
- Textured images are also provided, generated using an ADE20K-based filtering process.
- Labels are binary segmentation masks provided in PNG format.
|-- Kaust256
|-- images
| |-- 101.jpg
| |-- 102.jpg
| |-- 103.jpg
| |-- ...
|-- labels
| |-- 101.png
| |-- 102.png
| |-- 103.png
| |-- ...
|-- images_textured
|-- 101.jpg
|-- 102.jpg
|-- 103.jpg
|-- ...
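A minimal sketch of consuming this layout (assuming matching filename stems between images/ and labels/, as shown above):

```python
from pathlib import Path

import numpy as np
from PIL import Image

root = Path("Kaust256")  # adjust to where the dataset was extracted

for img_path in sorted((root / "images").glob("*.jpg")):
    label_path = root / "labels" / f"{img_path.stem}.png"
    image = np.array(Image.open(img_path).convert("RGB"))
    # Binary segmentation mask: foreground wherever the pixel value is non-zero.
    mask = np.array(Image.open(label_path)) > 0
    print(img_path.name, image.shape, f"{mask.mean():.2%} foreground")
```

The same loop works for images_textured/ by swapping the images directory.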
The ADE20K dataset is widely used for scene parsing and semantic segmentation. It includes over 20,000 images with dense annotations spanning 150 semantic categories. To facilitate texture-aware segmentation, multiple versions of ADE20K have been generated:
- ADE20K_0.3: ADE20K dataset modified with texture augmentation (η ≤ 0.3), provided in the SAM-compatible format. Each image is paired with a corresponding annotation JSON file, making it suitable for fine-tuning SAM to balance texture and semantic segmentation.
- ADE20K_real_images: The original ADE20K dataset without modifications. This directory contains real ADE20K images along with their corresponding ground truth (GT) instance masks. The dataset is organized into three main subdirectories:
- ADE20K_unified_gt_instances: Stores unified ground truth instance masks.
- images: Stores the original ADE20K images.
- gt_instances: Contains individual instance segmentation masks for each image.
- ADE20K_textured_images: Includes ADE20K images with all degrees of texture augmentation (η = 0 to 1). These images are designed to decouple shape and texture by replacing object semantics with textures sampled from the Describable Textures Dataset (DTD).
The evaluation of TextureSAM was conducted using the ADE20K_real_images directory.
ADE20K_0.3/
├── ADE_train_00000037_textured_degree0_0.jpg
├── ADE_train_00000037_textured_degree0_0.json
├── ADE_train_00000037_textured_degree0_1.jpg
├── ADE_train_00000037_textured_degree0_1.json
├── ADE_train_00000037_textured_degree0_2.jpg
├── ADE_train_00000037_textured_degree0_2.json
├── ADE_train_00000037_textured_degree0_3.jpg
├── ADE_train_00000037_textured_degree0_3.json
├── ADE_train_00000037.jpg
├── ADE_train_00000037.json
└── ...
ADE20K_real_images/
├── ADE20K_unified_gt_instances/
│   ├── unified_ADE_val_00000001.png
│   ├── unified_ADE_val_00000002.png
│   ├── unified_ADE_val_00000003.png
│   └── ...
├── gt_instances/
│   ├── instance_000_ADE_val_00000001.png
│   ├── instance_001_ADE_val_00000001.png
│   ├── instance_000_ADE_val_00000002.png
│   ├── instance_001_ADE_val_00000002.png
│   └── ...
└── images/
    ├── ADE_val_00000001.jpg
    ├── ADE_val_00000002.jpg
    ├── ADE_val_00000003.jpg
    └── ...
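The ADE20K_0.3 image/JSON pairs follow the SAM-compatible format mentioned above. Assuming this means SA-1B-style records (COCO run-length-encoded masks under an "annotations" key; an assumption, so verify against the actual files), a pair could be inspected like this:

```python
import json
from pathlib import Path

from pycocotools import mask as mask_utils  # assumes COCO RLE segmentations

img_path = Path("ADE20K_0.3/ADE_train_00000037.jpg")
ann_path = img_path.with_suffix(".json")

with open(ann_path) as f:
    record = json.load(f)

# SA-1B-style records keep one RLE-encoded mask per annotation; if the
# TextureSAM JSONs use a different schema, adapt the keys below accordingly.
for ann in record.get("annotations", []):
    m = mask_utils.decode(ann["segmentation"])  # H x W uint8 mask
    print(m.shape, int(m.sum()), "mask pixels")
```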
The STMD dataset is a synthetic benchmark that evaluates segmentation models in a controlled texture-only environment. The test dataset is available at the following link: Segmentation_dataset. Unlike RWTD, this dataset eliminates object boundaries and contains images composed purely of texture transitions. By isolating segmentation performance to texture cues alone, STMD provides a rigorous assessment of a model's ability to distinguish regions based solely on texture variations. The full STMD dataset is available on GitHub: STMD.
- Segmentation dataset contains 10,000 masks.
- Each mask has 5 images with random textures for each region, resulting in a total of 50,000 images.
- Training Set: 42,000 images.
- Validation Set: 3,000 images.
- Test Set: 5,000 images.
The Segmentation_dataset stored in Google Drive is organized as follows:
- images_test/ – Contains test images used for model evaluation.
- images_textured/ – Includes images processed with the ADE20K filter to enhance texture-based segmentation challenges.
- labels_test/ – Stores the segmentation masks corresponding to the test images, serving as ground truth for evaluation.
- test.txt – Lists the file paths of all test images, ensuring consistency in dataset loading.
- train.txt – Contains the file paths for training images.
- val.txt – Includes the file paths for validation images, used to monitor performance during training.
The evaluation of TextureSAM was conducted using images_test/ and labels_test/. The test.txt file was used to define the dataset split.
Segmentation_dataset/
├── images_test/
│   ├── 7.jpg
│   ├── 17.jpg
│   ├── 18.jpg
│   └── ...
└── labels_test/
    ├── 7.png
    ├── 17.png
    ├── 18.png
    └── ...
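A minimal sketch for iterating over the test split (assuming test.txt lists one image per line, matching the layout above; adapt the stem extraction if it stores full paths):

```python
from pathlib import Path

root = Path("Segmentation_dataset")

# One entry per line, e.g. "7.jpg"; blank lines are skipped.
with open(root / "test.txt") as f:
    stems = [Path(line.strip()).stem for line in f if line.strip()]

pairs = [
    (root / "images_test" / f"{s}.jpg", root / "labels_test" / f"{s}.png")
    for s in stems
]
print(f"{len(pairs)} test image/label pairs")
```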
This repo provides three SAM-2.1 checkpoints (two TextureSAM fine-tunes plus the pre-trained base model), available in TextureSAM Datasets and checkpoints:
- sam2.1_hiera_small_0.3.pt – The best-performing TextureSAM model for texture segmentation, fine-tuned for 19 epochs with mild texture augmentation (η ≤ 0.3); it offers the strongest balance between texture and semantic segmentation.
- sam2.1_hiera_small_1.pt – Fine-tuned for 25 epochs with strong texture augmentation (η ≤ 1.0), optimized for purely texture-based segmentation.
- sam2.1_hiera_small.pt – Pre-trained SAM-2.1 Hiera Small, used as the base model before fine-tuning.
Checkpoints/
├── sam2.1_hiera_small.pt
├── sam2.1_hiera_small_0.3.pt
├── sam2.1_hiera_small_1.pt
└── download_ckpts.sh
To download the original SAM-2.1 model, use download_ckpts.sh.
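As a quick sanity check of a downloaded checkpoint, the weights can be inspected directly (a sketch; SAM-2.1 checkpoints appear to store the state dict under a "model" key, an assumption worth verifying by printing the top-level keys):

```python
import torch

# Load on CPU just to inspect; the "model" key is an assumption based on
# the stock SAM 2 checkpoints -- fall back to the raw dict if absent.
ckpt = torch.load("Checkpoints/sam2.1_hiera_small_0.3.pt", map_location="cpu")
state = ckpt.get("model", ckpt)
print(len(state), "tensors,", sum(v.numel() for v in state.values()), "parameters")
```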
- Set up the environment for running TextureSAM:
Use the conda environment file provided with the repository, texturesam.yaml.
Create the environment by running the following command:
conda env create --name texturesam -f texturesam.yaml
- Activate the texturesam environment:
conda activate texturesam
Requirements: We assume training is conducted on A100 GPUs with 80 GB of memory.
- Download the SAM-2.1 Small Checkpoint
Run the following commands to download the checkpoint:
cd sam2/checkpoints && \
./download_ckpts.sh && \
cd ..
- Download the dataset
- Get the ADE20K_0.3 dataset from the [TextureSAM Datasets and checkpoints](https://drive.google.com/drive/folders/1pUJLa898WYEcb4Y_sOaXsSVe-CsPkwRv?usp=drive_link).
- Set the paths for both the ADE20K_0.3 dataset and the SAM-2.1 Small checkpoint in sam2/sam2/configs/sam2.1_training/sam2.1_hiera_s_finetune.yaml (see the config sketch at the end of this section).
- This dataset is already formatted for SAM 2 training.
- Fine-tune the model
To fine-tune the SAM-2.1 Small model on the ADE20K dataset as was done in the paper, using a single GPU, run the following command:
python sam2/training/train.py \
    -c sam2/sam2/configs/sam2.1_training/sam2.1_hiera_s_finetune.yaml \
    --use-cluster 0 \
    --num-gpus 1
- You can also train SAM 2.1 Huge or Base by modifying the configuration file accordingly.
- Training setup used in the paper: a single A100 GPU.
- The training process follows the detailed steps in sam2/training, which is a fork of the original SAM 2 training instructions.
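For orientation, the fields to set in the finetune config usually resemble the dataset and checkpoint entries of the stock SAM 2 finetuning configs. The key names below follow those configs and are an assumption; verify them against the actual sam2.1_hiera_s_finetune.yaml:

```yaml
# Hypothetical excerpt -- key names follow the stock SAM 2 finetune configs
# and may differ in this fork's sam2.1_hiera_s_finetune.yaml.
dataset:
  img_folder: /path/to/ADE20K_0.3   # images (.jpg)
  gt_folder: /path/to/ADE20K_0.3    # SAM-format annotations (.json)

# Base weights to start fine-tuning from:
checkpoint_path: ./sam2/checkpoints/sam2.1_hiera_small.pt
```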
To run inference using TextureSAM or SAM on a dataset of your choice, use the sam2/inference.py script. This script processes input images and generates segmented output images along with their corresponding binary masks in a subfolder called masks_output.
- Set up paths: Before running the script, update the following paths in sam2/inference.py to match your dataset and model configuration:
- image_dir: Path to your input images.
- output_dir: Path where segmented images and masks will be saved.
- model_cfg: Configuration file for the model.
- checkpoint: Path to the model checkpoint.
- Run the inference script:
python sam2/inference.py
- In our implementation, we adjusted the parameters of the automatic mask generator to optimize segmentation results (see the sketch after this list):
- Increased the number of sampling points to capture more detailed features.
- Set a higher IoU threshold to ensure high-confidence predictions.
- Adjusted stability and mask thresholds to control segmentation sensitivity.
- Configured the model to produce single binary masks per region for cleaner outputs.
- These changes enhance segmentation performance and were applied as described in the paper.
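A minimal sketch of this setup (the parameter values here are illustrative, not the exact ones from the paper; see sam2/inference.py for the actual configuration):

```python
import numpy as np
import torch
from PIL import Image

from sam2.build_sam import build_sam2
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator

model_cfg = "configs/sam2.1/sam2.1_hiera_s.yaml"      # SAM-2.1 Hiera Small config
checkpoint = "Checkpoints/sam2.1_hiera_small_0.3.pt"  # TextureSAM weights
device = "cuda" if torch.cuda.is_available() else "cpu"

model = build_sam2(model_cfg, checkpoint, device=device)
mask_generator = SAM2AutomaticMaskGenerator(
    model,
    points_per_side=64,           # denser sampling grid for finer regions
    pred_iou_thresh=0.9,          # keep only high-confidence masks
    stability_score_thresh=0.95,  # stricter stability filtering
    multimask_output=False,       # one binary mask per sampled point
)

image = np.array(Image.open("example.jpg").convert("RGB"))
masks = mask_generator.generate(image)  # list of dicts with a "segmentation" key
print(f"{len(masks)} masks generated")
```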
To evaluate the performance of TextureSAM against SAM, we provide evaluation scripts for different datasets and evaluation settings. These scripts compute Mean Intersection over Union (mIoU) and, for non-aggregated masks, the Adjusted Rand Index (ARI) to measure segmentation quality.
eval_agg_masks.py
- Evaluates segmentation with aggregated masks for the RWTD and STMD datasets: overlapping predicted masks are merged before computing mIoU, and predicted masks are matched to ground truth labels based on pixel overlap for accurate evaluation.
- For RWTD, mIoU is averaged over both the original and inverted labels due to binary label inconsistencies (see the sketch after this list).
- Paths to set before running:
- Ground truth labels directory: label_dir = "./ground_truth_labels"
- Predicted masks directory: pred_masks_dir = "./masks_output" (folder with predicted segmentations)
- Output results file: output_csv_path = "./evaluation_results.csv" (stores evaluation results)
- RWTD dataset flag: rwtd_dataset = False (set to True for RWTD dataset evaluation)
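A minimal sketch of the inversion-averaged IoU for RWTD (an illustration of the idea, not the script itself):

```python
import numpy as np

def binary_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two boolean masks of equal shape."""
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 0.0

def rwtd_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    # RWTD labels are binary but inconsistently polarized, so the score
    # is averaged over the original and the inverted ground truth.
    return 0.5 * (binary_iou(pred, gt) + binary_iou(pred, ~gt))
```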
eval_agg_masks_ADE20K.py
- Evaluates segmentation with aggregated masks for the ADE20K dataset.
- Uses instance unification by merging predicted masks into a single segmentation mask before evaluation.
- Matches predicted masks to ground truth labels based on pixel overlap to improve evaluation accuracy (illustrated in the sketch below).
- Paths to set before running:
- Unified ground truth labels: unified_labels_dir = "./unified_labels" (directory containing ADE20K unified ground truth masks, stored in TextureSAM Datasets and Checkpoints under ADE20K_real_images/ADE20K_unified_gt_instances)
- Predicted masks directory: pred_root_dir = "./pred_masks" (folder with predicted segmentation masks)
- Output results file: output_csv = "./evaluation_results.csv" (stores evaluation results)
- The script processes all images in the dataset, computes mIoU scores for each sample, and saves results in evaluation_results.csv.
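The pixel-overlap matching can be pictured with this simplified sketch (a hypothetical helper, not the actual script):

```python
import numpy as np

def matched_miou(unified_gt: np.ndarray, pred_masks: list[np.ndarray]) -> float:
    """Mean IoU over GT instances, each matched to the predicted mask with
    the largest pixel overlap (simplified; ignores one-to-one constraints)."""
    ious = []
    for gt_id in np.unique(unified_gt):
        if gt_id == 0:  # background
            continue
        gt = unified_gt == gt_id
        best = max(pred_masks, key=lambda p: np.logical_and(p, gt).sum())
        union = np.logical_or(best, gt).sum()
        ious.append(np.logical_and(best, gt).sum() / union if union else 0.0)
    return float(np.mean(ious)) if ious else 0.0
```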
eval_no_agg_masks.py
- Evaluates segmentation without aggregated masks, comparing each predicted mask separately.
- Computes both Mean IoU (mIoU) and Adjusted Rand Index (ARI) to assess segmentation quality.
- Penalizes over-segmentation by evaluating individual predicted masks instead of merging overlapping ones.
- Paths to set before running:
- Ground truth labels directory: gt_folder = "./path_to_ground_truth_masks"
- Predicted masks directory: pred_folder = "./masks_output" (Folder with predicted segmentation masks.)
- Output results file: output_csv = "./evaluation_results.csv" (Stores evaluation results.)
- The script processes all images, calculates mIoU and ARI per sample, and saves results in evaluation_results.csv.
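The ARI term can be computed with scikit-learn by flattening integer label maps; a minimal sketch, assuming the individual predicted masks are first combined into a single label map:

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

def masks_to_labels(masks: list[np.ndarray]) -> np.ndarray:
    """Stack binary masks into one integer label map (0 = unassigned;
    later masks overwrite earlier ones where they overlap)."""
    labels = np.zeros(masks[0].shape, dtype=np.int32)
    for i, m in enumerate(masks, start=1):
        labels[m] = i
    return labels

def ari(gt_labels: np.ndarray, pred_labels: np.ndarray) -> float:
    """Adjusted Rand Index between two integer label maps of equal shape."""
    return adjusted_rand_score(gt_labels.ravel(), pred_labels.ravel())
```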
eval_no_agg_masks_ADE20K.py
- Evaluates segmentation without aggregated masks for the ADE20K dataset.
- Computes both Mean IoU (mIoU) and Adjusted Rand Index (ARI) to measure segmentation quality.
- Assesses the impact of over-segmentation by evaluating individual predicted masks instead of merging overlapping ones.
- Paths to set before running:
- Unified ground truth labels: gt_folder = "./path_to_ground_truth_masks" (Directory containing ADE20K unified ground truth masks.)
- Predicted masks directory: pred_folder = "./masks_output" (Folder with predicted segmentation masks.)
- Output results file: output_csv = "./evaluation_results.csv" (Stores evaluation results.)
- The script processes all images in the dataset, computes mIoU and ARI per ground truth mask, and saves results in evaluation_results.csv.
What are SAM-2 and SAM-2*?
- SAM-2: The original Segment Anything Model 2, which relies primarily on shape cues for segmentation, making it less effective in texture-defined regions.
- SAM-2*: A modified version of SAM-2 that uses TextureSAM's inference parameters, reducing shape bias and aligning evaluation settings for a direct comparison.
| STMD Results | mIoU | ARI | mIoU (Aggregated) |
|---|---|---|---|
| SAM-2 | 0.07 | 0.16 | 0.16 |
| SAM-2* | 0.17 | 0.16 | 0.78 |
| TextureSAM η ≤ 0.3 | 0.33 | 0.32 | 0.71 |
| TextureSAM η ≤ 1 | 0.35 | 0.34 | 0.70 |
| RWTD Results | mIoU | ARI | mIoU (Aggregated) |
|---|---|---|---|
| SAM-2 | 0.26 | 0.36 | 0.44 |
| SAM-2* | 0.14 | 0.19 | 0.75 |
| TextureSAM η ≤ 0.3 | 0.47 | 0.62 | 0.75 |
| TextureSAM η ≤ 1 | 0.42 | 0.54 | 0.76 |
| ADE20K Results | mIoU | ARI | mIoU (Aggregated) |
|---|---|---|---|
| SAM-2 | 0.08 | 0.08 | 0.46 |
| SAM-2* | 0.10 | 0.11 | 0.65 |
| TextureSAM η ≤ 0.3 | 0.11 | 0.11 | 0.55 |
| TextureSAM η ≤ 1 | 0.12 | 0.11 | 0.40 |
- TextureSAM reduces SAM-2's shape bias, achieving superior segmentation in texture-based datasets (STMD & RWTD).
- Fine-tuning with texture augmentations shifts focus from shape to texture, balancing segmentation performance.