
🚀 feat(model): Add Dinomaly Model #2835

Open · wants to merge 39 commits into base `main`
Commits (39):
25b1b42
Rebuilt again
rajeshgangireddy Jul 1, 2025
5c7355b
feat(ViTill): enhance model initialization and validation, improve fe…
rajeshgangireddy Jul 1, 2025
049c7c5
feat(Dinomaly): enhance model documentation and improve training/vali…
rajeshgangireddy Jul 1, 2025
d9ddea3
fix block's mem attention giving only one output in return
rajeshgangireddy Jul 7, 2025
cb85343
feat(Dinomaly): Working model. update model initialization and optimi…
rajeshgangireddy Jul 8, 2025
f77de9f
Refactor DINOv2 training code: remove deprecated training scripts and…
rajeshgangireddy Jul 9, 2025
b7760bf
feat(DINOmaly): Start cleaning up and adding doc strings
rajeshgangireddy Jul 9, 2025
4d2c62e
feat(Dinomaly): start adding doc strings
rajeshgangireddy Jul 9, 2025
a7990d9
feat(ModelLoader): simplify class design, improve API, and enhance er…
rajeshgangireddy Jul 10, 2025
fbfe346
refactor: remove model loader test script and improvement summary
rajeshgangireddy Jul 10, 2025
6684f85
feat(Dinomaly): add StableAdamW optimizer and WarmCosineScheduler cla…
rajeshgangireddy Jul 10, 2025
b5891ab
feat(Dinomaly): implement WarmCosineScheduler and refactor model load…
rajeshgangireddy Jul 10, 2025
1c4bfa8
Merge remote-tracking branch 'upstream/main' into dinomaly_workspace
rajeshgangireddy Jul 10, 2025
510802c
Refactor and optimize code across multiple modules
rajeshgangireddy Jul 10, 2025
a0003f6
docs: update README and module docstrings for Dinomaly model; improve…
rajeshgangireddy Jul 10, 2025
b9ac935
Remove files not used by dinov2
rajeshgangireddy Jul 10, 2025
e442e1b
fix: update import paths for model components and adjust README table…
rajeshgangireddy Jul 10, 2025
1938628
refactor: remove xFormers dependency checks from attention and block …
rajeshgangireddy Jul 10, 2025
cc07edd
refactor: remove SwiGLUFFN and related xFormers logic from swiglu_ffn.py
rajeshgangireddy Jul 10, 2025
5c9c9b9
refactor: remove unused NestedTensorBlock and SwiGLUFFN imports from …
rajeshgangireddy Jul 10, 2025
d8212ec
refactor: clean up imports and remove unused code in dinov2 components
rajeshgangireddy Jul 10, 2025
600e8aa
feat: add utility functions for Dinomaly model and benchmark configur…
rajeshgangireddy Jul 11, 2025
69113ab
feat: implement DinomalyMLP class and update model loader for DINOv2 …
rajeshgangireddy Jul 11, 2025
f0482da
refactor: replace Mlp with DinomalyMLP in model layers and update ref…
rajeshgangireddy Jul 11, 2025
6aa9c24
feat: implement global cosine hard mining loss function and refactor …
rajeshgangireddy Jul 14, 2025
9ee0123
refactor: replace custom DropPath and LayerScale implementations with…
rajeshgangireddy Jul 14, 2025
1fbc37a
refactor: reorganize Dinomaly model components and update imports for…
rajeshgangireddy Jul 14, 2025
f95baf5
feat: add layer implementations and training utilities for Dinomaly m…
rajeshgangireddy Jul 14, 2025
af8511c
refactor: reorganize Dinomaly model components and update imports for…
rajeshgangireddy Jul 14, 2025
a3391b5
refactor: clean up code formatting and improve import organization ac…
rajeshgangireddy Jul 15, 2025
f45bfbe
refactor: improve readability by formatting parameters in patch embed…
rajeshgangireddy Jul 15, 2025
1af6c76
Remove workspace from Git tracking
rajeshgangireddy Jul 15, 2025
279699b
Refactor Dinomaly model components for improved type safety and error…
rajeshgangireddy Jul 15, 2025
8c24fc2
fix: update error message for sparse gradients in StableAdamW optimiz…
rajeshgangireddy Jul 15, 2025
254c2a5
feat: add training utilities and update Dinomaly model for enhanced l…
rajeshgangireddy Jul 16, 2025
5280841
refactor: standardize weight downloading process and improve cache di…
rajeshgangireddy Jul 16, 2025
b81b065
refactor: update image transformation methods and enhance training st…
rajeshgangireddy Jul 16, 2025
cdf8640
refactor: remove example usage from ViTill class docstrings for clarity
rajeshgangireddy Jul 17, 2025
87927d5
docs: enhance README.md with detailed architecture and key components…
rajeshgangireddy Jul 17, 2025
15 changes: 15 additions & 0 deletions examples/configs/model/dinomaly.yaml
@@ -0,0 +1,15 @@
model:
  class_path: anomalib.models.Dinomaly
  init_args:
    encoder_name: dinov2reg_vit_base_14
    bottleneck_dropout: 0.2
    decoder_depth: 8

trainer:
  max_steps: 5000
  callbacks:
    - class_path: lightning.pytorch.callbacks.EarlyStopping
      init_args:
        patience: 20
        monitor: image_AUROC
        mode: max
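For reference, a minimal Python sketch of the same configuration, assuming the `init_args` above map one-to-one onto the `Dinomaly` constructor:

```python
# Minimal sketch: programmatic equivalent of the YAML config above,
# assuming init_args map directly onto the Dinomaly constructor.
from anomalib.models import Dinomaly

model = Dinomaly(
    encoder_name="dinov2reg_vit_base_14",
    bottleneck_dropout=0.2,
    decoder_depth=8,
)
```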
2 changes: 2 additions & 0 deletions src/anomalib/models/__init__.py
@@ -60,6 +60,7 @@
    Csflow,
    Dfkde,
    Dfm,
    Dinomaly,
    Draem,
    Dsr,
    EfficientAd,
@@ -97,6 +98,7 @@ class UnknownModelError(ModuleNotFoundError):
"Dfkde",
"Dfm",
"Draem",
"Dinomaly",
"Dsr",
"EfficientAd",
"Fastflow",
2 changes: 2 additions & 0 deletions src/anomalib/models/image/__init__.py
@@ -49,6 +49,7 @@
from .csflow import Csflow
from .dfkde import Dfkde
from .dfm import Dfm
from .dinomaly import Dinomaly
from .draem import Draem
from .dsr import Dsr
from .efficient_ad import EfficientAd
@@ -84,4 +85,5 @@
"Uflow",
"VlmAd",
"WinClip",
"Dinomaly",
]
81 changes: 81 additions & 0 deletions src/anomalib/models/image/dinomaly/README.md
@@ -0,0 +1,81 @@
# Dinomaly: Vision Transformer-based Anomaly Detection with Feature Reconstruction

This is the implementation of the Dinomaly model based on the [original implementation](https://github.com/guojiajeremy/Dinomaly).

Model Type: Segmentation

## Description

Dinomaly is a Vision Transformer-based anomaly detection model that uses an encoder-decoder architecture
for feature reconstruction.
It leverages pre-trained DINOv2 Vision Transformer features and detects anomalies by comparing
encoder features with their decoder reconstructions.

### Architecture

The Dinomaly model consists of three main components:

1. DINOv2 Encoder: A pre-trained Vision Transformer (ViT) that extracts multi-scale feature maps.
2. Bottleneck MLP: A simple feed-forward network that collects features from the encoder's middle layers
   (e.g., 8 out of 12 layers for ViT-Base).
3. Vision Transformer Decoder: A stack of Transformer layers (typically 8) that learns to reconstruct the
   compressed middle-level features by maximizing cosine similarity with the encoder's features.

Only the parameters of the bottleneck MLP and the decoder are trained; a toy sketch of this layout follows.
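The sketch below uses generic PyTorch stand-ins; the PR's actual classes (a DINOv2 ViT encoder and its own decoder blocks such as `Block` and `LinearAttention`) differ, so every module choice here is illustrative only:

```python
# Toy sketch of the Dinomaly layout; all modules are illustrative stand-ins.
import torch
from torch import nn

dim, decoder_depth = 768, 8

# Stand-in for the frozen, pre-trained DINOv2 ViT encoder.
encoder = nn.TransformerEncoderLayer(dim, nhead=12, batch_first=True)
# Noisy bottleneck: an MLP whose Dropout injects "pseudo feature anomalies".
bottleneck = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Dropout(0.2))
# Decoder: a stack of transformer layers that reconstructs encoder features.
decoder = nn.ModuleList(
    nn.TransformerEncoderLayer(dim, nhead=12, batch_first=True)
    for _ in range(decoder_depth)
)

# Only the bottleneck and decoder are trained; the encoder stays frozen.
for p in encoder.parameters():
    p.requires_grad = False

tokens = torch.randn(2, 196, dim)  # (batch, patch tokens, channels)
enc_feats = encoder(tokens)
x = bottleneck(enc_feats)
dec_feats = []
for block in decoder:
    x = block(x)
    dec_feats.append(x)  # per-layer reconstructions, compared against enc_feats
```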

#### Key Components

1. Foundation Transformer Models: Dinomaly leverages pre-trained ViTs (like DINOv2) that provide universal and
   discriminative features. This use of foundation models enables strong performance across various image patterns.
2. Noisy Bottleneck: This component activates the built-in Dropout within the MLP bottleneck.
   By randomly discarding neural activations, Dropout acts as a "pseudo feature anomaly" that forces the decoder
   to restore only normal features. This helps prevent the decoder from becoming adept at reconstructing
   anomalous patterns it has never been trained on.
3. Linear Attention: Instead of traditional Softmax Attention, Linear Attention is used in the decoder.
   Linear Attention's inherent inability to focus heavily on local regions, a characteristic sometimes seen as a
   "side effect" in supervised tasks, is exploited here. This property encourages attention to spread across
   the entire image, reducing the likelihood of the decoder simply forwarding identical information
   from unexpected or anomalous patterns. It also contributes to computational efficiency.
4. Loose Reconstruction:
   1. Loose Constraint: Rather than enforcing rigid layer-to-layer reconstruction, Dinomaly groups multiple
      encoder layers as a whole for reconstruction (e.g., into low-semantic and high-semantic groups).
      This gives the decoder more degrees of freedom, allowing it to behave more distinctly from the
      encoder when encountering unseen patterns.
   2. Loose Loss: The point-by-point reconstruction loss is loosened by employing a hard-mining
      global cosine loss (sketched after this list). This detaches the gradients of feature points that are
      already well reconstructed during training, preventing the model from becoming overly proficient at
      reconstructing all features, including those that might correspond to anomalies.
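The hard-mining idea admits a compact sketch. This is an illustrative simplification, not the PR's exact `CosineHardMiningLoss`:

```python
# Illustrative hard-mining global cosine loss (simplified, not the PR's
# exact CosineHardMiningLoss). p is the fraction of hardest points that
# keep their gradients; the easiest points still count in the loss value
# but are detached from the backward pass.
import torch
import torch.nn.functional as F


def global_cosine_hard_mining(enc: torch.Tensor, dec: torch.Tensor, p: float = 0.9) -> torch.Tensor:
    """enc, dec: (batch, num_points, channels) encoder/decoder features."""
    point_dist = 1 - F.cosine_similarity(enc, dec, dim=-1)  # (batch, num_points)
    threshold = torch.quantile(point_dist.detach(), 1 - p)  # easy/hard cut-off
    hard = point_dist >= threshold
    loss = torch.where(hard, point_dist, point_dist.detach())
    return loss.mean()
```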

### Anomaly Detection

Anomaly detection is performed by computing cosine similarity between encoder and decoder features at multiple scales.
The model generates anomaly maps from the reconstruction quality of these features: regions that are poorly
reconstructed are flagged as anomalous. Both image-level detection and pixel-level localization are supported.
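A minimal sketch of this multi-scale comparison, assuming matched lists of `(B, C, H, W)` encoder/decoder feature maps; the PR's exact map fusion and image scoring may differ:

```python
# Minimal sketch of multi-scale anomaly map computation; the fusion and
# scoring choices here (mean over scales, max for the image score) are
# assumptions, not the PR's exact strategy.
import torch
import torch.nn.functional as F


def compute_anomaly_maps(enc_feats, dec_feats, image_size=256):
    maps = []
    for enc, dec in zip(enc_feats, dec_feats):  # each tensor: (B, C, H, W)
        amap = 1 - F.cosine_similarity(enc, dec, dim=1)  # (B, H, W)
        amap = F.interpolate(
            amap.unsqueeze(1), size=image_size, mode="bilinear", align_corners=False
        )
        maps.append(amap)
    pixel_map = torch.stack(maps).mean(dim=0)  # (B, 1, H, W) localization map
    image_score = pixel_map.flatten(1).max(dim=1).values  # image-level score
    return pixel_map, image_score
```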

## Usage

`anomalib train --model Dinomaly --data MVTecAD --data.category <category>`
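The Python API equivalent is sketched below, assuming the standard anomalib `Engine` workflow; the dataset category and trainer arguments are illustrative:

```python
# Hedged sketch of the Python API workflow; the category and max_steps
# values are illustrative, mirroring the benchmark setup described below.
from anomalib.data import MVTecAD
from anomalib.engine import Engine
from anomalib.models import Dinomaly

datamodule = MVTecAD(category="bottle")
model = Dinomaly()
engine = Engine(max_steps=5000)

engine.fit(model=model, datamodule=datamodule)
engine.test(model=model, datamodule=datamodule)
```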

## Benchmark

All results gathered with seed `42`. The `max_steps` parameter is set to `5000` for training.

## [MVTec AD Dataset](https://www.mvtec.com/company/research/datasets/mvtec-ad)

### Image-Level AUC

| | Avg | Carpet | Grid | Leather | Tile | Wood | Bottle | Cable | Capsule | Hazelnut | Metal Nut | Pill | Screw | Toothbrush | Transistor |
| -------- | :---: | :----: | :---: | :-----: | :---: | :---: | :----: | :---: | :-----: | :------: | :-------: | :---: | :---: | :--------: | :--------: |
| Dinomaly | 0.995 | 0.998 | 0.999 | 1.000 | 1.000 | 0.993 | 1.000 | 1.000 | 0.988 | 1.000 | 1.000 | 0.993 | 0.985 | 1.000 | 0.997 |

### Pixel-Level AUC

| | Avg | Carpet | Grid | Leather | Tile | Wood | Bottle | Cable | Capsule | Hazelnut | Metal Nut | Pill | Screw | Toothbrush | Transistor |
| -------- | :---: | :----: | :---: | :-----: | :---: | :---: | :----: | :---: | :-----: | :------: | :-------: | :---: | :---: | :--------: | :--------: |
| Dinomaly | 0.981 | 0.993 | 0.993 | 0.993 | 0.975 | 0.975 | 0.990 | 0.981 | 0.986 | 0.994 | 0.969 | 0.977 | 0.997 | 0.988 | 0.950 |

### Image F1 Score

| | Avg | Carpet | Grid | Leather | Tile | Wood | Bottle | Cable | Capsule | Hazelnut | Metal Nut | Pill | Screw | Toothbrush | Transistor |
| -------- | :---: | :----: | :---: | :-----: | :---: | :---: | :----: | :---: | :-----: | :------: | :-------: | :---: | :---: | :--------: | :--------: |
| Dinomaly | 0.985 | 0.983 | 0.991 | 0.995 | 0.994 | 0.975 | 1.000 | 0.995 | 0.982 | 1.000 | 1.000 | 0.986 | 0.957 | 0.983 | 0.976 |
38 changes: 38 additions & 0 deletions src/anomalib/models/image/dinomaly/__init__.py
@@ -0,0 +1,38 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

"""Dinomaly: Vision Transformer-based Anomaly Detection with Feature Reconstruction.

The Dinomaly model implements a Vision Transformer encoder-decoder architecture for
anomaly detection using pre-trained DINOv2 features. The model extracts features from
multiple intermediate layers of a DINOv2 encoder, compresses them through a bottleneck
MLP, and reconstructs them using a Vision Transformer decoder.

Anomaly detection is performed by computing cosine similarity between encoder and decoder
features at multiple scales. The model is particularly effective for visual anomaly
detection tasks where the goal is to identify regions or images that deviate from
normal patterns learned during training.

Example:
    >>> from anomalib.models.image import Dinomaly
    >>> model = Dinomaly()

The model can be used with any of the supported datasets and task modes in
anomalib. It leverages the powerful feature representations from DINOv2 Vision
Transformers combined with a reconstruction-based approach for robust anomaly detection.

Notes:
    - Uses DINOv2 Vision Transformer as the backbone encoder
    - Features are extracted from intermediate layers for multi-scale analysis
    - Employs feature reconstruction loss for unsupervised learning
    - Supports both anomaly detection and localization tasks
    - Requires significant GPU memory due to Vision Transformer architecture

See Also:
    :class:`anomalib.models.image.dinomaly.lightning_model.Dinomaly`:
        Lightning implementation of the Dinomaly model.
"""

from anomalib.models.image.dinomaly.lightning_model import Dinomaly

__all__ = ["Dinomaly"]
50 changes: 50 additions & 0 deletions src/anomalib/models/image/dinomaly/components/__init__.py
@@ -0,0 +1,50 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

"""Components module for Dinomaly model.

This module provides all the necessary components for the Dinomaly Vision Transformer
architecture including layers, model loader, utilities, and vision transformer implementations.
"""

# Layer components
from .layers import (
    Attention,
    Block,
    DinomalyMLP,
    LinearAttention,
    MemEffAttention,
)

# Model loader
from .model_loader import DinoV2Loader, load

# Utility functions and classes
from .training_utils import (
    CosineHardMiningLoss,
    StableAdamW,
    WarmCosineScheduler,
)

# Vision transformer components
from .vision_transformer import (
    DinoVisionTransformer,
)

__all__ = [
    # Layers
    "Attention",
    "Block",
    "DinomalyMLP",
    "LinearAttention",
    "MemEffAttention",
    # Model loader
    "DinoV2Loader",
    "load",
    # Utils
    "StableAdamW",
    "WarmCosineScheduler",
    "CosineHardMiningLoss",
    # Vision transformer
    "DinoVisionTransformer",
]
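For orientation, a minimal sketch of pulling in the exported training utilities; the `StableAdamW` constructor is assumed to follow the `torch.optim.AdamW` convention, which this PR does not confirm:

```python
# Hedged sketch: assumes StableAdamW takes an AdamW-style constructor
# (a params iterable plus keyword hyperparameters); verify against the
# actual class before relying on this.
import torch

from anomalib.models.image.dinomaly.components import StableAdamW

decoder_stub = torch.nn.Linear(768, 768)  # stand-in for bottleneck/decoder params
optimizer = StableAdamW(decoder_stub.parameters(), lr=2e-3)
```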