
🚀 feat(model): Add Dinomaly Model #2835


Open · wants to merge 46 commits into base: main
Commits (46)
25b1b42
Rebuilt again
rajeshgangireddy Jul 1, 2025
5c7355b
feat(ViTill): enhance model initialization and validation, improve fe…
rajeshgangireddy Jul 1, 2025
049c7c5
feat(Dinomaly): enhance model documentation and improve training/vali…
rajeshgangireddy Jul 1, 2025
d9ddea3
fix block's mem attention giving only one output in return
rajeshgangireddy Jul 7, 2025
cb85343
feat(Dinomaly): Working model. update model initialization and optimi…
rajeshgangireddy Jul 8, 2025
f77de9f
Refactor DINOv2 training code: remove deprecated training scripts and…
rajeshgangireddy Jul 9, 2025
b7760bf
feat(DINOmaly): Start cleaning up and adding doc strings
rajeshgangireddy Jul 9, 2025
4d2c62e
feat(Dinomaly): start adding doc strings
rajeshgangireddy Jul 9, 2025
a7990d9
feat(ModelLoader): simplify class design, improve API, and enhance er…
rajeshgangireddy Jul 10, 2025
fbfe346
refactor: remove model loader test script and improvement summary
rajeshgangireddy Jul 10, 2025
6684f85
feat(Dinomaly): add StableAdamW optimizer and WarmCosineScheduler cla…
rajeshgangireddy Jul 10, 2025
b5891ab
feat(Dinomaly): implement WarmCosineScheduler and refactor model load…
rajeshgangireddy Jul 10, 2025
1c4bfa8
Merge remote-tracking branch 'upstream/main' into dinomaly_workspace
rajeshgangireddy Jul 10, 2025
510802c
Refactor and optimize code across multiple modules
rajeshgangireddy Jul 10, 2025
a0003f6
docs: update README and module docstrings for Dinomaly model; improve…
rajeshgangireddy Jul 10, 2025
b9ac935
Remove files not used by dinov2
rajeshgangireddy Jul 10, 2025
e442e1b
fix: update import paths for model components and adjust README table…
rajeshgangireddy Jul 10, 2025
1938628
refactor: remove xFormers dependency checks from attention and block …
rajeshgangireddy Jul 10, 2025
cc07edd
refactor: remove SwiGLUFFN and related xFormers logic from swiglu_ffn.py
rajeshgangireddy Jul 10, 2025
5c9c9b9
refactor: remove unused NestedTensorBlock and SwiGLUFFN imports from …
rajeshgangireddy Jul 10, 2025
d8212ec
refactor: clean up imports and remove unused code in dinov2 components
rajeshgangireddy Jul 10, 2025
600e8aa
feat: add utility functions for Dinomaly model and benchmark configur…
rajeshgangireddy Jul 11, 2025
69113ab
feat: implement DinomalyMLP class and update model loader for DINOv2 …
rajeshgangireddy Jul 11, 2025
f0482da
refactor: replace Mlp with DinomalyMLP in model layers and update ref…
rajeshgangireddy Jul 11, 2025
6aa9c24
feat: implement global cosine hard mining loss function and refactor …
rajeshgangireddy Jul 14, 2025
9ee0123
refactor: replace custom DropPath and LayerScale implementations with…
rajeshgangireddy Jul 14, 2025
1fbc37a
refactor: reorganize Dinomaly model components and update imports for…
rajeshgangireddy Jul 14, 2025
f95baf5
feat: add layer implementations and training utilities for Dinomaly m…
rajeshgangireddy Jul 14, 2025
af8511c
refactor: reorganize Dinomaly model components and update imports for…
rajeshgangireddy Jul 14, 2025
a3391b5
refactor: clean up code formatting and improve import organization ac…
rajeshgangireddy Jul 15, 2025
f45bfbe
refactor: improve readability by formatting parameters in patch embed…
rajeshgangireddy Jul 15, 2025
1af6c76
Remove workspace from Git tracking
rajeshgangireddy Jul 15, 2025
279699b
Refactor Dinomaly model components for improved type safety and error…
rajeshgangireddy Jul 15, 2025
8c24fc2
fix: update error message for sparse gradients in StableAdamW optimiz…
rajeshgangireddy Jul 15, 2025
254c2a5
feat: add training utilities and update Dinomaly model for enhanced l…
rajeshgangireddy Jul 16, 2025
5280841
refactor: standardize weight downloading process and improve cache di…
rajeshgangireddy Jul 16, 2025
b81b065
refactor: update image transformation methods and enhance training st…
rajeshgangireddy Jul 16, 2025
cdf8640
refactor: remove example usage from ViTill class docstrings for clarity
rajeshgangireddy Jul 17, 2025
87927d5
docs: enhance README.md with detailed architecture and key components…
rajeshgangireddy Jul 17, 2025
06882d2
Small refactor and minor improvements as per PR comments
rajeshgangireddy Jul 18, 2025
1e5246f
refactor: replace einsum operations with matrix operations for OpenVI…
rajeshgangireddy Jul 22, 2025
51cd329
ruff complains about commented code
rajeshgangireddy Jul 22, 2025
d1162e2
refactor: add comments for security mitigations in weight loading and…
rajeshgangireddy Jul 22, 2025
462ea07
add dinomaly entry in reference guide
rajeshgangireddy Jul 22, 2025
3eb4711
fix: make ruff/linters happy
rajeshgangireddy Jul 22, 2025
5786251
refactor: remove outdated note about DDPStrategy in Dinomaly class
rajeshgangireddy Jul 22, 2025
15 changes: 15 additions & 0 deletions examples/configs/model/dinomaly.yaml
@@ -0,0 +1,15 @@
model:
  class_path: anomalib.models.Dinomaly
  init_args:
    encoder_name: dinov2reg_vit_base_14
    bottleneck_dropout: 0.2
    decoder_depth: 8

trainer:
  max_steps: 5000
  callbacks:
    - class_path: lightning.pytorch.callbacks.EarlyStopping
      init_args:
        patience: 20
        monitor: image_AUROC
        mode: max
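If this file is saved as `examples/configs/model/dinomaly.yaml` (the path in the diff header above), it can presumably be passed to the Anomalib CLI directly, e.g. `anomalib train --config examples/configs/model/dinomaly.yaml --data MVTecAD --data.category bottle`.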
2 changes: 2 additions & 0 deletions src/anomalib/models/__init__.py
@@ -60,6 +60,7 @@
    Csflow,
    Dfkde,
    Dfm,
    Dinomaly,
    Draem,
    Dsr,
    EfficientAd,
@@ -97,6 +98,7 @@ class UnknownModelError(ModuleNotFoundError):
"Dfkde",
"Dfm",
"Draem",
"Dinomaly",
"Dsr",
"EfficientAd",
"Fastflow",
2 changes: 2 additions & 0 deletions src/anomalib/models/image/__init__.py
@@ -49,6 +49,7 @@
from .csflow import Csflow
from .dfkde import Dfkde
from .dfm import Dfm
from .dinomaly import Dinomaly
from .draem import Draem
from .dsr import Dsr
from .efficient_ad import EfficientAd
@@ -84,4 +85,5 @@
"Uflow",
"VlmAd",
"WinClip",
"Dinomaly",
]
53 changes: 53 additions & 0 deletions src/anomalib/models/image/dinomaly/README.md
@@ -0,0 +1,53 @@
# Dinomaly: Vision Transformer-based Anomaly Detection with Feature Reconstruction

This is the Anomalib implementation of the Dinomaly model, based on the [original implementation](https://github.com/guojiajeremy/Dinomaly).

Model Type: Segmentation

## Description

Dinomaly is a Vision Transformer-based anomaly detection model that uses an encoder-decoder architecture for feature reconstruction. The model leverages pre-trained DINOv2 Vision Transformer features and employs a reconstruction-based approach to detect anomalies by comparing encoder and decoder features.

### Feature Extraction

Features are extracted from multiple intermediate layers of a pre-trained DINOv2 Vision Transformer encoder. The model typically uses features from layers 2-9 for base models, providing multi-scale feature representations that capture both low-level and high-level semantic information.
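For intuition, here is a sketch of what multi-layer extraction can look like with the upstream DINOv2 backbone. The hub entrypoint, input size, and layer indices below are illustrative assumptions, not necessarily what this PR uses internally:

```python
import torch

# Illustrative sketch: load a DINOv2 ViT-B/14 (with registers) from torch.hub
# and pull patch-token features from several intermediate blocks, following
# the get_intermediate_layers() API of the upstream DINOv2 repository.
encoder = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14_reg")
encoder.eval()

layer_indices = list(range(2, 10))  # "layers 2-9" for base models
x = torch.randn(1, 3, 392, 392)  # H and W must be multiples of the 14-px patch

with torch.no_grad():
    # One tensor per requested layer, each (batch, num_patches, embed_dim).
    features = encoder.get_intermediate_layers(x, n=layer_indices)
```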

### Architecture

The Dinomaly model consists of three main components:

1. **DINOv2 Encoder**: Pre-trained Vision Transformer that extracts multi-layer features
2. **Bottleneck MLP**: Compresses the multi-layer features before reconstruction
3. **Vision Transformer Decoder**: Reconstructs the compressed features back to the original feature space

### Anomaly Detection

Anomaly detection is performed by computing cosine similarity between encoder and decoder features at multiple scales. The model generates anomaly maps by analyzing the reconstruction quality of features, where poor reconstruction indicates anomalous regions. Both anomaly detection (image-level) and localization (pixel-level) are supported.
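The scoring idea can be sketched as follows; this is a toy illustration of cosine-similarity scoring over paired feature maps, not the PR's exact implementation:

```python
import torch
import torch.nn.functional as F

def anomaly_map_from_features(
    enc_feats: list[torch.Tensor],
    dec_feats: list[torch.Tensor],
    out_size: tuple[int, int] = (256, 256),
) -> torch.Tensor:
    """Toy sketch: 1 - cosine similarity between encoder/decoder features.

    Each tensor is assumed to be (batch, channels, height, width); maps from
    different scales are upsampled to a common size and averaged.
    """
    maps = []
    for enc, dec in zip(enc_feats, dec_feats):
        sim = F.cosine_similarity(enc, dec, dim=1, eps=1e-8)  # (B, H, W)
        amap = 1.0 - sim  # poor reconstruction -> low similarity -> high score
        amap = F.interpolate(amap.unsqueeze(1), size=out_size, mode="bilinear")
        maps.append(amap)
    return torch.stack(maps).mean(dim=0)  # (B, 1, H, W)

# Example with random features at two scales
enc = [torch.randn(1, 768, 28, 28), torch.randn(1, 768, 14, 14)]
dec = [torch.randn(1, 768, 28, 28), torch.randn(1, 768, 14, 14)]
amap = anomaly_map_from_features(enc, dec)          # pixel-level localization
image_score = amap.amax(dim=(-2, -1))               # image-level score
```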

## Usage

`anomalib train --model Dinomaly --data MVTecAD --data.category <category>`
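Training via the Python API should follow the usual Anomalib pattern; a minimal sketch, assuming an MVTec AD datamodule with default arguments:

```python
from anomalib.data import MVTecAD
from anomalib.engine import Engine
from anomalib.models import Dinomaly

# Standard Anomalib train/test loop with the Dinomaly model.
datamodule = MVTecAD(category="bottle")
model = Dinomaly()
engine = Engine()
engine.fit(model=model, datamodule=datamodule)
engine.test(model=model, datamodule=datamodule)
```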

## Benchmark

All results gathered with seed `42`.

## [MVTec AD Dataset](https://www.mvtec.com/company/research/datasets/mvtec-ad)

### Image-Level AUC

| | Avg | Carpet | Grid | Leather | Tile | Wood | Bottle | Cable | Capsule | Hazelnut | Metal Nut | Pill | Screw | Toothbrush | Transistor | Zipper |
| -------- | :-: | :----: | :--: | :-----: | :--: | :--: | :----: | :---: | :-----: | :------: | :-------: | :--: | :---: | :--------: | :--------: | :----: |
| Dinomaly | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |

### Pixel-Level AUC

| | Avg | Carpet | Grid | Leather | Tile | Wood | Bottle | Cable | Capsule | Hazelnut | Metal Nut | Pill | Screw | Toothbrush | Transistor | Zipper |
| -------- | :-: | :----: | :--: | :-----: | :--: | :--: | :----: | :---: | :-----: | :------: | :-------: | :--: | :---: | :--------: | :--------: | :----: |
| Dinomaly | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |

### Image F1 Score

| | Avg | Carpet | Grid | Leather | Tile | Wood | Bottle | Cable | Capsule | Hazelnut | Metal Nut | Pill | Screw | Toothbrush | Transistor | Zipper |
| -------- | :-: | :----: | :--: | :-----: | :--: | :--: | :----: | :---: | :-----: | :------: | :-------: | :--: | :---: | :--------: | :--------: | :----: |
| Dinomaly | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
38 changes: 38 additions & 0 deletions src/anomalib/models/image/dinomaly/__init__.py
@@ -0,0 +1,38 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

"""Dinomaly: Vision Transformer-based Anomaly Detection with Feature Reconstruction.

The Dinomaly model implements a Vision Transformer encoder-decoder architecture for
anomaly detection using pre-trained DINOv2 features. The model extracts features from
multiple intermediate layers of a DINOv2 encoder, compresses them through a bottleneck
MLP, and reconstructs them using a Vision Transformer decoder.

Anomaly detection is performed by computing cosine similarity between encoder and decoder
features at multiple scales. The model is particularly effective for visual anomaly
detection tasks where the goal is to identify regions or images that deviate from
normal patterns learned during training.

Example:
    >>> from anomalib.models.image import Dinomaly
    >>> model = Dinomaly()

The model can be used with any of the supported datasets and task modes in
anomalib. It leverages the powerful feature representations from DINOv2 Vision
Transformers combined with a reconstruction-based approach for robust anomaly detection.

Notes:
    - Uses DINOv2 Vision Transformer as the backbone encoder
    - Features are extracted from intermediate layers for multi-scale analysis
    - Employs feature reconstruction loss for unsupervised learning
    - Supports both anomaly detection and localization tasks
    - Requires significant GPU memory due to the Vision Transformer architecture

See Also:
    :class:`anomalib.models.image.dinomaly.lightning_model.Dinomaly`:
        Lightning implementation of the Dinomaly model.
"""

from anomalib.models.image.dinomaly.lightning_model import Dinomaly

__all__ = ["Dinomaly"]
50 changes: 50 additions & 0 deletions src/anomalib/models/image/dinomaly/components/__init__.py
@@ -0,0 +1,50 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

"""Components module for Dinomaly model.

This module provides all the necessary components for the Dinomaly Vision Transformer
architecture including layers, model loader, utilities, and vision transformer implementations.
"""

# Layer components
from .layers import (
    Attention,
    Block,
    DinomalyMLP,
    LinearAttention,
    MemEffAttention,
)

# Model loader
from .model_loader import DinoV2Loader, load

# Utility functions and classes
from .training_utils import (
    CosineHardMiningLoss,
    StableAdamW,
    WarmCosineScheduler,
)

# Vision transformer components
from .vision_transformer import (
    DinoVisionTransformer,
)

__all__ = [
    # Layers
    "Attention",
    "Block",
    "DinomalyMLP",
    "LinearAttention",
    "MemEffAttention",
    # Model loader
    "DinoV2Loader",
    "load",
    # Utils
    "StableAdamW",
    "WarmCosineScheduler",
    "CosineHardMiningLoss",
    # Vision transformer
    "DinoVisionTransformer",
]
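As a hedged sketch of how the exported training utilities might be wired together. The constructor arguments shown are assumptions inferred from the class names (AdamW-like optimizer, warmup-then-cosine LR schedule), not the exact signatures in this PR:

```python
import torch

from anomalib.models.image.dinomaly.components import (
    StableAdamW,
    WarmCosineScheduler,
)

# Hypothetical wiring: StableAdamW is assumed to accept AdamW-like arguments,
# and WarmCosineScheduler to ramp the LR for `warmup_iters` steps before
# cosine-decaying it over `total_iters`. Verify against the actual signatures.
model = torch.nn.Linear(768, 768)  # stand-in for the trainable decoder/bottleneck
optimizer = StableAdamW(model.parameters(), lr=2e-3, weight_decay=1e-4)
scheduler = WarmCosineScheduler(
    optimizer,
    base_value=2e-3,
    final_value=2e-4,
    total_iters=5000,
    warmup_iters=100,
)
```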