feat(model): Add Dinomaly Model #2835
Open: rajeshgangireddy wants to merge 39 commits into open-edge-platform:main from rajeshgangireddy:dinomaly_workspace.
+2,605 −0

Changes from all commits (39 commits):
All 39 commits are by rajeshgangireddy:

- `25b1b42` Rebuilt again
- `5c7355b` feat(ViTill): enhance model initialization and validation, improve fe…
- `049c7c5` feat(Dinomaly): enhance model documentation and improve training/vali…
- `d9ddea3` fix block's mem attention giving only one output in return
- `cb85343` feat(Dinomaly): Working model. update model initialization and optimi…
- `f77de9f` Refactor DINOv2 training code: remove deprecated training scripts and…
- `b7760bf` feat(DINOmaly): Start cleaning up and adding doc strings
- `4d2c62e` feat(Dinomaly): start adding doc strings
- `a7990d9` feat(ModelLoader): simplify class design, improve API, and enhance er…
- `fbfe346` refactor: remove model loader test script and improvement summary
- `6684f85` feat(Dinomaly): add StableAdamW optimizer and WarmCosineScheduler cla…
- `b5891ab` feat(Dinomaly): implement WarmCosineScheduler and refactor model load…
- `1c4bfa8` Merge remote-tracking branch 'upstream/main' into dinomaly_workspace
- `510802c` Refactor and optimize code across multiple modules
- `a0003f6` docs: update README and module docstrings for Dinomaly model; improve…
- `b9ac935` Remove files not used by dinov2
- `e442e1b` fix: update import paths for model components and adjust README table…
- `1938628` refactor: remove xFormers dependency checks from attention and block …
- `cc07edd` refactor: remove SwiGLUFFN and related xFormers logic from swiglu_ffn.py
- `5c9c9b9` refactor: remove unused NestedTensorBlock and SwiGLUFFN imports from …
- `d8212ec` refactor: clean up imports and remove unused code in dinov2 components
- `600e8aa` feat: add utility functions for Dinomaly model and benchmark configur…
- `69113ab` feat: implement DinomalyMLP class and update model loader for DINOv2 …
- `f0482da` refactor: replace Mlp with DinomalyMLP in model layers and update ref…
- `6aa9c24` feat: implement global cosine hard mining loss function and refactor …
- `9ee0123` refactor: replace custom DropPath and LayerScale implementations with…
- `1fbc37a` refactor: reorganize Dinomaly model components and update imports for…
- `f95baf5` feat: add layer implementations and training utilities for Dinomaly m…
- `af8511c` refactor: reorganize Dinomaly model components and update imports for…
- `a3391b5` refactor: clean up code formatting and improve import organization ac…
- `f45bfbe` refactor: improve readability by formatting parameters in patch embed…
- `1af6c76` Remove workspace from Git tracking
- `279699b` Refactor Dinomaly model components for improved type safety and error…
- `8c24fc2` fix: update error message for sparse gradients in StableAdamW optimiz…
- `254c2a5` feat: add training utilities and update Dinomaly model for enhanced l…
- `5280841` refactor: standardize weight downloading process and improve cache di…
- `b81b065` refactor: update image transformation methods and enhance training st…
- `cdf8640` refactor: remove example usage from ViTill class docstrings for clarity
- `87927d5` docs: enhance README.md with detailed architecture and key components…
New file (+15 lines): a sample training configuration for the Dinomaly model (the file's path is not shown on this page):

```yaml
model:
  class_path: anomalib.models.Dinomaly
  init_args:
    encoder_name: dinov2reg_vit_base_14
    bottleneck_dropout: 0.2
    decoder_depth: 8

trainer:
  max_steps: 5000
  callbacks:
    - class_path: lightning.pytorch.callbacks.EarlyStopping
      init_args:
        patience: 20
        monitor: image_AUROC
        mode: max
```
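Assuming the file is saved locally, it can be handed to the anomalib CLI directly, e.g. `anomalib train --config dinomaly.yaml` (the filename here is illustrative; the PR page does not show the file's actual path).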
New file (+81 lines): `README.md` for the Dinomaly model:
# Dinomaly: Vision Transformer-based Anomaly Detection with Feature Reconstruction

This is the implementation of the Dinomaly model based on the [original implementation](https://github.com/guojiajeremy/Dinomaly).

Model Type: Segmentation

## Description
Dinomaly is a Vision Transformer-based anomaly detection model that uses an encoder-decoder architecture
for feature reconstruction. The model leverages pre-trained DINOv2 Vision Transformer features and employs
a reconstruction-based approach to detect anomalies by comparing encoder and decoder features.

### Architecture

The Dinomaly model consists of three main components:
1. DINOv2 Encoder: A pre-trained Vision Transformer (ViT) that extracts multi-scale feature maps.
2. Bottleneck MLP: A simple feed-forward network that collects features from the encoder's middle layers
   (e.g., 8 out of 12 layers for ViT-Base).
3. Vision Transformer Decoder: A stack of Transformer layers (typically 8) that learns to reconstruct the
   compressed middle-level features by maximising cosine similarity with the encoder's features.

Only the parameters of the bottleneck MLP and the decoder are trained; the encoder stays frozen.
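To make the data flow concrete, here is a minimal, hedged sketch of the three components above. The class name is illustrative, the encoder is a stand-in for the frozen DINOv2 ViT, and the decoder uses stock softmax-attention Transformer layers rather than the Linear Attention described below; this is not the implementation added by this PR.

```python
import torch
from torch import nn


class DinomalySketch(nn.Module):
    """Encoder -> noisy bottleneck -> decoder, as described above (illustrative)."""

    def __init__(self, dim: int = 768, decoder_depth: int = 8, dropout: float = 0.2) -> None:
        super().__init__()
        # Stand-in for the frozen, pre-trained DINOv2 encoder (ViT-Base width 768).
        self.encoder = nn.Identity()
        # Noisy bottleneck: Dropout injects "pseudo feature anomalies" during training.
        self.bottleneck = nn.Sequential(
            nn.Linear(dim, dim * 4),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(dim * 4, dim),
        )
        # Stand-in decoder; the real model uses Linear Attention blocks instead.
        self.decoder = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=12, batch_first=True) for _ in range(decoder_depth)]
        )

    def forward(self, tokens: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # `tokens` stands in for fused middle-layer encoder features of shape (B, N, C).
        encoder_features = self.encoder(tokens)
        x = self.bottleneck(encoder_features)
        for block in self.decoder:
            x = block(x)
        # Training maximises cosine similarity between the two returned tensors;
        # only bottleneck and decoder parameters receive gradients.
        return encoder_features, x


# Example: reconstruct 196 tokens (a 14x14 patch grid) for a batch of 2 images.
model = DinomalySketch()
enc, dec = model(torch.randn(2, 196, 768))
```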

#### Key Components

1. Foundation Transformer Models: Dinomaly leverages pre-trained ViTs (such as DINOv2), which provide universal,
   discriminative features. Building on foundation models enables strong performance across diverse image patterns.
2. Noisy Bottleneck: This component activates the built-in Dropout within the MLP bottleneck.
   By randomly discarding neural activations, Dropout acts as a "pseudo feature anomaly" that forces the decoder
   to restore only normal features. This helps prevent the decoder from becoming too adept at reconstructing
   anomalous patterns it has not been trained on.
3. Linear Attention: The decoder uses Linear Attention instead of traditional Softmax Attention.
   Linear Attention's inherent inability to focus sharply on local regions, a characteristic sometimes seen as a
   "side effect" in supervised tasks, is exploited here: it encourages attention to spread across
   the entire image, reducing the likelihood of the decoder simply forwarding identical information
   from unexpected or anomalous patterns. It also contributes to computational efficiency.
4. Loose Reconstruction:
   1. Loose Constraint: Rather than enforcing rigid layer-to-layer reconstruction, Dinomaly groups multiple
      encoder layers as a whole for reconstruction (e.g., into low-semantic and high-semantic groups).
      This gives the decoder more degrees of freedom, allowing it to behave more distinctly from the
      encoder when encountering unseen patterns.
   2. Loose Loss: The point-by-point reconstruction loss is loosened by employing a hard-mining
      global cosine loss (sketched after this list). This loss detaches the gradients of feature points that are
      already well reconstructed during training, preventing the model from becoming overly proficient at
      reconstructing all features, including those that might correspond to anomalies.
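A hedged sketch of the hard-mining global cosine loss described in 4.2 (this PR exports it as `CosineHardMiningLoss`). The function name, the `detach_fraction` parameter, and the per-scale averaging are illustrative assumptions, not the PR's exact implementation:

```python
import torch
import torch.nn.functional as F


def global_cosine_hard_mining_loss(
    encoder_features: list[torch.Tensor],
    decoder_features: list[torch.Tensor],
    detach_fraction: float = 0.9,
) -> torch.Tensor:
    """Cosine reconstruction loss that detaches well-reconstructed points.

    Each feature tensor is (B, N, C): batch, tokens, channels. The easiest
    `detach_fraction` of points (lowest cosine distance) stop contributing
    gradients, so training keeps focusing on points that are still hard.
    """
    total = torch.zeros((), device=encoder_features[0].device)
    for enc, dec in zip(encoder_features, decoder_features):
        # Point-wise cosine distance between the frozen encoder target and the reconstruction.
        distance = 1 - F.cosine_similarity(enc.detach(), dec, dim=-1)  # (B, N)
        flat = distance.flatten()
        k = max(1, int(flat.numel() * detach_fraction))
        threshold = flat.kthvalue(k).values
        # Gradients flow only through the hard (poorly reconstructed) points.
        hard = (distance >= threshold).float()
        total = total + (distance * hard + distance.detach() * (1 - hard)).mean()
    return total / len(encoder_features)


# Usage with the sketch above: loss = global_cosine_hard_mining_loss([enc], [dec])
```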

### Anomaly Detection

Anomaly detection is performed by computing cosine similarity between encoder and decoder features at multiple scales.
The model generates anomaly maps by analyzing the reconstruction quality of features, where poor reconstruction
indicates anomalous regions. Both anomaly detection (image-level) and localization (pixel-level) are supported.
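A hedged sketch of this scoring scheme, assuming token features of shape `(B, N, C)` on a square patch grid; the function name and the final averaging across scales are illustrative assumptions:

```python
import torch
import torch.nn.functional as F


def compute_anomaly_map(
    encoder_features: list[torch.Tensor],
    decoder_features: list[torch.Tensor],
    image_size: int = 392,
) -> torch.Tensor:
    """Average multi-scale cosine-distance maps into one pixel-level anomaly map."""
    maps = []
    for enc, dec in zip(encoder_features, decoder_features):
        batch, num_tokens, _ = enc.shape
        side = int(num_tokens**0.5)  # patch tokens assumed to form a square grid
        # Low similarity => poor reconstruction => likely anomalous.
        similarity = F.cosine_similarity(enc, dec, dim=-1).reshape(batch, 1, side, side)
        maps.append(
            F.interpolate(1 - similarity, size=image_size, mode="bilinear", align_corners=False)
        )
    return torch.stack(maps).mean(dim=0)  # (B, 1, H, W) pixel-level map
```

An image-level score can then be derived from the map, e.g. its maximum over pixels.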

## Usage

`anomalib train --model Dinomaly --data MVTecAD --data.category <category>`
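The same run can be expressed through the Python API. This is a hedged sketch assuming anomalib's `Engine`/datamodule interface; `MVTecAD` and `Dinomaly` match the CLI flags and the sample config above, while passing `max_steps` through `Engine` is an assumption:

```python
from anomalib.data import MVTecAD
from anomalib.engine import Engine
from anomalib.models import Dinomaly

datamodule = MVTecAD(category="bottle")  # any MVTec AD category
model = Dinomaly()
engine = Engine(max_steps=5000)  # assumed to forward trainer arguments to Lightning

engine.fit(model=model, datamodule=datamodule)   # train
engine.test(model=model, datamodule=datamodule)  # evaluate
```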

## Benchmark

All results gathered with seed `42`. The `max_steps` parameter is set to `5000` for training.

## [MVTec AD Dataset](https://www.mvtec.com/company/research/datasets/mvtec-ad)

### Image-Level AUC

|          |  Avg  | Carpet | Grid  | Leather | Tile  | Wood  | Bottle | Cable | Capsule | Hazelnut | Metal Nut | Pill  | Screw | Toothbrush | Transistor |
| -------- | :---: | :----: | :---: | :-----: | :---: | :---: | :----: | :---: | :-----: | :------: | :-------: | :---: | :---: | :--------: | :--------: |
| Dinomaly | 0.995 | 0.998  | 0.999 |  1.000  | 1.000 | 0.993 | 1.000  | 1.000 |  0.988  |  1.000   |   1.000   | 0.993 | 0.985 |   1.000    |   0.997    |

### Pixel-Level AUC

|          |  Avg  | Carpet | Grid  | Leather | Tile  | Wood  | Bottle | Cable | Capsule | Hazelnut | Metal Nut | Pill  | Screw | Toothbrush | Transistor |
| -------- | :---: | :----: | :---: | :-----: | :---: | :---: | :----: | :---: | :-----: | :------: | :-------: | :---: | :---: | :--------: | :--------: |
| Dinomaly | 0.981 | 0.993  | 0.993 |  0.993  | 0.975 | 0.975 | 0.990  | 0.981 |  0.986  |  0.994   |   0.969   | 0.977 | 0.997 |   0.988    |   0.950    |

### Image F1 Score

|          |  Avg  | Carpet | Grid  | Leather | Tile  | Wood  | Bottle | Cable | Capsule | Hazelnut | Metal Nut | Pill  | Screw | Toothbrush | Transistor |
| -------- | :---: | :----: | :---: | :-----: | :---: | :---: | :----: | :---: | :-----: | :------: | :-------: | :---: | :---: | :--------: | :--------: |
| Dinomaly | 0.985 | 0.983  | 0.991 |  0.995  | 0.994 | 0.975 | 1.000  | 0.995 |  0.982  |  1.000   |   1.000   | 0.986 | 0.957 |   0.983    |   0.976    |
New file (+38 lines): the Dinomaly package `__init__.py` (module `anomalib.models.image.dinomaly`, inferred from its imports):

```python
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

"""Dinomaly: Vision Transformer-based Anomaly Detection with Feature Reconstruction.

The Dinomaly model implements a Vision Transformer encoder-decoder architecture for
anomaly detection using pre-trained DINOv2 features. The model extracts features from
multiple intermediate layers of a DINOv2 encoder, compresses them through a bottleneck
MLP, and reconstructs them using a Vision Transformer decoder.

Anomaly detection is performed by computing cosine similarity between encoder and decoder
features at multiple scales. The model is particularly effective for visual anomaly
detection tasks where the goal is to identify regions or images that deviate from
normal patterns learned during training.

Example:
    >>> from anomalib.models.image import Dinomaly
    >>> model = Dinomaly()

The model can be used with any of the supported datasets and task modes in
anomalib. It leverages the powerful feature representations from DINOv2 Vision
Transformers combined with a reconstruction-based approach for robust anomaly detection.

Notes:
    - Uses DINOv2 Vision Transformer as the backbone encoder
    - Features are extracted from intermediate layers for multi-scale analysis
    - Employs feature reconstruction loss for unsupervised learning
    - Supports both anomaly detection and localization tasks
    - Requires significant GPU memory due to Vision Transformer architecture

See Also:
    :class:`anomalib.models.image.dinomaly.lightning_model.Dinomaly`:
        Lightning implementation of the Dinomaly model.
"""

from anomalib.models.image.dinomaly.lightning_model import Dinomaly

__all__ = ["Dinomaly"]
```
New file (+50 lines): the components package `__init__.py`:

```python
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

"""Components module for Dinomaly model.

This module provides all the necessary components for the Dinomaly Vision Transformer
architecture, including layers, the model loader, training utilities, and the vision
transformer implementation.
"""

# Layer components
from .layers import (
    Attention,
    Block,
    DinomalyMLP,
    LinearAttention,
    MemEffAttention,
)

# Model loader
from .model_loader import DinoV2Loader, load

# Utility functions and classes
from .training_utils import (
    CosineHardMiningLoss,
    StableAdamW,
    WarmCosineScheduler,
)

# Vision transformer components
from .vision_transformer import (
    DinoVisionTransformer,
)

__all__ = [
    # Layers
    "Attention",
    "Block",
    "DinomalyMLP",
    "LinearAttention",
    "MemEffAttention",
    # Model loader
    "DinoV2Loader",
    "load",
    # Utils
    "StableAdamW",
    "WarmCosineScheduler",
    "CosineHardMiningLoss",
    # Vision transformer
    "DinoVisionTransformer",
]
```