Commit 3a403ae

Authored by ashwinvaidya17, Bepitic, and samet-akcay
🚀 Add VLM based Anomaly Model (#2344)
* [Draft] Llm on (#2165)
  * Add TaskType Explanation
  * Add llm model
  * Add ollama
  * Better description for descr in title
  * Add text of llm into ImageResult visualization
  * Add wip llava/llava_next
  * Add init
  * Update lint
  * Fix visualization with description
  * Show the images every batch
  * Fix docstring and error management
  * Add compatibility for TaskType.EXPLANATION
  * Remove, show in the engine visualization
  * Fix visualization and llm openai multishot
  * Fix circular import problem
  * Add HuggingFace to LLavaNext
* 🔨 Scaffold for refactor (#2340)
  * Initial scaffold
  * Apply PR comments
  * Rename dir
* Add ChatGPT (#2341)
  * Initial scaffold, apply PR comments, rename dir
  * Delete llm_ollama
  * Add ChatGPT
  * Remove LLM model
* Add Huggingface (#2343)
  * Initial scaffold, apply PR comments, rename dir, delete llm_ollama
  * Add ChatGPT, remove LLM model
  * Add transformers
  * Remove llava
* 🔨 Minor Refactor (#2345)
* Undo changes, including changes to image.py
* Add explanation visualizer (#2351), plus a bug-fix
* 🔨 Allow setting API keys from env (#2353)
* 🧪 Add tests (#2355), and remove the explanation task type
* Minor fixes, changelog update, test fixes, PR comments addressed, naming updates
* Update src/anomalib/models/image/vlm_ad/lightning_model.py (co-authored by Samet Akcay)

Signed-off-by: Bepitic <bepitic@gmail.com>
Signed-off-by: Ashwin Vaidya <ashwinnitinvaidya@gmail.com>
Co-authored-by: Paco <bepitic@gmail.com>
Co-authored-by: Samet Akcay <samet.akcay@intel.com>
1 parent 6eeb7f6 commit 3a403ae

File tree: 17 files changed, +603 −9 lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -8,6 +8,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
 ### Added
 
+- Add `VlmAd` metric by [Bepitic](https://github.com/Bepitic) and refactored by [ashwinvaidya17](https://github.com/ashwinvaidya17) in https://github.com/openvinotoolkit/anomalib/pull/2344
 - Add `Datumaro` annotation format support by @ashwinvaidya17 in https://github.com/openvinotoolkit/anomalib/pull/2377
 - Add `AUPIMO` tutorials notebooks in https://github.com/openvinotoolkit/anomalib/pull/2330 and https://github.com/openvinotoolkit/anomalib/pull/2336
 - Add `AUPIMO` metric by [jpcbertoldo](https://github.com/jpcbertoldo) in https://github.com/openvinotoolkit/anomalib/pull/1726 and refactored by [ashwinvaidya17](https://github.com/ashwinvaidya17) in https://github.com/openvinotoolkit/anomalib/pull/2329
```

pyproject.toml

Lines changed: 2 additions & 1 deletion

```diff
@@ -56,6 +56,7 @@ core = [
     "open-clip-torch>=2.23.0,<2.26.1",
 ]
 openvino = ["openvino>=2024.0", "nncf>=2.10.0", "onnx>=1.16.0"]
+vlm = ["ollama", "openai", "python-dotenv","transformers"]
 loggers = [
     "comet-ml>=3.31.7",
     "gradio>=4",
@@ -84,7 +85,7 @@ test = [
     "coverage[toml]",
     "tox",
 ]
-full = ["anomalib[core,openvino,loggers,notebooks]"]
+full = ["anomalib[core,openvino,loggers,notebooks, vlm]"]
 dev = ["anomalib[full,docs,test]"]
 
 [project.scripts]
```

src/anomalib/callbacks/metrics.py

Lines changed: 2 additions & 3 deletions

```diff
@@ -78,9 +78,8 @@ def setup(
         elif self.task == TaskType.CLASSIFICATION:
             pixel_metric_names = []
             logger.warning(
-                "Cannot perform pixel-level evaluation when task type is classification. "
-                "Ignoring the following pixel-level metrics: %s",
-                self.pixel_metric_names,
+                "Cannot perform pixel-level evaluation when task type is {self.task.value}. "
+                f"Ignoring the following pixel-level metrics: {self.pixel_metric_names}",
             )
         else:
             pixel_metric_names = (
```
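One detail worth flagging in the hunk above: the first string literal in the new `logger.warning` call has no `f` prefix, so `{self.task.value}` would be logged verbatim rather than interpolated (the replaced code used lazy `%s` formatting instead). A minimal stdlib sketch of the difference, with `Task` standing in for the anomalib task enum member:

```python
# Demonstrates the f-string pitfall: without the "f" prefix, the
# placeholder text survives literally instead of being interpolated.
class Task:
    value = "classification"


task = Task()

plain = "task type is {task.value}"    # no f-prefix: braces kept as-is
interp = f"task type is {task.value}"  # f-prefix: value interpolated

print(plain)
print(interp)
```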

src/anomalib/engine/engine.py

Lines changed: 12 additions & 4 deletions

```diff
@@ -32,7 +32,7 @@
 from anomalib.utils.normalization import NormalizationMethod
 from anomalib.utils.path import create_versioned_dir
 from anomalib.utils.types import NORMALIZATION, THRESHOLD
-from anomalib.utils.visualization import ImageVisualizer
+from anomalib.utils.visualization import BaseVisualizer, ExplanationVisualizer, ImageVisualizer
 
 logger = logging.getLogger(__name__)
 
@@ -322,7 +322,7 @@ def _setup_trainer(self, model: AnomalyModule) -> None:
         self._cache.update(model)
 
         # Setup anomalib callbacks to be used with the trainer
-        self._setup_anomalib_callbacks()
+        self._setup_anomalib_callbacks(model)
 
         # Temporarily set devices to 1 to avoid issues with multiple processes
         self._cache.args["devices"] = 1
@@ -405,7 +405,7 @@ def _setup_transform(
         if not getattr(dataloader.dataset, "transform", None):
             dataloader.dataset.transform = transform
 
-    def _setup_anomalib_callbacks(self) -> None:
+    def _setup_anomalib_callbacks(self, model: AnomalyModule) -> None:
         """Set up callbacks for the trainer."""
         _callbacks: list[Callback] = []
 
@@ -432,9 +432,17 @@ def _setup_anomalib_callbacks(self) -> None:
         _callbacks.append(_ThresholdCallback(self.threshold))
         _callbacks.append(_MetricsCallback(self.task, self.image_metric_names, self.pixel_metric_names))
 
+        visualizer: BaseVisualizer
+
+        # TODO(ashwinvaidya17): temporary # noqa: TD003 ignoring as visualizer is getting a complete overhaul
+        if model.__class__.__name__ == "VlmAd":
+            visualizer = ExplanationVisualizer()
+        else:
+            visualizer = ImageVisualizer(task=self.task, normalize=self.normalization == NormalizationMethod.NONE)
+
         _callbacks.append(
             _VisualizationCallback(
-                visualizers=ImageVisualizer(task=self.task, normalize=self.normalization == NormalizationMethod.NONE),
+                visualizers=visualizer,
                 save=True,
                 root=self._cache.args["default_root_dir"] / "images",
             ),
```
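The new `_setup_anomalib_callbacks` selects the visualizer by inspecting the model's class name. The dispatch pattern can be sketched in isolation; the classes below are placeholders standing in for anomalib's actual visualizers and models, not the real implementations:

```python
# Name-based visualizer dispatch, mirroring the diff above.
class ExplanationVisualizer:
    """Stand-in for anomalib's explanation visualizer."""


class ImageVisualizer:
    """Stand-in for anomalib's default image visualizer."""


class VlmAd:
    """Stand-in for the new VLM-based model."""


class Padim:
    """Stand-in for any other anomaly model."""


def pick_visualizer(model: object) -> object:
    # VlmAd gets the explanation visualizer; every other model keeps
    # the default image visualizer, as in the engine diff.
    if model.__class__.__name__ == "VlmAd":
        return ExplanationVisualizer()
    return ImageVisualizer()


print(type(pick_visualizer(VlmAd())).__name__)   # ExplanationVisualizer
print(type(pick_visualizer(Padim())).__name__)   # ImageVisualizer
```

Matching on `__class__.__name__` avoids importing `VlmAd` into the engine module, at the cost of a stringly-typed check; the TODO in the diff notes this is temporary until the visualizer overhaul.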

src/anomalib/models/__init__.py

Lines changed: 2 additions & 0 deletions

```diff
@@ -30,6 +30,7 @@
     Rkde,
     Stfpm,
     Uflow,
+    VlmAd,
     WinClip,
 )
 from .video import AiVad
@@ -58,6 +59,7 @@ class UnknownModelError(ModuleNotFoundError):
     "Stfpm",
     "Uflow",
     "AiVad",
+    "VlmAd",
     "WinClip",
 ]
```

src/anomalib/models/image/__init__.py

Lines changed: 2 additions & 0 deletions

```diff
@@ -20,6 +20,7 @@
 from .rkde import Rkde
 from .stfpm import Stfpm
 from .uflow import Uflow
+from .vlm_ad import VlmAd
 from .winclip import WinClip
 
 __all__ = [
@@ -40,5 +41,6 @@
     "Rkde",
     "Stfpm",
     "Uflow",
+    "VlmAd",
     "WinClip",
 ]
```
src/anomalib/models/image/vlm_ad/__init__.py (new file)

Lines changed: 8 additions & 0 deletions

```python
"""Visual Anomaly Model."""

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from .lightning_model import VlmAd

__all__ = ["VlmAd"]
```
src/anomalib/models/image/vlm_ad/backends/__init__.py (new file)

Lines changed: 11 additions & 0 deletions

```python
"""VLM backends."""

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from .base import Backend
from .chat_gpt import ChatGPT
from .huggingface import Huggingface
from .ollama import Ollama

__all__ = ["Backend", "ChatGPT", "Huggingface", "Ollama"]
```
src/anomalib/models/image/vlm_ad/backends/base.py (new file)

Lines changed: 30 additions & 0 deletions

```python
"""Base backend."""

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from abc import ABC, abstractmethod
from pathlib import Path

from anomalib.models.image.vlm_ad.utils import Prompt


class Backend(ABC):
    """Base backend."""

    @abstractmethod
    def __init__(self, model_name: str) -> None:
        """Initialize the backend."""

    @abstractmethod
    def add_reference_images(self, image: str | Path) -> None:
        """Add reference images for k-shot."""

    @abstractmethod
    def predict(self, image: str | Path, prompt: Prompt) -> str:
        """Predict the anomaly label."""

    @property
    @abstractmethod
    def num_reference_images(self) -> int:
        """Get the number of reference images."""
```
src/anomalib/models/image/vlm_ad/backends/chat_gpt.py (new file)

Lines changed: 109 additions & 0 deletions

```python
"""ChatGPT backend."""

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import base64
import logging
import os
from pathlib import Path
from typing import TYPE_CHECKING

from dotenv import load_dotenv
from lightning_utilities.core.imports import package_available

from anomalib.models.image.vlm_ad.utils import Prompt

from .base import Backend

if package_available("openai"):
    from openai import OpenAI
else:
    OpenAI = None

if TYPE_CHECKING:
    from openai.types.chat import ChatCompletion

logger = logging.getLogger(__name__)


class ChatGPT(Backend):
    """ChatGPT backend."""

    def __init__(self, model_name: str, api_key: str | None = None) -> None:
        """Initialize the ChatGPT backend."""
        self._ref_images_encoded: list[str] = []
        self.model_name: str = model_name
        self._client: OpenAI | None = None
        self.api_key = self._get_api_key(api_key)

    @property
    def client(self) -> OpenAI:
        """Get the OpenAI client."""
        if OpenAI is None:
            msg = "OpenAI is not installed. Please install it to use ChatGPT backend."
            raise ImportError(msg)
        if self._client is None:
            self._client = OpenAI(api_key=self.api_key)
        return self._client

    def add_reference_images(self, image: str | Path) -> None:
        """Add reference images for k-shot."""
        self._ref_images_encoded.append(self._encode_image_to_url(image))

    @property
    def num_reference_images(self) -> int:
        """Get the number of reference images."""
        return len(self._ref_images_encoded)

    def predict(self, image: str | Path, prompt: Prompt) -> str:
        """Predict the anomaly label."""
        image_encoded = self._encode_image_to_url(image)
        messages = []

        # few-shot
        if len(self._ref_images_encoded) > 0:
            messages.append(self._generate_message(content=prompt.few_shot, images=self._ref_images_encoded))

        messages.append(self._generate_message(content=prompt.predict, images=[image_encoded]))

        response: ChatCompletion = self.client.chat.completions.create(messages=messages, model=self.model_name)
        return response.choices[0].message.content

    @staticmethod
    def _generate_message(content: str, images: list[str] | None) -> dict:
        """Generate a message."""
        message: dict[str, list[dict] | str] = {"role": "user"}
        if images is not None:
            _content: list[dict[str, str | dict]] = [{"type": "text", "text": content}]
            _content.extend([{"type": "image_url", "image_url": {"url": image}} for image in images])
            message["content"] = _content
        else:
            message["content"] = content
        return message

    def _encode_image_to_url(self, image: str | Path) -> str:
        """Encode the image to base64 and embed in url string."""
        image_path = Path(image)
        extension = image_path.suffix
        base64_encoded = self._encode_image_to_base_64(image_path)
        return f"data:image/{extension};base64,{base64_encoded}"

    @staticmethod
    def _encode_image_to_base_64(image: str | Path) -> str:
        """Encode the image to base64."""
        image = Path(image)
        return base64.b64encode(image.read_bytes()).decode("utf-8")

    def _get_api_key(self, api_key: str | None = None) -> str:
        if api_key is None:
            load_dotenv()
            api_key = os.getenv("OPENAI_API_KEY")
        if api_key is None:
            msg = (
                f"OpenAI API key must be provided to use {self.model_name}."
                " Please provide the API key in the constructor, or set the OPENAI_API_KEY environment variable"
                " or in a `.env` file."
            )
            raise ValueError(msg)
        return api_key
```
