Commit db4c285

blaz-r and samet-akcay authored
Add ensembling methods for tiling to Anomalib (#1226)
* Fixed broken links in readme * Fixed inference command in readme * Add tiling for ensemble * Add tests for tiling for ensemble * Moved ensemble tiler to separate file * Modify padim config for ensemble * Add tiling to dataset * Revert changes to train * Add tiling to collate fn * Fix tiling in collate * Change val. function to protected * Add tile number logic * Move collate fn to separate file * Update tests for tiler * Add training loop for ensemble * Add model input size setup * Move ens config to separate file * Revert mvtec modifications * Remove unused imports in mvtec * Add batch adjustment to untiling * Add predict step to ensemble * Add comment and docstring to tile joining function * Move tile joining to separate function * Add joining for all tiled data * Add joining for all box data * Refactor pred. joining as modular class * Fix box joining * Add label and score joining * Add ensemble visualization * Add end of predict hook * Add metric computation * Fix metric thresholds * Add removal of individual visualization * Add demo1 notebook * Add docstrings and cleanup * Add memory benchmark * Add modular class for storing predictions * Add metric to separate class * Refactor to support prediction data class * Rename predictions class * Add filesystem predictions class * Add resized predictions class * Fix joiner for classification task * Add page peak to memory benchmark * Add global stats calculation * Add docstrings to stats calculation * Refactor joiner for pipeline * Refactor stats into pipeline * Refactor metrics as pipeline block * Refactor visualization as pipeline block * Refactor postprocessing into a pipeline * Add normalization and thresholding on joined predictions * Refactor tiler to accept config file * Add smoothing of tile joins. 
* Refactor ensemble datamodule preparation * Remove unused changes in dataloader * Fix metric configuration * Fix box coordinates in joining * Add ensemble callbacks preparation function * Fix box prediction bug in postprocess * Add ensemble params to config * Refactor postprocessing. * Refactor post-processing * Refactor predictions * Code cleanup * Optimize prediction storage * Make join smoothing configurable * Cleanup before PR * Fix stats pipeline * Fix logging strings * Fix memory benchmark * Fix tiler issues * Fix import issues * Fix naming in metrics and visualization * Fix cyclic import * Make logging lazy * Refactor tiler tests * Added collate tiling tests * Added ensemble helper functions tests * Refactor for dummy ensemble config * Refactor for dummy base config * Add tests for prediction storage * Add tests for prediction joiner * Add tests for visualization * Fix small issues in tests * Add metrics test * Add post-processing tests * Fix tiler to work with different instance * Move seed setting inside train loop * Fix pipeline stats bug * Rename ensemble config fixture * Add pipeline tests * Fix config in pipeline tests * Add training script test * Fix types and docstrings * Move and rename to tiled_ensemble * Fix bug in label joining. * Remove memory benchmark * Cleanup files * Fix metrics setup * Rename collate function * Add license to test files * Rename fixtures * Add more comments to tiled ensemble training * Add start of training log message * Refactor tiler to have explicit arguments * Refactor pred. storage to have explicit arguments * Refactor metrics to have explicit arguments * Refactor visualization to have explicit arguments * Refactor post-processing to have explicit arguments * Sort imports * Add test ensemble script * Fix join smoothing bug * Add more documentation to doc-strings * Remove unused import * Add brief tiled ensemble documentation * Update typehints * Make training args more clear * Revert addition of no threshold option. 
* Refactor normalization and threshold config * Remove tiled ensemble from docs index * Add comments to clarify parts of ensemble config * Improve ensemble config comments * Add num_tiles attribute to tiler. * Fix metrics process docstring * Fix visualization bug and cover with test * Replace strings with enum * Improve comments in joiner. * Fix bug when model doesn't have anomaly maps. * Improve docstrings (types, clarify). * Fix visualization tests * Fix dict membership checks * Add saving of ensemble config file * Update test script args * Cover test script with tests * Update export warning * Fix case when no test or val data * Improve documentation images * Add images for documentation * Add codacy suggestion * Refactor joiner to single class * Refactor storage names and config * Update normalization and threshold stage names * Add transforms independent input size to models Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Make collate function a datamodule attribute Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Refactor tiled ensemble train into pipeline step Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Refactor tiled ensemble prediction into pipeline step Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Refactor tiled ensemble merging into pipeline step Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Refactor tiled ensemble seam smoothing into pipeline step Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Refactor tiled stats calculation into pipeline step Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Fix ckpt loading when predicting on test set. Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Add logging and add tqdm to pipeline steps. 
Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Refactor normalization pipeline step Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Refactor thresholding into new pipeline job * Fix transforms issue when predicting with dataloader * Add visualization as new pipeline step * Add metrics as new pipeline step * Format the code and address some lint problems Signed-off-by: Blaz Rolih <blaz.rolih@gmail.com> * Add code to skip test if test split is none Signed-off-by: Blaz Rolih <blaz.rolih@gmail.com> * Add accelerator to metrics and smoothing Signed-off-by: Blaz Rolih <blaz.rolih@gmail.com> * Make threshold acq helper function and add to threshold to metrics Signed-off-by: Blaz Rolih <blaz.rolih@gmail.com> * Make a separate test pipeline Signed-off-by: Blaz Rolih <blaz.rolih@gmail.com> * Restructure tiled ensemble files into directories Signed-off-by: Blaz Rolih <blaz.rolih@gmail.com> * Pipeline code cleanup Signed-off-by: Blaz Rolih <blaz.rolih@gmail.com> * Remove old tiled ensemble files Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Remove old post processing files Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Fix sigma value read in smoothing Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Update stats calc and normalization Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Update args naming convention Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Refactor code for nice config Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Update docs structure for new system Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Cleanup train code Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Fix test script args Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Update box merging Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Refactor helper function tests Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Small changes in helper and engine Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Refactor merging tests Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Refactor tiling tests Signed-off-by: 
blaz-r <blaz.rolih@gmail.com> * Refactor metrics test Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Add support for different threshold methods Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Format tests Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Change test to predict Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Refactor stats calculation tests Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Refactor prediction data tests Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Update metrics tests Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Move metrics tests to components Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Refactor seam smoothing tests Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Refactor normalization tests Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Move mock stats to conftest Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Fix typehints for generator Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Refactor threshold tests Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Temporarily disable box minmax Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Add tiled ensemble integration test Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Fix normalization tests and add additional merging test Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Add tile collater tests Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Change dataset in tests to dummy Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Format and fix linter errors Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Format and some cleanup Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Rename predict to eval Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Update docs for refactored version of code Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Cleanup the docs Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Update ensemble engine Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Remove boxes from pipelines and tests Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Fix TODO comment issue Signed-off-by: blaz-r 
<blaz.rolih@gmail.com> * Fix unused model in ens. engine Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Fix path case in test Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Change temporary dir to project_path Signed-off-by: blaz-r <blaz.rolih@gmail.com> * Change mvtec to MVTec in test path Signed-off-by: Blaz Rolih <blaz.rolih@gmail.com> --------- Signed-off-by: blaz-r <blaz.rolih@gmail.com> Signed-off-by: Blaz Rolih <blaz.rolih@gmail.com> Co-authored-by: Samet Akcay <samet.akcay@intel.com>
1 parent 3a403ae commit db4c285

39 files changed, +3961 -249 lines changed
Lines changed: 254 additions & 0 deletions
@@ -0,0 +1,254 @@
1+
# Pipelines
2+
3+
This guide demonstrates how to create a [Pipeline](../../reference/pipelines/index.md) for your custom task.
4+
5+
A pipeline is made up of runners. Each runner is responsible for running a single type of job. A job is the smallest independent unit of work, such as training a model or statistically comparing the outputs of two models. Each job should be designed to be independent of other jobs so that it is agnostic to the runner that runs it. This ensures that jobs can be run in parallel or serially without any changes to the job itself. The runner does not instantiate a job directly; instead, it holds a job generator that parses the configuration and generates the jobs.
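
To make the division of labour concrete, here is a minimal, self-contained sketch of the three roles. It does not use the anomalib classes: `SquareJob`, `SquareJobGenerator`, and `run_serially` are hypothetical stand-ins that only mirror the idea of a job, a job generator, and a runner.

```python
from __future__ import annotations

from collections.abc import Iterator


class SquareJob:
    """Hypothetical job: the smallest unit of work (squares one number)."""

    name = "square"

    def __init__(self, value: int) -> None:
        self.value = value

    def run(self) -> int:
        return self.value**2

    @staticmethod
    def collect(results: list[int]) -> list[int]:
        # Gather the results of all runs into one object for the next stage.
        return results


class SquareJobGenerator:
    """Hypothetical generator: parses the config and yields jobs."""

    job_class = SquareJob

    def generate_job(self, config: dict) -> Iterator[SquareJob]:
        for value in config["values"]:
            yield SquareJob(value)


def run_serially(generator: SquareJobGenerator, config: dict) -> list[int]:
    """Hypothetical runner: runs every generated job one after another."""
    results = [job.run() for job in generator.generate_job(config)]
    return generator.job_class.collect(results)


print(run_serially(SquareJobGenerator(), {"values": [1, 2, 3]}))  # [1, 4, 9]
```

Because the job knows nothing about the runner, swapping `run_serially` for a parallel variant would not require any change to `SquareJob`.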
6+
7+
## Bird's-Eye View
8+
9+
In this guide we are going to create a dummy significant-parameter search pipeline. The pipeline will have two jobs. The first job trains a model and computes the metric. The second job computes the significance of the parameters to the final score using Shapley values. The final output of the pipeline is a plot that shows the contribution of each parameter to the final score. This will teach you how to create a pipeline, a job, and a job generator, and how to expose the pipeline to the `anomalib` CLI. The pipeline is going to be named `experiment`. By the end of this guide you will be able to generate the significance plot using:
10+
11+
```{literalinclude} ../../../../snippets/pipelines/dummy/anomalib_cli.txt
12+
:language: bash
13+
```
14+
15+
The final directory structure will look as follows:
16+
17+
```{literalinclude} ../../../../snippets/pipelines/dummy/src_dir_structure.txt
18+
19+
```
20+
21+
```{literalinclude} ../../../../snippets/pipelines/dummy/tools_dir_structure.txt
22+
:language: bash
23+
```
24+
25+
## Creating the Jobs
26+
27+
Let's first look at the base class for the [jobs](../../reference/pipelines/base/job.md). It has a few methods defined.
28+
29+
- The `run` method is the main method that is called by the runner. This is where we will train the model and return the model metrics.
30+
- The `collect` method is used to gather the results from all the runs and collate them. This is handy as we want to pass a single object to the next job that contains details of all the runs including the final score.
31+
- The `save` method is used to write any artifacts to disk. It accepts the gathered results as a parameter. This is useful, for example, when we want to write the results to a csv file or store the raw anomaly maps for further processing.
32+
33+
Let's create the first job that trains the model and computes the metric. Since it is a dummy example, we will just return a random number as the metric.
34+
35+
```python
import time

import numpy as np

# Import path assumed; adjust it to wherever `Job` lives in your anomalib version.
from anomalib.pipelines.components import Job


class TrainJob(Job):
    name = "train"

    def __init__(self, lr: float, backbone: str, stride: int) -> None:
        self.lr = lr
        self.backbone = backbone
        self.stride = stride

    def run(self, task_id: int | None = None) -> dict:
        print(f"Training with lr: {self.lr}, backbone: {self.backbone}, stride: {self.stride}")
        time.sleep(2)  # stand-in for the actual training time
        score = np.random.uniform(0.7, 1.0)  # dummy metric; the upper bound of 1.0 is assumed
        return {"lr": self.lr, "backbone": self.backbone, "stride": self.stride, "score": score}
```
50+
51+
Ignore the `task_id` for now. It is used for parallel jobs. We will come back to it later.
52+
53+
````{note}
54+
The `name` attribute is important and is used to identify the arguments in the job config file.
55+
So, in our case the config `yaml` file will contain an entry like this:
56+
57+
```yaml
...
train:
  lr:
  backbone:
  stride:
...
```
````
65+
66+
Of course, it is up to us to choose what parameters should be shown under the `train` key.
67+
68+
Let's also add the `collect` method so that we return a nice dict object that can be used by the next job.
69+
70+
```python
@staticmethod
def collect(results: list[dict]) -> dict:
    """Merge the per-run result dicts into a single dict of lists."""
    output: dict = {}
    for key in results[0]:
        output[key] = []
    for result in results:
        for key, value in result.items():
            output[key].append(value)
    return output
```
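
To see what `collect` produces, the sketch below applies the same logic as a plain function to two made-up runs. The parameter values and scores are invented for illustration only.

```python
def collect(results: list[dict]) -> dict:
    """Turn a list of per-run dicts into one dict of lists."""
    output: dict = {key: [] for key in results[0]}
    for result in results:
        for key, value in result.items():
            output[key].append(value)
    return output


# Two made-up runs, shaped like the dict returned by `TrainJob.run`.
runs = [
    {"lr": 0.1, "backbone": "resnet18", "stride": 3, "score": 0.81},
    {"lr": 0.5, "backbone": "wide_resnet50", "stride": 5, "score": 0.92},
]

collected = collect(runs)
print(collected["lr"])        # [0.1, 0.5]
print(collected["backbone"])  # ['resnet18', 'wide_resnet50']
```

Each key now maps to one column of values, which is the shape a tabular tool expects.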
80+
81+
We can also define a `save` method that writes the dictionary as a csv file.
82+
83+
```python
# Requires `import pandas as pd` and `from pathlib import Path` at module level.
@staticmethod
def save(results: dict) -> None:
    """Save results in a csv file."""
    results_df = pd.DataFrame(results)
    file_path = Path("runs") / TrainJob.name
    file_path.mkdir(parents=True, exist_ok=True)
    results_df.to_csv(file_path / "results.csv", index=False)
```
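
If you would rather not depend on pandas for this step, the same file could be written with the standard library. This is only a sketch: `save_as_csv` is a hypothetical helper, and it takes the output root as an argument so that it can be tried in a temporary directory.

```python
import csv
import tempfile
from pathlib import Path


def save_as_csv(results: dict, root: Path) -> Path:
    """Write a dict of equal-length columns to <root>/train/results.csv."""
    out_dir = root / "train"
    out_dir.mkdir(parents=True, exist_ok=True)
    file_path = out_dir / "results.csv"
    with file_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(results.keys())  # header row
        writer.writerows(zip(*results.values()))  # one row per run
    return file_path


with tempfile.TemporaryDirectory() as tmp:
    path = save_as_csv({"lr": [0.1, 0.5], "score": [0.81, 0.92]}, Path(tmp))
    print(path.read_text())
```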
92+
93+
The entire job class is shown below.
94+
95+
```{literalinclude} ../../../../snippets/pipelines/dummy/train_job.txt
96+
:language: python
97+
```
98+
99+
Now we need a way to generate this job when the pipeline is run. To do this we need to subclass the [JobGenerator](../../reference/pipelines/base/generator.md) class.
100+
101+
The job generator is the actual object that is attached to a runner and is responsible for parsing the configuration and generating jobs. It has two methods that need to be implemented.
102+
103+
- `generate_job`: This method accepts the configuration as a dictionary and, optionally, the results of the previous job. For the train job, we don't need results from previous jobs, so we will ignore them.
104+
- `job_class`: This holds the reference to the class of the job that the generator will yield. It is used to inform the runner about the job that is being run, and is used to access the static attributes of the job such as its name, collect method, etc.
105+
106+
Let's first start by defining the configuration that the generator will accept. The train job requires three parameters: `lr`, `backbone`, and `stride`. We will also add another parameter that defines the number of experiments we want to run. One way to define it would be as follows:
107+
108+
```yaml
train:
  experiments: 10
  lr: [0.1, 0.99]
  backbone:
    - resnet18
    - wide_resnet50
  stride:
    - 3
    - 5
```
119+
120+
For this example the specification is defined as follows.
121+
122+
1. The number of experiments is set to 10.
123+
2. Learning rate is sampled from a uniform distribution in the range `[0.1, 0.99]`.
124+
3. The backbone is chosen from the list `["resnet18", "wide_resnet50"]`.
125+
4. The stride is chosen from the list `[3, 5]`.
126+
127+
```{note}
128+
While the `[ ]` and `-` syntax in `yaml` both signify a list, for visual disambiguation this example uses `[ ]` to denote a closed interval and `-` for a list of options.
129+
```
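
The specification above can be turned into concrete parameter sets with a few lines of standard-library Python. This is a hedged sketch of the sampling logic only: `sample_experiments` is a hypothetical helper, the config is written out as a plain dict, and the `seed` argument is an addition made here for reproducibility.

```python
import random

# The `train` section of the config, written out as a plain dict.
config = {
    "experiments": 10,
    "lr": [0.1, 0.99],
    "backbone": ["resnet18", "wide_resnet50"],
    "stride": [3, 5],
}


def sample_experiments(config: dict, seed: int = 0) -> list[dict]:
    """Draw one random parameter set per experiment."""
    rng = random.Random(seed)
    low, high = config["lr"]
    return [
        {
            "lr": rng.uniform(low, high),  # sampled from the interval
            "backbone": rng.choice(config["backbone"]),  # picked from the options
            "stride": rng.choice(config["stride"]),
        }
        for _ in range(config["experiments"])
    ]


experiments = sample_experiments(config)
print(len(experiments))  # 10
print(experiments[0])
```

Each dict has exactly the keyword arguments that `TrainJob.__init__` expects, so a generator could yield `TrainJob(**params)` for every entry.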
130+
131+
With this defined, we can define the generator class as follows.
132+
133+
```{literalinclude} ../../../../snippets/pipelines/dummy/train_generator.txt
134+
:language: python
135+
```
136+
137+
Since this is a dummy example, we generate the next experiment randomly. In practice, you would use a more sophisticated method that relies on your validation metrics to generate the next experiment.
138+
139+
```{admonition} Challenge
140+
:class: tip
141+
As a challenge, define your own configuration and a generator to parse that configuration.
142+
```
143+
144+
Okay, so now we can train the model. We still need a way to find out which parameters contribute the most to the final score. We will do this by computing the Shapley values of the parameters.
145+
146+
Let's start by adding the library to our environment:
147+
148+
```bash
149+
pip install shap
150+
```
151+
152+
The following listing shows the job that computes the Shapley values and saves a plot that shows the contribution of each parameter to the final score. A quick rundown, without going into the details of the job (as they are irrelevant to the pipeline), is as follows. We create a `RandomForestRegressor` that is trained on the parameters to predict the final score. We then compute the Shapley values to identify the parameters that have the most significant impact on the model performance. Finally, the `save` method saves the plot so we can visually inspect the results.
153+
154+
```{literalinclude} ../../../../snippets/pipelines/dummy/significance_job.txt
155+
156+
```
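
If Shapley values are new to you, the following sketch computes them exactly for a tiny made-up value function. It is not what the job above does: the job relies on the `shap` library and a fitted regressor, whereas this brute-force version averages each parameter's marginal contribution over every ordering, which is only feasible for a handful of parameters. The `score` function and its numbers are invented for illustration.

```python
from itertools import permutations


def score(params: frozenset) -> float:
    """Toy value function: the score when only `params` are 'switched on'."""
    value = 0.0
    if "lr" in params:
        value += 0.3
    if "backbone" in params:
        value += 0.1
    if "lr" in params and "stride" in params:
        value += 0.2  # interaction that lr and stride share
    return value


def shapley_values(players: list[str]) -> dict[str, float]:
    """Average each player's marginal contribution over all orderings."""
    totals = {player: 0.0 for player in players}
    orderings = list(permutations(players))
    for ordering in orderings:
        present: frozenset = frozenset()
        for player in ordering:
            with_player = present | {player}
            totals[player] += score(with_player) - score(present)
            present = with_player
    return {player: total / len(orderings) for player, total in totals.items()}


values = shapley_values(["lr", "backbone", "stride"])
print(values)
```

In this toy case `lr` receives its own 0.3 plus half of the 0.2 interaction, `stride` receives the other half, and the three values add up to the score obtained with all parameters switched on.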
157+
158+
Great! Now that we have the job, we need its generator, as before. Since we only need the results from the previous stage, we don't need to define a config. Let's quickly write that as well.
159+
160+
```{literalinclude} ../../../../snippets/pipelines/dummy/significance_job_generator.txt
161+
162+
```
163+
164+
## Experiment Pipeline
165+
166+
So now we have the jobs, and a way to generate them. Let's look at how we can chain them together to achieve what we want. We will use the [Pipeline](../../reference/pipelines/base/pipeline.md) class to define the pipeline.
167+
168+
When creating a custom pipeline, there is only one important method that we need to implement. That is the `_setup_runners` method. This is where we chain the runners together.
169+
170+
```{literalinclude} ../../../../snippets/pipelines/dummy/pipeline_serial.txt
171+
:language: python
172+
```
173+
174+
In this example we use `SerialRunner` for running each job. It is a simple runner that runs the jobs in a serial manner. For more information on `SerialRunner` look [here](../../reference/pipelines/runners/serial.md).
175+
176+
Okay, so we have the pipeline. How do we run it? To do this, let's create a simple entrypoint in the `tools` folder of Anomalib.
177+
178+
Here is how the directory looks.
179+
180+
```{literalinclude} ../../../../snippets/pipelines/dummy/tools_dir_structure.txt
181+
:language: bash
182+
```
183+
184+
As you can see, we have the `config.yaml` file in the same directory. Let's quickly populate `experiment.py`.
185+
186+
```python
from anomalib.pipelines.experiment_pipeline import ExperimentPipeline

if __name__ == "__main__":
    ExperimentPipeline().run()
```
192+
193+
Alright! Time to take it on the road.
194+
195+
```bash
196+
python tools/experimental/experiment/experiment.py --config tools/experimental/experiment/config.yaml
197+
```
198+
199+
If all goes well you should see the summary plot in `runs/significant_feature/summary_plot.png`.
200+
201+
## Exposing to the CLI
202+
203+
Now that you have your shiny new pipeline, you can expose it as a subcommand to `anomalib` by adding an entry to the pipeline registry in `anomalib/cli/pipelines.py`.
204+
205+
```python
if try_import("anomalib.pipelines"):
    ...
    from anomalib.pipelines import ExperimentPipeline

    PIPELINE_REGISTRY: dict[str, type[Pipeline]] | None = {
        "experiment": ExperimentPipeline,
        ...
    }
```
215+
216+
With this you can now call
217+
218+
```{literalinclude} ../../../../snippets/pipelines/dummy/anomalib_cli.txt
219+
:language: bash
220+
```
221+
222+
Congratulations! You have successfully created a pipeline that trains a model and computes the significance of the parameters to the final score 🎉
223+
224+
```{admonition} Challenge
225+
:class: tip
226+
This example used a random model hence the scores were meaningless. Try to implement a real model and compute the scores. Look into which parameters lead to the most significant contribution to your score.
227+
```
228+
229+
## Final Tweaks
230+
231+
Before we end, let's look at a few final tweaks that you can make to the pipeline.
232+
233+
First, let's run the initial model training in parallel. Since all jobs are independent, we can use the [ParallelRunner](../../reference/pipelines/runners/parallel.md). Since the `TrainJob` is a dummy job in this example, the pool of parallel jobs is set to the number of experiments.
234+
235+
```{literalinclude} ../../../../snippets/pipelines/dummy/pipeline_parallel.txt
236+
237+
```
238+
239+
You will now notice that the entire pipeline takes less time to run. This is handy when you have a large number of experiments, and when each job takes substantial time to run.
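
The effect is easy to reproduce outside Anomalib. The sketch below runs five independent sleeping jobs with a thread pool from the standard library. It is purely illustrative and says nothing about how `ParallelRunner` is implemented; `run_job` and `run_in_parallel` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_job(task_id: int) -> dict:
    """Stand-in for an independent job; sleeps instead of training."""
    time.sleep(0.2)
    return {"task_id": task_id, "score": 0.5 + task_id / 100}


def run_in_parallel(n_jobs: int) -> list[dict]:
    """Run all jobs at once, one worker per job, keeping input order."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(run_job, range(n_jobs)))


start = time.perf_counter()
results = run_in_parallel(5)
elapsed = time.perf_counter() - start
print(f"{len(results)} jobs finished in {elapsed:.2f}s")
```

Run serially, the five jobs would need about one second in total; with one worker per job the wall-clock time should be close to that of a single job.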
240+
241+
Now on to the second one. When running the pipeline we don't want our terminal cluttered with the outputs from each run. Anomalib provides a handy decorator that temporarily hides the output of a function. It suppresses all output to standard output and standard error unless an exception is raised. Let's add this to the `TrainJob`:
242+
243+
```python
from anomalib.utils.logging import hide_output


class TrainJob(Job):
    ...

    @hide_output
    def run(self, task_id: int | None = None) -> dict:
        ...
```
253+
254+
You will no longer see the output of the `print` statement in the `TrainJob.run` method in the terminal.
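
For intuition, a decorator with the behaviour described above could be sketched with the standard library as follows. This is an assumption about the general technique, not Anomalib's actual implementation; `hide_output_sketch` is a hypothetical name.

```python
import contextlib
import functools
import io
import sys
from collections.abc import Callable
from typing import Any


def hide_output_sketch(func: Callable[..., Any]) -> Callable[..., Any]:
    """Swallow stdout and stderr of `func` unless it raises."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        stdout, stderr = io.StringIO(), io.StringIO()
        try:
            with contextlib.redirect_stdout(stdout), contextlib.redirect_stderr(stderr):
                return func(*args, **kwargs)
        except Exception:
            # Replay what was captured so the failure can be debugged.
            sys.stdout.write(stdout.getvalue())
            sys.stderr.write(stderr.getvalue())
            raise

    return wrapper


@hide_output_sketch
def noisy_job() -> int:
    print("this line is swallowed")
    return 42


print(noisy_job())  # 42
```

Only the return value reaches the terminal; the text printed inside `noisy_job` is discarded unless the function fails.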
