MultiClean is a Python library for morphological cleaning of multiclass 2D NumPy arrays (segmentation masks and classification rasters). It smooths edges and removes small islands across multiple classes, then fills the resulting gaps with the nearest valid class.
Below: Land Use before/after cleaning (smoothed edges, small-island removal, nearest-class gap fill).
pip install multiclean
or
uv add multiclean
import numpy as np
from multiclean import clean_array
# Create a sample classification array with classes 0, 1, 2, 3
array = np.random.randint(0, 4, (1000, 1000), dtype=np.int32)
# Clean with default parameters
cleaned = clean_array(array)
# Custom parameters
cleaned = clean_array(
    array,
    class_values=[0, 1, 2, 3],
    smooth_edge_size=2,   # kernel width; larger values give smoother edges
    min_island_size=100,  # remove components with area < 100
    connectivity=8,       # 4 or 8
    max_workers=4,
    fill_nan=False,       # whether to fill NaN values in the input array
)
MultiClean is designed for cleaning segmentation outputs from:
- Remote sensing: Land cover classification, crop mapping
- Computer vision: Semantic segmentation post-processing
- Geospatial analysis: Raster classification cleaning
- Machine learning: Neural network output refinement
Features:
- Multi-class processing: Clean all classes in one pass
- Edge smoothing: Morphological opening to reduce jagged boundaries
- Island removal: Remove small connected components per class
- Gap filling: Fill invalids via nearest valid class (distance transform)
- Fast: NumPy + OpenCV + SciPy with parallelism
MultiClean uses morphological operations to clean classification arrays:
- Edge smoothing (per class): Morphological opening with an elliptical kernel.
- Island removal (per class): Find connected components (OpenCV) and mark components with area < min_island_size as invalid.
- Gap filling: Compute a distance transform to copy the nearest valid class into invalid pixels.
Classes are processed together and the result maintains a valid label at every pixel.
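The three steps above can be sketched with NumPy and SciPy alone. This is an illustrative stand-in, not MultiClean's implementation: it uses scipy.ndimage in place of OpenCV, a disk-shaped structuring element in place of the elliptical kernel, and treats smooth_edge_size as a radius. The function name clean_sketch and the -1 sentinel are hypothetical.

```python
import numpy as np
from scipy import ndimage


def clean_sketch(labels, class_values, smooth_edge_size=2, min_island_size=100):
    """Illustrative three-step cleaner (assumption: not MultiClean's own code)."""
    invalid = -1  # hypothetical sentinel meaning "no class assigned"
    out = np.full(labels.shape, invalid, dtype=np.int64)

    # Disk-shaped structuring element, standing in for an elliptical kernel.
    r = smooth_edge_size
    yy, xx = np.ogrid[-r:r + 1, -r:r + 1]
    disk = (yy ** 2 + xx ** 2) <= r ** 2

    for c in class_values:
        mask = labels == c

        # Step 1: morphological opening smooths jagged class boundaries.
        if smooth_edge_size > 0:
            mask = ndimage.binary_opening(mask, structure=disk)

        # Step 2: drop connected components with area < min_island_size.
        # ndimage.label uses 4-connectivity by default.
        comp, n = ndimage.label(mask)
        areas = np.bincount(comp.ravel(), minlength=n + 1)
        keep = areas >= min_island_size
        keep[0] = False  # label 0 is background, never a component
        out[keep[comp]] = c

    # Step 3: copy the nearest valid class into every invalid pixel.
    missing = out == invalid
    if missing.any() and not missing.all():
        idx = ndimage.distance_transform_edt(
            missing, return_distances=False, return_indices=True
        )
        out = out[tuple(idx)]
    return out


# Two classes split down the middle, with one stray pixel of class 2
# inside the class-1 region.
demo = np.ones((20, 20), dtype=np.int32)
demo[:, 10:] = 2
demo[5, 3] = 2
cleaned_demo = clean_sketch(demo, class_values=[1, 2],
                            smooth_edge_size=1, min_island_size=4)
```

In this demo the stray pixel is invalidated during cleaning and then takes the label of its nearest valid neighbour, so every pixel ends with a class label.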
from multiclean import clean_array
out = clean_array(
    array: np.ndarray,
    class_values: int | list[int] | None = None,
    smooth_edge_size: int = 2,
    min_island_size: int = 100,
    connectivity: int = 4,
    max_workers: int | None = None,
    fill_nan: bool = False
)
- array: 2D numpy array of class labels (int or float). For float arrays, NaN is treated as nodata and will remain NaN unless fill_nan is set to True.
- class_values: Classes to consider. If None, inferred from array (ignores NaN for floats). An int restricts cleaning to a single class.
- smooth_edge_size: Kernel size (pixels) for morphological opening. Use 0 to disable.
- min_island_size: Remove components with area strictly < min_island_size. Use 1 to keep single pixels.
- connectivity: Pixel connectivity for components, 4 or 8.
- max_workers: Parallelism for per-class operations (None lets the executor choose).
- fill_nan: If True, NaN values in the input array are filled with the nearest valid value.
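To make the connectivity and min_island_size semantics concrete, the sketch below counts components for two pixels that touch only at a corner, then flags components whose area is strictly below a threshold. It uses scipy.ndimage.label as a stand-in for the OpenCV connected-components step, and the helper small_components is hypothetical, not part of MultiClean.

```python
import numpy as np
from scipy import ndimage

# Two pixels of the same class that touch only diagonally.
mask = np.array([[1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 0]], dtype=bool)

four = ndimage.generate_binary_structure(2, 1)   # edge neighbours only
eight = ndimage.generate_binary_structure(2, 2)  # edge and corner neighbours

_, n4 = ndimage.label(mask, structure=four)   # diagonal pixels stay separate
_, n8 = ndimage.label(mask, structure=eight)  # diagonal pixels are joined


def small_components(mask, structure, min_island_size):
    """Boolean mask of pixels in components with area strictly < min_island_size."""
    comp, n = ndimage.label(mask, structure=structure)
    areas = np.bincount(comp.ravel(), minlength=n + 1)
    small = areas < min_island_size
    small[0] = False  # label 0 is background
    return small[comp]


# With a threshold of 2, 4-connectivity sees two 1-pixel islands (both flagged),
# while 8-connectivity sees one 2-pixel component (kept, since 2 is not < 2).
flagged_4 = small_components(mask, four, min_island_size=2)
flagged_8 = small_components(mask, eight, min_island_size=2)
```

Because the comparison is strict, a threshold of 1 flags nothing, which matches the note that min_island_size=1 keeps single pixels.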
Returns a NumPy array matching the input shape. Integer inputs return integer outputs. Float arrays with NaN are supported: NaN values are either filled or left as NaN, depending on fill_nan.
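The nearest-valid filling of NaN values can be illustrated independently of the library. The sketch below is an assumption about how such a fill might look using SciPy's Euclidean distance transform; fill_nan_nearest is a hypothetical helper, not MultiClean's API.

```python
import numpy as np
from scipy import ndimage


def fill_nan_nearest(array):
    """Return a copy of a float array with NaN replaced by the nearest valid value."""
    nan_mask = np.isnan(array)
    if not nan_mask.any() or nan_mask.all():
        return array.copy()  # nothing to fill, or nothing to fill from
    # For every NaN pixel, get the index of the nearest non-NaN pixel.
    idx = ndimage.distance_transform_edt(
        nan_mask, return_distances=False, return_indices=True
    )
    return array[tuple(idx)]


raster = np.array([[1.0, 1.0, 2.0, 2.0],
                   [np.nan, 1.0, 2.0, 2.0],
                   [1.0, 1.0, 2.0, np.nan]])
filled = fill_nan_nearest(raster)
```

Leaving the input untouched corresponds to fill_nan=False, where NaN pixels stay NaN in the output.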
from multiclean import clean_array
import rasterio
# Read land cover classification
with rasterio.open('landcover.tif') as src:
    landcover = src.read(1)  # first band as a 2D array
# Clean with appropriate parameters for satellite data
cleaned = clean_array(
    landcover,
    class_values=[0, 1, 2, 3, 4],  # forest, water, urban, crop, other
    smooth_edge_size=1,
    min_island_size=25,
    connectivity=8,
    fill_nan=False
)
from multiclean import clean_array
# np_model_logits: model logits as a NumPy array of shape (C, H, W)
# Convert logits to class predictions
np_pred = np_model_logits.argmax(axis=0)  # shape: (H, W)
# Clean the segmentation
cleaned = clean_array(
    np_pred,
    smooth_edge_size=2,
    min_island_size=100,
    connectivity=4,
)
See the notebooks folder for end-to-end examples:
- Land Use Example Notebook: land use classification cleaning
- Cloud Example Notebook: cloud/shadow classification cleaning
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.