quickhash implementation #85

@kaczmarj

hello, i noticed that tiffslide does not calculate a quickhash for an image, whereas openslide does. i wrote a small quickhash implementation for one of my projects, though it does not follow openslide's implementation exactly. openslide hashes many of the properties as well as the smallest level of the image pyramid; my implementation hashes only two properties (the comment and the vendor) and the smallest level. another difference is that openslide uses sha256 and my implementation uses md5. that choice was arbitrary on my part; if tiffslide were to incorporate quickhash, sha256 would be the way to go.

please feel free to close this issue if it's noise!

"""Hash parts of a whole slide image.

This implementation is heavily inspired by OpenSlide's quickhash1:
https://github.com/openslide/openslide/blob/549e81b6662efe2b2285f11a5bcb31ccd7b95655/src/openslide-decode-tifflike.c#L996-L1143
"""

from __future__ import annotations

import hashlib

import tiffslide
from PIL import Image
from tiffslide.tiffslide import PROPERTY_NAME_COMMENT
from tiffslide.tiffslide import PROPERTY_NAME_VENDOR


def _read_smallest_level(tslide: tiffslide.TiffSlide) -> Image.Image:
    """Read the whole lowest-resolution (last) pyramid level as an image."""
    smallest_level = tslide.level_count - 1
    size = tslide.level_dimensions[smallest_level]
    return tslide.read_region((0, 0), level=smallest_level, size=size)


def _hash_str_and_property(
    hasher: hashlib._Hash, tslide: tiffslide.TiffSlide, name: str
) -> None:
    """Feed a property's name and value to the hasher; skip it if unset."""
    value = tslide.properties.get(name)
    if value is not None:
        hasher.update(name.encode())
        hasher.update(str(value).encode())


def quickhash(tslide: tiffslide.TiffSlide) -> str:
    """Return a quick MD5 hash of a whole slide image."""
    m = hashlib.md5()
    _hash_str_and_property(m, tslide, PROPERTY_NAME_COMMENT)
    _hash_str_and_property(m, tslide, PROPERTY_NAME_VENDOR)
    smallest_level_bytes = _read_smallest_level(tslide).tobytes()
    m.update(smallest_level_bytes)
    return m.hexdigest()
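
for reference, here is a rough sketch of what the sha256 variant mentioned above could look like. it is only an illustration, not a proposed tiffslide api: the name `quickhash_sha256` and its parameters are hypothetical, and it takes a plain properties mapping plus the raw bytes of the smallest level so that it can be tried without opening a slide. like the md5 version, it will not reproduce openslide's quickhash values, since it hashes different inputs.

```python
from __future__ import annotations

import hashlib
from typing import Mapping, Sequence


def quickhash_sha256(
    properties: Mapping[str, object],
    smallest_level_bytes: bytes,
    property_names: Sequence[str],
) -> str:
    """Return a SHA-256 hex digest over selected properties and pixel bytes.

    Hypothetical helper: mirrors the MD5 version above, with the hash
    function swapped for SHA-256.
    """
    hasher = hashlib.sha256()
    for name in property_names:
        value = properties.get(name)
        # Properties that are missing or None are skipped entirely.
        if value is not None:
            hasher.update(name.encode())
            hasher.update(str(value).encode())
    # The raw bytes of the smallest pyramid level are hashed last.
    hasher.update(smallest_level_bytes)
    return hasher.hexdigest()
```

with a slide open, this would presumably be called as `quickhash_sha256(tslide.properties, _read_smallest_level(tslide).tobytes(), [PROPERTY_NAME_COMMENT, PROPERTY_NAME_VENDOR])`, reusing the helper and constants from the code above.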

Labels: enhancement (New feature or request)