Refactor datatype mapping #94

Merged
merged 4 commits into from
May 29, 2025
56 changes: 10 additions & 46 deletions Zarr.m
@@ -10,29 +10,13 @@
ChunkSize
DsetSize
FillValue
MatlabDatatype
Datatype
Compression
TensorstoreSchema
KVStoreSchema % Schema to represent the storage backend specification (local file, S3, etc)
KVStoreSchema % Schema to represent the storage backend specification (local file, S3, etc)
isRemote
end

properties (Access = protected)
TensorstoreDatatype
ZarrDatatype
end

properties(Constant, Access = protected)
MATLABDatatypes = ["logical", "uint8", "int8", "uint16", "int16", "uint32", "int32", "uint64", "int64", "single", "double"];
TstoreDatatypes = ["bool", "uint8", "int8", "uint16", "int16", "uint32", "int32", "uint64", "int64", "float32", "float64"];
ZarrDatatypes = ["|b1", "|u1", "|i1", "<u2", "<i2", "<u4", "<i4", "<u8", "<i8", "<f4", "<f8"];

end

properties (Dependent, Access = protected)
TstoredtypeMap % hash map from MATLAB datatypes to Tensorstore datatypes
ZarrdtypeMap % hash map from MATLAB datatypes to Zarr datatypes.
end

methods(Static)
function pySetup
@@ -175,19 +159,7 @@ function makeZarrGroups(existingParentPath, newGroupsPath)
end

methods

function TstoredtypeMap = get.TstoredtypeMap(obj)
% Function to create hash map from MATLAB datatypes to
% Tensorstore datatypes.
TstoredtypeMap = dictionary(obj.MATLABDatatypes, obj.TstoreDatatypes);
end

function ZarrdtypeMap = get.ZarrdtypeMap(obj)
% Function to create hash map from MATLAB datatypes to
% Zarr datatypes.
ZarrdtypeMap = dictionary(obj.MATLABDatatypes, obj.ZarrDatatypes);
end


function obj = Zarr(path)
% Load the Python library
Zarr.pySetup;
@@ -223,28 +195,20 @@ function makeZarrGroups(existingParentPath, newGroupsPath)
end

ndArrayData = py.ZarrPy.readZarr(obj.KVStoreSchema);
% Identify the Python datatype
obj.TensorstoreDatatype = string(ndArrayData.dtype.name);

% Extract the corresponding MATLAB datatype key from the
% dictionary
TstoredtypeTable = entries(obj.TstoredtypeMap);
obj.MatlabDatatype = TstoredtypeTable.Key(TstoredtypeTable.Value == obj.TensorstoreDatatype);

obj.ZarrDatatype = obj.ZarrdtypeMap(obj.MatlabDatatype);
% Store the datatype
obj.Datatype = ZarrDatatype.fromTensorstoreType(ndArrayData.dtype.name);

% Convert the numpy array to MATLAB array
data = cast(ndArrayData, obj.MatlabDatatype);
data = cast(ndArrayData, obj.Datatype.MATLABType);
end

function create(obj, dtype, data_size, chunk_size, fillvalue, compression)
% Function to create the Zarr array

obj.DsetSize = int64(data_size);
obj.ChunkSize = int64(chunk_size);
obj.MatlabDatatype = dtype;
obj.TensorstoreDatatype = obj.TstoredtypeMap(dtype);
obj.ZarrDatatype = obj.ZarrdtypeMap(dtype);
obj.Datatype = ZarrDatatype.fromMATLABType(dtype);

% If compression is empty, it means no compression
if isempty(compression)
@@ -257,7 +221,7 @@ function create(obj, dtype, data_size, chunk_size, fillvalue, compression)
if isempty(fillvalue)
obj.FillValue = py.None;
else
obj.FillValue = cast(fillvalue, obj.MatlabDatatype);
obj.FillValue = cast(fillvalue, obj.Datatype.MATLABType);
end

% see how much of the provided path exists already
@@ -266,8 +230,8 @@ function create(obj, dtype, data_size, chunk_size, fillvalue, compression)
% The Python function returns the Tensorstore schema, but we
% do not use it for anything at the moment.
obj.TensorstoreSchema = py.ZarrPy.createZarr(obj.KVStoreSchema, py.numpy.array(obj.DsetSize),...
py.numpy.array(obj.ChunkSize), obj.TensorstoreDatatype, ...
obj.ZarrDatatype, obj.Compression, obj.FillValue);
py.numpy.array(obj.ChunkSize), obj.Datatype.TensorstoreType, ...
obj.Datatype.ZarrType, obj.Compression, obj.FillValue);
%py.ZarrPy.temp(py.numpy.array([1, 1]), py.numpy.array([2, 2]))

% if new directories were created as part of creating a
89 changes: 89 additions & 0 deletions ZarrDatatype.m
@@ -0,0 +1,89 @@
classdef ZarrDatatype
%ZARRDATATYPE Datatype of Zarr data
% Represents the datatype mapping between MATLAB, Tensorstore, and Zarr

Member

Missing copyright

Member Author

Added!

% Copyright 2025 The MathWorks, Inc.

properties(Constant, Hidden)
% Same-length arrays that represent mapping between
% three kinds of datatypes
MATLABTypes = ["logical", "uint8", "int8", "uint16", "int16",...
"uint32", "int32", "uint64", "int64", "single", "double"];
TensorstoreTypes = ["bool", "uint8", "int8", "uint16", "int16",...
"uint32", "int32", "uint64", "int64", "float32", "float64"];
ZarrTypes = ["|b1", "|u1", "|i1", "<u2", "<i2",...
"<u4", "<i4", "<u8", "<i8", "<f4", "<f8"];
end

properties (SetAccess = immutable, GetAccess=private, Hidden)
% Index into datatype arrays
Index (1,1) int32
end

properties (Dependent, SetAccess = immutable)
% Dependent properties representing the corresponding datatype in
% Zarr, Tensorstore, and MATLAB
ZarrType
TensorstoreType
MATLABType
end

methods (Hidden)
% "Private" constructor - should not be used directly.
% Use from*Type() static methods instead.
function obj = ZarrDatatype(ind)
obj.Index = ind;
end
end

methods
function zType = get.ZarrType(obj)
% Get the corresponding Zarr datatype
zType = ZarrDatatype.ZarrTypes(obj.Index);
end

function tType = get.TensorstoreType(obj)
% Get the corresponding Tensorstore datatype
tType = ZarrDatatype.TensorstoreTypes(obj.Index);
end

function mType = get.MATLABType(obj)
% Get the corresponding MATLAB datatype
mType = ZarrDatatype.MATLABTypes(obj.Index);
end
end

methods (Static)
function obj = fromMATLABType(MATLABType)
% Create a datatype object based on MATLAB datatype name
arguments
MATLABType (1,1) string {ZarrDatatype.mustBeMATLABType}
end

ind = find(MATLABType == ZarrDatatype.MATLABTypes);
obj = ZarrDatatype(ind);
end

function obj = fromTensorstoreType(tensorstoreType)
% Create a datatype object based on Tensorstore datatype name
arguments
Member

You don't need the arguments block and validatestrings. You could either remove the args block (which BTW means you will accept any inputs that are convertible to string, like double and datetime) or use the mustBeMember validator function. Nothing specifically wrong with this pattern, but it's a mix of two paradigms and feels weird. I'd prefer the more modern approach.

Member Author

@krisfed May 28, 2025

That was my initial impulse as well! Unfortunately there seems to be a limitation:

MATLABType (1,1) string {mustBeMember(MATLABType, ZarrDatatype.MATLABTypes)} results in:

Line 59: For input arguments, validation functions must only use previously declared positional arguments, the argument being validated, or literals.

And it felt worse to me to have to re-list valid types again as a literal

Member

Ooof.

You can call mustBeMember directly. You could write a local function in the class that calls it with the right arguments.

function mustBeMATLABType(type)
mustBeMember(type, ZarrDatatype.MATLABTypes);
end

And then use
MATLABType (1,1) string {mustBeTextScalar, mustBeMATLABType}

Member Author

Good point!

tensorstoreType (1,1) string {ZarrDatatype.mustBeTensorstoreType}
end

ind = find(tensorstoreType == ZarrDatatype.TensorstoreTypes);
obj = ZarrDatatype(ind);
end


function mustBeMATLABType(type)
% Validator for MATLAB types
mustBeMember(type, ZarrDatatype.MATLABTypes);
end

function mustBeTensorstoreType(type)
% Validator for Tensorstore types
mustBeMember(type, ZarrDatatype.TensorstoreTypes)
end
end

end
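
The lookup that `fromTensorstoreType` performs can be tried outside the class. The following is a minimal standalone sketch, not part of the PR: it copies the three same-length arrays from `ZarrDatatype.m` into local variables (the variable names `matlabTypes`, `tensorstoreTypes`, `zarrTypes`, and `queryType` are chosen here for illustration) and resolves one Tensorstore name to its MATLAB and Zarr counterparts through a shared index.

```matlab
% Same-length arrays copied from ZarrDatatype.m; position i in each array
% describes the same datatype in MATLAB, Tensorstore, and Zarr terms.
matlabTypes = ["logical", "uint8", "int8", "uint16", "int16", ...
    "uint32", "int32", "uint64", "int64", "single", "double"];
tensorstoreTypes = ["bool", "uint8", "int8", "uint16", "int16", ...
    "uint32", "int32", "uint64", "int64", "float32", "float64"];
zarrTypes = ["|b1", "|u1", "|i1", "<u2", "<i2", ...
    "<u4", "<i4", "<u8", "<i8", "<f4", "<f8"];

% Hypothetical input: the dtype name reported for an array read from disk.
queryType = "float32";

% Reject unknown names up front, as the class validators do.
mustBeMember(queryType, tensorstoreTypes);

% One index serves all three arrays.
ind = find(queryType == tensorstoreTypes);
matlabType = matlabTypes(ind);
zarrType = zarrTypes(ind);

fprintf("%s -> MATLAB %s, Zarr %s\n", queryType, matlabType, zarrType);
```

With `ZarrDatatype.m` on the path, the equivalent would presumably be `dt = ZarrDatatype.fromTensorstoreType("float32")` followed by reading `dt.MATLABType` and `dt.ZarrType`. Storing a single index instead of three strings is what lets the class drop the two dictionaries that `Zarr.m` previously rebuilt on every property access.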
8 changes: 8 additions & 0 deletions test/tZarrCreate.m
@@ -161,6 +161,14 @@ function invalidSizeInput(testcase)
% testcase.PyException);
end


function invalidDatatype(testcase)
% Verify the error when an unsupported datatype is used.
testcase.verifyError(@()zarrcreate(testcase.ArrPathWrite,...
testcase.ArrSize,Datatype="bla"),...
'MATLAB:validators:mustBeMember');
end

function invalidCompressionInputType(testcase)
% Verify error when an invalid compression value is used.
%testcase.assumeTrue(false,'Filtered until the issue is fixed.');