
timedelta64 #12


Merged (10 commits) on May 15, 2025
`data-types/timedelta64/README.md` (95 additions, 0 deletions)
# timedelta64 data type

This document defines a Zarr data type to model the `timedelta64` data type from NumPy. The `timedelta64` data type represents signed temporal durations.

## Background

`timedelta64` is based on a data type with the same name defined in [NumPy](https://NumPy.org/). To provide the necessary context, this document first describes how `timedelta64` works in NumPy before detailing its specification in Zarr.

The following references to NumPy are based on version 2.2 of that library.

NumPy defines a data type called `"timedelta64"` to represent signed temporal durations. These durations arise when taking a difference between moments in time. NumPy models moments in time with a related data type called `"datetime64"`. Both data types are described in the [NumPy documentation](https://NumPy.org/doc/stable/reference/arrays.datetime.html), which should be considered authoritative.

`timedelta64` data types are parametrized by a physical unit of duration, like seconds or minutes, and a positive integral scale factor. For example, given a `timedelta64` data type defined with a unit of seconds and a scale factor of 10, the scalar value `1` in that data type represents a duration of 10 seconds.
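The following minimal sketch illustrates this scaling with NumPy (it assumes NumPy 2.x is installed). It stores the raw integer `1` in a `timedelta64[10s]` array and casts it to plain seconds:

```python
import numpy as np

# Example dtype: unit "s" (seconds) with a scale factor of 10.
dt = np.dtype("timedelta64[10s]")

# Store the raw integer 1 and reinterpret its bytes as that dtype.
one_tick = np.array([1], dtype=np.int64).view(dt)[0]

# Casting to an unscaled seconds dtype exposes the actual duration.
as_seconds = one_tick.astype("timedelta64[s]")
print(as_seconds)  # 10 seconds
```

The stored integer is a count of "ticks", where each tick is the unit multiplied by the scale factor.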

NumPy represents `timedelta64` scalars with 64-bit signed integers. Negative values are permitted. The smallest 64-bit signed integer, i.e., `-2^63`, represents a non-duration value called "Not a Time", or `NaT`. The `NaT` value serves a role similar to the "Not a Number" (`NaN`) value used in floating-point data types.
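As a sketch of these semantics (assuming NumPy 2.x), reinterpreting raw `int64` storage as `timedelta64` shows that negative values are ordinary durations. Only `-2^63` is `NaT`, and, like `NaN`, `NaT` does not compare equal to itself:

```python
import numpy as np

INT64_MIN = np.iinfo(np.int64).min  # -2**63, the NaT sentinel

# Reinterpret raw int64 storage as timedelta64 values in seconds.
raw = np.array([-5, 0, INT64_MIN], dtype=np.int64)
durations = raw.view("m8[s]")

# Negative durations are ordinary values; only INT64_MIN is NaT.
print(np.isnat(durations))  # [False False  True]

# Like NaN, NaT does not compare equal to itself.
nat = np.timedelta64("NaT", "s")
print(nat == nat)  # False

# The 64-bit storage of NaT is exactly -2**63.
print(np.array([nat]).view(np.int64)[0] == -2**63)  # True
```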

### NumPy data type parameters

#### Scale factor
The NumPy `timedelta64` data type takes a scale factor. It must be an integer in the range `[1, 2147483647]`, i.e., `[1, 2^31 - 1]`.

While it is possible to construct a NumPy `timedelta64` data type with a scale factor of `0`, NumPy will automatically normalize this to `1`.

#### Unit
The NumPy `timedelta64` data type takes a unit parameter, which must be one of the following temporal units:

| Identifier | Meaning |
|------------|----------|
| Y | year |
| M | month |
| W | week |
| D | day |
| h | hour |
| m | minute |
| s | second |
| ms | millisecond |
| us | microsecond |
| μs | microsecond |
| ns | nanosecond |
| ps | picosecond |
| fs | femtosecond |
| as | attosecond |

Review thread attached to the `M` row:

> **Reviewer:** Just noting that year and month are super problematic as units because they don't actually have a fixed duration (leap years, variable months). I would hate to see us proliferating data with this encoding into the world. But I guess if the goal is numpy compatibility, we should leave them in.
>
> **Contributor (author):** 100% agree that the numpy definition is problematic. But I think there's value in a data type that numpy users (or zarr v2 users) can adopt without thinking. We should specify a less problematic, more generally useful datetime data type in a separate PR.
>
> **Member:** Maybe it would be useful to rename this data type to `numpy.timedelta64` to signal the intent that it is only meant for compatibility?
>
> **Contributor (author):** `numpy.timedelta64` is actually my preferred name, but IIRC @rabernat was not a fan.
>
> **Contributor (author):** Since this naming concern affects all the numpy dtypes, we should resolve that conversation in #4.

> Note: "us" and "μs" are treated as equivalent by NumPy.

> Note: NumPy permits the creation of `timedelta64` data types with an unspecified unit. In this case, the unit is set to the special value `"generic"`.
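To inspect these parameters, NumPy's `np.datetime_data` returns the `(unit, scale factor)` pair of a `timedelta64` dtype. The sketch below (assuming NumPy 2.x) prints that pair for a few example dtypes. Note that the dtype with no unit reports the unit `"generic"`:

```python
import numpy as np

# "m8" is NumPy's short type code for timedelta64.
for spec in ["m8[D]", "m8[25ms]", "m8[us]", "m8"]:
    unit, scale = np.datetime_data(np.dtype(spec))
    print(f"{spec!r}: unit={unit!r}, scale_factor={scale}")
```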

#### Endianness
The NumPy `timedelta64` data type takes a byte order parameter, which must be either little-endian or big-endian.
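In NumPy, the byte order is part of the dtype string (`<` for little-endian, `>` for big-endian). The following sketch (assuming NumPy 2.x) stores the same duration under both byte orders. The values compare equal and only their in-memory byte layout differs:

```python
import numpy as np

# Same unit and scale factor, opposite byte orders.
little = np.array([1], dtype="<m8[s]")
big = np.array([1], dtype=">m8[s]")

print(little[0] == big[0])     # True
print(little.tobytes().hex())  # 0100000000000000
print(big.tobytes().hex())     # 0000000000000001
```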

## Data type representation

### Name

The name of this data type is the string `"timedelta64"`.

### Configuration

This data type requires a configuration. The configuration for this data type is a JSON object with the following fields:

| field name | type | required | notes |
|------------|----------|---|---|
| `"unit"` | `string` | yes | One of `"Y"`, `"M"`, `"W"`, `"D"`, `"h"`, `"m"`, `"s"`, `"ms"`, `"us"`, `"μs"`, `"ns"`, `"ps"`, `"fs"`, `"as"`, `"generic"` |
| `"scale_factor"` | `integer` | yes | The number must represent an integer from the inclusive range `[1, 2147483647]` |

> Note: the NumPy `timedelta64` data type is parametrized by an endianness (little or big), but the Zarr `timedelta64` data type is not. In Zarr, the endianness of `timedelta64` arrays is determined by the configuration of the `codecs` metadata and is thus not part of the data type configuration.

> Note: as per NumPy, `"us"` and `"μs"` are equivalent and interchangeable representations of microseconds.

No additional fields are permitted in the configuration.
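Below is a minimal pure-Python sketch of these configuration rules. The function name `validate_configuration` is hypothetical and not part of any Zarr library. As a design choice, it accepts only Python `int` scale factors, which is what `json.loads` produces for JSON numbers without a fraction or exponent. It also normalizes `"μs"` to `"us"`:

```python
# Units allowed by the configuration table; "us" and "μs" are aliases.
ALLOWED_UNITS = {
    "Y", "M", "W", "D", "h", "m", "s", "ms",
    "us", "μs", "ns", "ps", "fs", "as", "generic",
}
REQUIRED_FIELDS = {"unit", "scale_factor"}
MAX_SCALE_FACTOR = 2**31 - 1  # 2147483647


def validate_configuration(config):
    """Check a timedelta64 configuration; return it with "μs" normalized to "us".

    Hypothetical helper for illustration only.
    """
    if not isinstance(config, dict):
        raise ValueError("configuration must be a JSON object")
    extra = set(config) - REQUIRED_FIELDS
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    missing = REQUIRED_FIELDS - set(config)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    unit = config["unit"]
    if not isinstance(unit, str) or unit not in ALLOWED_UNITS:
        raise ValueError(f"invalid unit: {unit!r}")
    scale = config["scale_factor"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(scale, bool) or not isinstance(scale, int):
        raise ValueError("scale_factor must be an integer")
    if not 1 <= scale <= MAX_SCALE_FACTOR:
        raise ValueError(f"scale_factor out of range: {scale}")
    return {"unit": "us" if unit == "μs" else unit, "scale_factor": scale}
```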

### Examples
The following is an example of the metadata for a `timedelta64` data type with a unit of microseconds and a scale factor of 10. This configuration defines a data type equivalent to the NumPy data type `timedelta64[10us]`:

```json
{
  "name": "timedelta64",
  "configuration": {
    "unit": "us",
    "scale_factor": 10
  }
}
```
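The mapping between this metadata and a NumPy dtype can be sketched as follows (assuming NumPy 2.x). The helper names `from_numpy_dtype` and `to_numpy_dtype` are hypothetical. The NumPy byte order is dropped on the way into Zarr and must be supplied by the caller on the way back, for example from the codec configuration. Rejecting a `"generic"` unit with a scale factor other than 1 is an assumption of this sketch, not a rule stated above:

```python
import numpy as np


def from_numpy_dtype(dtype):
    """Build Zarr timedelta64 metadata from a NumPy timedelta64 dtype (hypothetical helper).

    The byte order is deliberately dropped: in Zarr it belongs to the codecs.
    """
    if dtype.kind != "m":
        raise TypeError(f"not a timedelta64 dtype: {dtype}")
    unit, scale = np.datetime_data(dtype)
    return {"name": "timedelta64", "configuration": {"unit": unit, "scale_factor": scale}}


def to_numpy_dtype(metadata, byteorder="="):
    """Build a NumPy dtype from Zarr timedelta64 metadata (hypothetical helper).

    byteorder ("<", ">" or "=" for native) comes from the caller, not the data type.
    """
    config = metadata["configuration"]
    unit = "us" if config["unit"] == "μs" else config["unit"]
    scale = config["scale_factor"]
    if unit == "generic":
        # Assumption of this sketch: a generic unit is only paired with scale factor 1.
        if scale != 1:
            raise ValueError("generic unit requires scale_factor 1")
        return np.dtype(f"{byteorder}m8")
    return np.dtype(f"{byteorder}m8[{scale}{unit}]")
```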

## Fill value representation

For the `"fill_value"` field of array metadata, `timedelta64` scalars must be represented in one of two forms:
- As a JSON number with no fraction or exponent part, within the range `[-2^63, 2^63 - 1]`.
- As the string `"NaT"`, which denotes the value `NaT`.

> Note: the `NaT` value may optionally be encoded as the JSON number `-9223372036854775808`, i.e., `-2^63`. That is, `"fill_value": "NaT"` and `"fill_value": -9223372036854775808` should be treated as equivalent.
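A minimal sketch of reading and writing this field in Python follows. The function names are hypothetical. The input is assumed to be the value already parsed by `json.loads`, so JSON numbers with a fraction or exponent arrive as `float` and are rejected. Writing `NaT` as the string `"NaT"` rather than as `-2^63` is a choice this sketch makes; both forms are valid:

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
NAT = INT64_MIN  # integer encoding of NaT


def decode_fill_value(value):
    """Map a parsed JSON fill_value to its int64 encoding (hypothetical helper)."""
    if value == "NaT":
        return NAT
    # JSON numbers without fraction/exponent parse to int; reject bool (an int subclass).
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"invalid timedelta64 fill value: {value!r}")
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError(f"fill value out of int64 range: {value}")
    return value


def encode_fill_value(value):
    """Encode an int64 fill value for JSON, writing NaT as the string "NaT"."""
    return "NaT" if value == NAT else value
```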

## Codec compatibility

This data type is compatible with any codec that supports arrays of signed 64-bit integers.
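This compatibility follows from the storage being plain `int64`. In the sketch below (assuming NumPy 2.x), `zlib` is only a stand-in for an arbitrary codec chain that handles signed 64-bit integers; it is not a Zarr codec. Fixing little-endian serialization with `"<i8"` is likewise an illustrative choice:

```python
import zlib

import numpy as np

# Durations in ticks of 10 microseconds; the last element is NaT.
td = np.array([1, -2, np.iinfo(np.int64).min], dtype=np.int64).view("m8[10us]")

# Encode: reinterpret as int64, serialize little-endian, then compress.
payload = zlib.compress(td.view(np.int64).astype("<i8").tobytes())

# Decode: reverse the steps and reinterpret the integers as timedelta64.
decoded = np.frombuffer(zlib.decompress(payload), dtype="<i8").view("<m8[10us]")
```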
`data-types/timedelta64/schema.json` (28 additions, 0 deletions)

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "timedelta64",
  "type": "object",
  "properties": {
    "name": {
      "const": "timedelta64"
    },
    "configuration": {
      "type": "object",
      "properties": {
        "unit": {
          "type": "string",
          "enum": ["Y", "M", "W", "D", "h", "m", "s", "ms", "us", "μs", "ns", "ps", "fs", "as", "generic"]
        },
        "scale_factor": {
          "type": "integer",
          "minimum": 1,
          "maximum": 2147483647
        }
      },
      "required": ["unit", "scale_factor"],
      "additionalProperties": false
    }
  },
  "required": ["name", "configuration"],
  "additionalProperties": false
}
```