Skip to content

add Expressing Date / Datetime / Timezone information to readme #121

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Mar 14, 2025

Conversation

dccakes
Copy link
Contributor

@dccakes dccakes commented Mar 10, 2025

Given the discussions we have been having across the slack, I am added in a section on how to express Date / Datetime / Timezone information in conjunction with logicalTypeOptions.format to the readme.

This is based on input from @pflooky and @jochenchrist

andrewrjones and others added 7 commits January 21, 2025 12:52
I noticed this link was broken. Hopefully this is the right one :)

Signed-off-by: Andrew Jones <andrew@andrew-jones.com>
Signed-off-by: Andrew Jones <andrew@andrew-jones.com>
Signed-off-by: Diego Carvallo <dccakes@gmail.com>
@dccakes dccakes marked this pull request as ready for review March 10, 2025 11:51
docs/README.md Outdated
- name: created_at
logicalType: date
logicalTypeOptions:
- format: "YYYY-MM-DDTHH:MM:SSZ"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jochenchrist this does imply that a tool like datacontract-cli or others will need to parse this format and detect all different kinds of time information patterns to know e.g. if the bigquery export type should be DATE or DATETIME or TIME or TIMESTAMP?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is valid. And why should a BigQuery user need to define the format when there are only one or two timestamp types available.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that this impacts how external tools such as the datacontract-cli or others implement it. However, the data contract should be tool agnostic in its implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can achieve this with the existing data contract schema via the physicalType option which allows you to define your data source specific data type. This will allows tools, consumers, producers etc. to know what data type to use in specific data source connections. logicalType is more of a human-friendly, consistent way of defining data types across all data contracts. You can read further on the rationale here. For reference, the OpenAPI spec does not even have a date data type. Rather, it is treated as a specific format of a string.

Maybe @dccakes you can also update the examples to show how users can define physicalType alongside logicalType for dates.

Copy link

@tomdw tomdw Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pflooky I see that OpenAPI does not have a date datatype, but is that not mainly because it is targeted at describing APIs that return a JSON-a-like response body which is all strings physically. However with databases and similar there is a a more rich type system. Needing to specify the physical type might not always be desired, you could envision a platform that takes care of the mapping the logical to a physical type to lower the cognitive load, and automatically fill in these physical types in the contract after tabel generation has been done.

The rationale in the docs you refer to does not directly provide arguments why not introducing date for dates without time, datetime for dates with time, and time for time without date is not a better option. Also seeing this in e.g. https://datapackage.org/standard/table-schema/#datetime or open table formats like Iceberg also go further than one type for this https://iceberg.apache.org/spec/#primitive-types.

Also for a human-friendly type system, I would like to know as consumer if this field is only a date, or a date with time, or time only without having to know the technical format notation.

(just an opinion, not intended to block this pr but to improve the standard ;-))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pflooky - added an example and updated readme based on your input. let me know if this is what you were thinking.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dccakes I think it's good for merging.

@tomdw I will help raise this in the next technical steering committee meeting and gather the teams' thoughts. I will update this thread as well once done (Tuesday next week).

dccakes added 2 commits March 10, 2025 16:07
Signed-off-by: Diego Carvallo <dccakes@gmail.com>
Signed-off-by: Diego Carvallo <dccakes@gmail.com>
@pflooky pflooky changed the base branch from main to dev March 14, 2025 02:58
@pflooky pflooky added this to the ODCS v3.0.2 milestone Mar 14, 2025
@pflooky pflooky merged commit 1ea8d1d into dev Mar 14, 2025
3 checks passed
@dccakes dccakes added the documentation Improvements or additions to documentation label Mar 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants