🕒 formatify

🧠 Auto-detect and standardize messy timestamp formats. Perfect for log parsers, data pipelines, or anyone tired of wrestling with inconsistent datetime strings.

⚠️ Problem

Ever pulled in a CSV or log file and found timestamps like this?

2023-03-01T12:30:45Z, 01/03/2023 12:30, Mar 1 2023 12:30 PM

How do you reliably infer and standardize them — especially when:

formats are mixed?
you have no schema?
fractional seconds and timezones are involved?

✅ Solution

formatify infers the datetime format(s) from a list of timestamp strings and gives you:

a valid strftime format string per group,
component roles (e.g. year, month, day),
clean, standardized timestamps,
structural grouping when needed.

No dependencies. Works out of the box.

📄 What This Library Does

Behind the scenes, formatify uses:

Regex patterns to split and identify timestamp tokens
Heuristics to assign roles like year, month, hour, etc.
Frequency analysis to distinguish stable vs. changing components
ISO 8601 detection for timezones, 'T' separators, and fractional seconds
Smart fallbacks for missing delimiters or ambiguous parts
Epoch detection (10 or 13 digit UNIX timestamps)

It produces:

one or more %Y-%m-%dT%H:%M:%SZ-style format strings
lists of cleaned, standardized YYYY-MM-DD HH:MM:SS values
per-group accuracy and metadata

🚀 Quick Example

from formatify_py.main import analyze_heterogeneous_timestamp_formats

samples = [
    "2023-07-15T14:23:05Z",
    "15/07/2023 14:23",
    "Jul 15, 2023 02:23 PM",
    "1689433385000"  # epoch in ms
]

results = analyze_heterogeneous_timestamp_formats(samples)

for gid, group in results.items():
    print("Group", gid)
    print("→ Format:", group["format_string"])
    print("→ Standardized:", group["standardized_timestamps"][:2])

🔍 Features

✅ Auto-detect strftime format ✅ Handles ISO 8601, text months, UNIX epoch ✅ Infers year/month/day/hour/minute roles ✅ Groups mixed formats automatically ✅ Timezone-aware ✅ No dependencies ✅ Fast and customizable

🧪 API

Main Entry Point

analyze_heterogeneous_timestamp_formats(samples: List[str]) -> Dict[int, Dict[str, Any]]

Returns a dictionary mapping group IDs to result dictionaries. Each result includes:

format_string: inferred strftime string
standardized_timestamps: parsed & normalized strings
component_roles: index → role
change_frequencies: component variability
iso_features: flags for ISO 8601 traits
detected_timezone: parsed offset (if any)
coverage: fraction of total samples in this group
accuracy: percent of valid parses in group

Lower-Level Functions

If you know all your samples have the same format:

infer_datetime_format_from_samples(samples: List[str]) -> Dict[str, Any]

🔊 Mixed Format Handling

formatify is designed to handle real-world timestamp mess. When your input includes a mix of styles — ISO, slashed, text-months, or epoch — it:

Groups samples by structural similarity
Infers format per group
Standardizes timestamps across each group

This lets you feed in 3 formats or 30, and still get clean, grouped results.

👁️ Design Notes

Want to know how the internals work? Check out:

How Formatify Thinks About Timestamps

🔍 Dev Guide

# Clone the repo
git clone https://github.com/PieceWiseProjects/formatify.git
cd formatify_py

# Set up environment
uv pip install -e .[dev,test]

# Lint and format
uv run ruff src/formatify_py

# Run tests
uv run pytest --cov=src/formatify_py

# Build for release
uv run python -m build

🚰 Contributing

We're just getting started — contributions, issues, and ideas welcome!

Fork and branch: git checkout -b feature/my-feature
Code and test
Lint and push
Open a pull request 💡

Follow our Contributor Guidelines.

📜 License

MIT — see LICENSE for details.

🙌 Credits

Built and maintained by Aalekh Roy Part of the PieceWiseProjects initiative.

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
.github		.github
docs		docs
src/formatify_py		src/formatify_py
tests		tests
.gitignore		.gitignore
Animation.gif		Animation.gif
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🕒 formatify

⚠️ Problem

✅ Solution

📄 What This Library Does

🚀 Quick Example

🔍 Features

🧪 API

Main Entry Point

Lower-Level Functions

🔊 Mixed Format Handling

👁️ Design Notes

🔍 Dev Guide

🚰 Contributing

📜 License

🙌 Credits

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

License

PieceWiseProjects/formatify

Folders and files

Latest commit

History

Repository files navigation

🕒 formatify

⚠️ Problem

✅ Solution

📄 What This Library Does

🚀 Quick Example

🔍 Features

🧪 API

Main Entry Point

Lower-Level Functions

🔊 Mixed Format Handling

👁️ Design Notes

🔍 Dev Guide

🚰 Contributing

📜 License

🙌 Credits

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages