🧠 Auto-detect and standardize messy timestamp formats. Perfect for log parsers, data pipelines, or anyone tired of wrestling with inconsistent datetime strings.
Ever pulled in a CSV or log file and found timestamps like this?
2023-03-01T12:30:45Z, 01/03/2023 12:30, Mar 1 2023 12:30 PM
How do you reliably infer and standardize them — especially when:
- formats are mixed?
- you have no schema?
- fractional seconds and timezones are involved?
formatify infers the datetime format(s) from a list of timestamp strings and gives you:
- a valid strftime format string per group,
- component roles (e.g. year, month, day),
- clean, standardized timestamps,
- structural grouping when needed.
No dependencies. Works out of the box.
Behind the scenes, formatify uses:
- Regex patterns to split and identify timestamp tokens
- Heuristics to assign roles like year, month, hour, etc.
- Frequency analysis to distinguish stable vs. changing components
- ISO 8601 detection for timezones, 'T' separators, and fractional seconds
- Smart fallbacks for missing delimiters or ambiguous parts
- Epoch detection (10 or 13 digit UNIX timestamps)
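As an illustration of the last bullet, here is a minimal sketch of 10- or 13-digit epoch detection using only the standard library. The function name `parse_epoch` and the choice to interpret values as UTC are assumptions made for this example. This is not formatify's actual implementation.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Exactly 10 digits (seconds) or exactly 13 digits (milliseconds).
_EPOCH_RE = re.compile(r"^(?:\d{10}|\d{13})$")


def parse_epoch(value: str) -> Optional[datetime]:
    """Hypothetical helper: return a UTC datetime for a 10/13-digit epoch, else None."""
    text = value.strip()
    if not _EPOCH_RE.match(text):
        return None
    number = int(text)
    if len(text) == 13:
        # Split milliseconds off with integer math to avoid float rounding.
        seconds, millis = divmod(number, 1000)
    else:
        seconds, millis = number, 0
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(milliseconds=millis)


print(parse_epoch("1689433385000"))  # 13 digits, treated as milliseconds
print(parse_epoch("2023-07-15"))     # not an epoch, so None
```

A digit-count check like this is cheap, but it would also accept any other 10- or 13-digit number (an ID, for example), so a real detector would likely add a plausibility range on the resulting date.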
It produces:
- one or more %Y-%m-%dT%H:%M:%SZ-style format strings
- lists of cleaned, standardized YYYY-MM-DD HH:MM:SS values
- per-group accuracy and metadata
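To make the link between a format string and the standardized output concrete, the sketch below applies a strftime-style format with the standard library's `datetime.strptime` and re-emits each value as YYYY-MM-DD HH:MM:SS. The helper `standardize` and its accuracy calculation (percent of samples that parse) are assumptions for illustration and may differ from how formatify computes these values.

```python
from datetime import datetime
from typing import List, Tuple

TARGET_FORMAT = "%Y-%m-%d %H:%M:%S"


def standardize(samples: List[str], format_string: str) -> Tuple[List[str], float]:
    """Hypothetical helper: parse samples with one format, return (cleaned, accuracy %)."""
    cleaned = []
    for sample in samples:
        try:
            parsed = datetime.strptime(sample.strip(), format_string)
        except ValueError:
            # Sample does not match this format; skip it.
            continue
        cleaned.append(parsed.strftime(TARGET_FORMAT))
    accuracy = 100.0 * len(cleaned) / len(samples) if samples else 0.0
    return cleaned, accuracy


values, accuracy = standardize(
    ["15/07/2023 14:23", "not a timestamp"], "%d/%m/%Y %H:%M"
)
print(values, accuracy)
```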
from formatify_py.main import analyze_heterogeneous_timestamp_formats

samples = [
    "2023-07-15T14:23:05Z",
    "15/07/2023 14:23",
    "Jul 15, 2023 02:23 PM",
    "1689433385000",  # epoch in ms
]

results = analyze_heterogeneous_timestamp_formats(samples)
for gid, group in results.items():
    print("Group", gid)
    print("→ Format:", group["format_string"])
    print("→ Standardized:", group["standardized_timestamps"][:2])
✅ Auto-detects strftime format
✅ Handles ISO 8601, text months, UNIX epoch
✅ Infers year/month/day/hour/minute roles
✅ Groups mixed formats automatically
✅ Timezone-aware
✅ No dependencies
✅ Fast and customizable
analyze_heterogeneous_timestamp_formats(samples: List[str]) -> Dict[int, Dict[str, Any]]
Returns a dictionary mapping group IDs to result dictionaries. Each result includes:
- format_string: inferred strftime string
- standardized_timestamps: parsed & normalized strings
- component_roles: index → role
- change_frequencies: component variability
- iso_features: flags for ISO 8601 traits
- detected_timezone: parsed offset (if any)
- coverage: fraction of total samples in this group
- accuracy: percent of valid parses in group
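One way to consume these fields is to keep only the groups that look trustworthy. The sketch below runs on a hand-written mock dictionary, not on real formatify output. It assumes `coverage` is a fraction between 0 and 1 and `accuracy` is on a 0 to 100 scale, as the descriptions above suggest. The helper name `reliable_groups` and its thresholds are hypothetical.

```python
from typing import Any, Dict, List


def reliable_groups(
    results: Dict[int, Dict[str, Any]],
    min_coverage: float = 0.1,
    min_accuracy: float = 90.0,
) -> List[int]:
    """Hypothetical helper: IDs of groups meeting both thresholds, in sorted order."""
    return sorted(
        gid
        for gid, group in results.items()
        if group["coverage"] >= min_coverage and group["accuracy"] >= min_accuracy
    )


# Mock data for illustration only; real results carry more keys.
mock_results = {
    0: {"format_string": "%Y-%m-%dT%H:%M:%SZ", "coverage": 0.75, "accuracy": 100.0},
    1: {"format_string": "%d/%m/%Y %H:%M", "coverage": 0.25, "accuracy": 50.0},
}

print(reliable_groups(mock_results))
```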
If you know all your samples have the same format:
infer_datetime_format_from_samples(samples: List[str]) -> Dict[str, Any]
formatify is designed to handle real-world timestamp mess. When your input includes a mix of styles (ISO, slashed, text-months, or epoch), it:
- Groups samples by structural similarity
- Infers format per group
- Standardizes timestamps across each group
This lets you feed in 3 formats or 30, and still get clean, grouped results.
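Grouping by structural similarity can be pictured as reducing each string to a shape signature and bucketing on it. The sketch below is one simple way to do that, assuming a signature that collapses digit runs to `9` and letter runs to `a` while keeping separators. The names `signature` and `group_by_structure` are made up for this example, and formatify's own grouping logic may work differently.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Match a run of digits or a run of ASCII letters in a single pass.
_RUN_RE = re.compile(r"\d+|[A-Za-z]+")


def signature(timestamp: str) -> str:
    """Collapse digit runs to '9' and letter runs to 'a'; keep separators as-is."""
    return _RUN_RE.sub(
        lambda m: "9" if m.group()[0].isdigit() else "a", timestamp.strip()
    )


def group_by_structure(samples: List[str]) -> Dict[str, List[str]]:
    """Bucket samples that share the same shape signature."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample in samples:
        groups[signature(sample)].append(sample)
    return dict(groups)


groups = group_by_structure([
    "2023-07-15T14:23:05Z",
    "2024-01-02T03:04:05Z",
    "15/07/2023 14:23",
    "Jul 15, 2023 02:23 PM",
])
for shape, members in groups.items():
    print(shape, members)
```

Because run lengths are discarded, this signature puts `2023-07-15` and `15-07-2023` in the same bucket. Telling those apart is the job of the role-assignment heuristics, not of the grouping step.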
Want to know how the internals work? Check out:
# Clone the repo
git clone https://github.com/PieceWiseProjects/formatify.git
cd formatify
# Set up environment
uv pip install -e .[dev,test]
# Lint and format
uv run ruff check src/formatify_py
uv run ruff format src/formatify_py
# Run tests
uv run pytest --cov=src/formatify_py
# Build for release
uv run python -m build
We're just getting started — contributions, issues, and ideas welcome!
- Fork and branch: git checkout -b feature/my-feature
- Code and test
- Lint and push
- Open a pull request 💡
Follow our Contributor Guidelines.
MIT — see LICENSE for details.
Built and maintained by Aalekh Roy. Part of the PieceWiseProjects initiative.