Turn bank exports (CSV/Excel) into OFX files for import into personal finance tools.
The utils/ package normalizes columns, cleans text, infers transaction types, and
renders OFX-compliant output. Optional LLM enrichment can provide nicer payee names and
categories.
Install dependencies (macOS):
python3 -m pip install -r requirements.txt
# Optional for tests or local LLM demos
python3 -m pip install pytest pyyaml mlx mlx-lmCreate an OFX from a CSV:
from pathlib import Path
from utils.etl import load_and_prepare
from utils.build_ofx import build_ofx
df = load_and_prepare(Path("transactions.csv"))
ofx = build_ofx(df, accttype="checking", acctid="12345")
Path("transactions.ofx").write_text(ofx, encoding="utf-8")Or run the example CLI in main.py after editing the file paths.
- Input:
examples/transactions.sample.csv - Run:
python3 examples/generate_ofx.py
- Output:
examples/transactions.sample.ofx
- Load via
utils.io.load_transactions(CSV/Excel; sheet chosen byutils.sheet.find_best_sheet). Columns are read asobjectdtype. - Normalize with
utils.sheet.normalize_columnsand map vendor columns to canonical names (acctnum, date, time, amount, description, trntype, fitid, ...) viautils.sheet.detect_columns. - ETL (
utils.etl.load_and_prepare) builds:raw_desc,payee_display(coalesced text);cleaned_desc(uppercased, squashed).date_parsed(UTC) + optionaltimeparsing (supports Excel fractions and HH:MM[:SS]).amount_clean(handles$, commas, parentheses negatives, empty/NaN).- Optional LLM columns:
payee_llm,category_llm,description_llm. trntype_normvia rule-driven inference;fitid_normcleanup if provided.
- Validate with
utils.validate.assert_ofx_ready(amount_cleanrequired, timestamp fallback allowed). - Render with
utils.build_ofx.build_ofx:<DTSTART>/<DTEND>fromdate_parsedrange or fallback.<DTPOSTED>always set (never the string "None").- Name/memo precedence:
payee_llm→payee_display→posting_memo→cleaned_desc→raw_desc. TRNTYPEfromtrntype_normor inferred from amount;FITIDgenerated if missing.
Enable by passing llm_client to load_and_prepare:
client = ... # your LLM SDK, callable, or mlx-lm config dict
df = load_and_prepare(Path("transactions.csv"), llm_client=client, llm_batch_size=10)Client shapes supported: an object with generate_batch(prompts), a callable, or an
mlx-lm configuration dict. See tests/test_utils.py for concrete examples.
from mlx_lm import load as mlx_load
from utils.etl import load_and_prepare
model, tokenizer = mlx_load("mlx-community/Llama-3.2-3B-Instruct-4bit")
client = {
"llm_model": "mlx_lm",
"model": model,
"tokenizer": tokenizer,
"generation_kwargs": {"max_tokens": 128},
}
df = load_and_prepare(Path("transactions.csv"), llm_client=client, llm_batch_size=10)Set use_batch=False to call mlx_lm.generate per prompt. LLM usage is optional; the
pipeline works fully offline when llm_client=None.
df = load_and_prepare(Path("transactions.csv"))
df["trntype_norm"] = infer_trntype_series(
Rules live in utils/rules.py and support JSON/YAML overrides. See
examples/rules.example.yaml for a ready-to-tweak template showing extend/replace.
The canonical API for applying rule-driven transaction-type inference is
provided by utils.trntype (preferred):
from utils.rules import load_rules
from utils.trntype import infer_trntype_series
rules = load_rules("examples/rules.example.yaml")
df = load_and_prepare(Path("transactions.csv"))
df["trntype_norm"] = infer_trntype_series(
df["amount_clean"], df.get("trntype"), df.get("cleaned_desc"), rules=rules
)For backwards compatibility utils.cleaning re-exports infer_trntype and
infer_trntype_series, so existing code that imports from
utils.cleaning will continue to work.
python3 -m pytest -qWe run mypy using a Python 3.11 virtual environment so that pandas-stubs and
related type packages are compatible. Quick setup:
# create a local venv (uses Python 3.11)
python3.11 -m venv .venv
source .venv/bin/activate
# bootstrap pip and install dev deps
python -m pip install --upgrade pip setuptools wheel
python -m pip install mypy pandas pandas-stubs numpy
# run the type checker and tests
python -m mypy --config-file mypy.ini utils
python -m pytest -qNotes:
- If your system doesn't have Python 3.11, install it (pyenv or system package) or run mypy in an environment that provides compatible stubs for pandas/numpy.
- CI should run mypy under Python 3.11 to match local checks.
Tests cover time parsing (incl. Excel fractions), amount cleaning, rule overrides, LLM batching/parsing, FITID generation, and OFX field precedence.
- Parentheses amounts are negatives; empty/"nan" handled.
- If both debit/credit columns exist, net amount = credit - debit.
derive_acctid_from_pathextracts digits from filenames; otherwise uses a stub + hash.NAMEis escaped and trimmed to 32 chars for OFX.
See docs/CONTRIBUTING.md for quick developer setup and the Makefile targets.
For convenience there's a small Makefile with common developer targets:
# create and bootstrap the Python 3.11 virtualenv and install dev deps
make venv
# run mypy (uses .venv)
make typecheck
# run tests (uses .venv)
make test
# clean up
make clean