Multi-language, AST‑first parsing, analysis, and formatting toolkit written in Swift. Currently under active development and not yet published/packaged for installation. Expect rapid iteration and breaking changes.
- Swift Package exposes an executableTarget named cubit.
- Only JavaScript (ES2015–ES2024 + select ESNext) tokenizer & parser are being implemented right now.
- Formatter architecture scaffolded; JavaScript formatter in progress.
- No stable CLI UX, versioning, or release artifacts yet.
- Public API & AST node shapes may change without deprecation windows.
Cubit aims to provide:
- High‑fidelity, lossless tokenization & AST with precise source mapping.
- Multi-language plugin model (JS today; Python, TypeScript, Rust, C++ later).
- Declarative, AST‑owned formatting (Prettier‑style determinism; whitespace‑agnostic input → canonical output).
- Analysis & transformation passes (linting hints, refactors, metrics) built atop shared core primitives.
- Fast, memory‑aware design (preallocation, trie operator matching, minimal backtracking).
Layer | Summary |
---|---|
Tokenizer | Language‑specific, optimized (tries, Unicode category checks), produces strongly typed tokens. |
Parser | Context‑sensitive, precedence‑aware, builds rich AST with TokenRange for every node. |
AST | Class hierarchy per language (JS nodes prefixed with JS*), all inheriting from core ASTNode. |
Formatting | Formattable protocol; spacing is determined "after" each token; nodes decide layout via localized rules and parent context. |
Error Handling | Precise positional errors via ParserError / TokenizationError. |
CLI | Swift ArgumentParser scaffolding for future parse, analyze, format, and transform subcommands. |
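
To make the AST row more concrete, here is a minimal sketch of how a node hierarchy with a token range on every node could look. Only the names ASTNode, TokenRange, and the JS* prefix come from this README; the fields, initializers, and the two node classes shown are assumptions for illustration, not Cubit's actual definitions.

```swift
// Hypothetical shapes; the real Cubit types may differ.
struct TokenRange: Equatable {
    let start: Int  // index of the first token covered (inclusive)
    let end: Int    // index of the last token covered (inclusive)
}

class ASTNode {
    let range: TokenRange
    init(range: TokenRange) { self.range = range }
}

final class JSIdentifier: ASTNode {
    let name: String
    init(name: String, range: TokenRange) {
        self.name = name
        super.init(range: range)
    }
}

final class JSBinaryExpression: ASTNode {
    let left: ASTNode
    let op: String
    let right: ASTNode
    init(left: ASTNode, op: String, right: ASTNode) {
        self.left = left
        self.op = op
        self.right = right
        // The parent spans from the first child's start to the last child's end.
        super.init(range: TokenRange(start: left.range.start, end: right.range.end))
    }
}

// Token stream for the source text `a + b`.
let tokens = ["a", "+", "b"]
let lhs = JSIdentifier(name: "a", range: TokenRange(start: 0, end: 0))
let rhs = JSIdentifier(name: "b", range: TokenRange(start: 2, end: 2))
let sum = JSBinaryExpression(left: lhs, op: "+", right: rhs)
let covered = tokens[sum.range.start...sum.range.end].joined(separator: " ")
print(covered) // a + b
```

Because every node keeps its range, any subtree can be mapped back to the exact tokens it came from.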
- ES2015 → ES2024 constructs (classes, private fields, static blocks, async/await, template literals, bigint, regex d/v flags, etc.).
- Disambiguation logic (regex vs division) under development.
- Private member access: this.#field parsing.
- Unicode escapes & template literal interpolation.
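
The regex vs division problem mentioned above comes from `/` being either a division operator or the start of a regular expression literal, depending on what precedes it. The sketch below shows one common heuristic based on the previous significant token. It is a simplification and not Cubit's implementation: the PrevToken enum and slashStartsRegex function are hypothetical, and cases such as `if (x) /re/.test(y)` need grammar context that this heuristic does not have.

```swift
// Hypothetical token kinds; not Cubit's actual token model.
enum PrevToken {
    case none                   // start of input
    case identifier
    case numericLiteral
    case closeParen             // `)`
    case closeBracket           // `]`
    case operatorOrPunctuator   // e.g. `=`, `(`, `,`, `+`
    case keyword(String)
}

/// Returns true when a `/` following `prev` should be read as the start of a regex literal.
func slashStartsRegex(after prev: PrevToken) -> Bool {
    switch prev {
    case .identifier, .numericLiteral, .closeParen, .closeBracket:
        // The previous token ends an expression, so `/` is division.
        return false
    case .keyword(let word):
        // Keywords that are themselves values end an expression;
        // others (`return`, `typeof`, ...) expect an expression next.
        return !["this", "super", "true", "false", "null"].contains(word)
    case .none, .operatorOrPunctuator:
        return true
    }
}

print(slashStartsRegex(after: .operatorOrPunctuator)) // true, as in `x = /ab+c/`
print(slashStartsRegex(after: .identifier))           // false, as in `a / b`
```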
- AST‑First: Formatting rules colocated with node definitions.
- Token-Aware: Spacing model attaches semantics to tokens instead of string post‑processing.
- Declarative & Deterministic: Same AST → identical output, independent of original whitespace.
- Extensible: Additional languages add their own Formattable conformances.
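
As a rough illustration of the token-aware spacing model, the sketch below has a node emit its tokens together with the whitespace that follows each one. Only the protocol name Formattable is taken from this README; its single requirement, the Spacing and FormattedToken types, and the VariableDeclaration node are assumptions made for the example.

```swift
// Hypothetical names; Cubit's actual Formattable protocol may look different.
enum Spacing: String {
    case nothing = ""
    case space = " "
    case newline = "\n"
}

struct FormattedToken {
    let text: String
    let after: Spacing  // whitespace emitted after this token
}

protocol Formattable {
    func formattedTokens() -> [FormattedToken]
}

struct VariableDeclaration: Formattable {
    let kind: String   // "let", "const", or "var"
    let name: String
    let value: String

    // The node owns its layout: each token declares the spacing that follows it.
    func formattedTokens() -> [FormattedToken] {
        [
            FormattedToken(text: kind, after: .space),
            FormattedToken(text: name, after: .space),
            FormattedToken(text: "=", after: .space),
            FormattedToken(text: value, after: .nothing),
            FormattedToken(text: ";", after: .newline),
        ]
    }
}

func render(_ node: some Formattable) -> String {
    node.formattedTokens().map { $0.text + $0.after.rawValue }.joined()
}

let decl = VariableDeclaration(kind: "const", name: "x", value: "1")
print(render(decl), terminator: "") // const x = 1;
```

The output depends only on the node's fields, so `const x=1;` and `const   x = 1 ;` would both render the same way once parsed.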
Sources/
  cubit.swift              # CLI entry (work in progress)
  Formatter/               # Core formatting protocols & errors
  Languages/JavaScript/
    Tokenizer/             # JS tokenizer + token types
    Parser/                # JS parser & AST node classes
    Formatter/             # JS formatter logic
  Parser/ & Tokenizer/     # Shared base abstractions
Tests/JavaScript/          # Parser, tokenizer, formatter test suites
Prerequisites:
- Swift 6.1 toolchain (macOS 15 target currently specified; Linux development may need adjustments if newer APIs are used).
Clone & build:
git clone https://github.com/alexandrosraikos/cubit.git cubit
cd cubit
swift build
Run tests:
swift test
Run the (unstable) executable:
swift run cubit --help
(Options and subcommands are provisional and may be empty or incomplete.)
- Add/extend a failing test in Tests/JavaScript/...
- Implement tokenizer/parsing/formatting logic.
- Re-run swift test until green.
- Refactor for clarity & performance (keep tests passing).
Because APIs are volatile, large feature PRs may be reworked heavily. Suggested approach:
- Open a brief issue / discussion with intent before major changes.
- Keep PRs focused (one feature or fix + tests).
- Include exhaustive parser/formatter tests for new syntax paths.
- Maintain naming conventions (JS* prefixes, method styles like parseLeftHandSideExpression).
Short Term:
- Flesh out remaining ES2024 edge cases & ESNext experiments.
- Stabilize JS formatter spacing & breaking rules.
- JSON / debug AST export.
- CLI parse & format subcommands with basic output selection.
Medium Term:
- Introduce analysis passes (dead code, complexity metrics).
- Add transformation primitives (codemods, renaming).
- Plugin loader for external analyzers.
- Begin second language prototype (likely Python or TypeScript).
Long Term:
- Language Server Protocol integration.
- Incremental parsing & formatting for IDE latency reduction.
- Rich diagnostics & autofix suggestions.
- Fully documented public API & semantic versioning.
- Always preserve original semantic info (no lossy transformations in AST).
- Avoid speculative backtracking; use lookahead utilities judiciously.
- Preallocate arrays where counts are predictable to reduce heap churn.
- Keep formatting decisions local; avoid global mutable state.
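
The preallocation guideline can be illustrated with template literals, where a literal with n interpolated expressions always has n + 1 static chunks, so the final part count is known up front. The interleave function below is a hypothetical helper written for this example, not part of Cubit.

```swift
// Hypothetical helper: rebuilds the parts of a template literal in source order.
func interleave(quasis: [String], expressions: [String]) -> [String] {
    precondition(quasis.count == expressions.count + 1,
                 "a template literal has one more static chunk than expressions")
    var parts: [String] = []
    // The final size is known, so reserve once instead of growing repeatedly.
    parts.reserveCapacity(quasis.count + expressions.count)
    for (index, expression) in expressions.enumerated() {
        parts.append(quasis[index])
        parts.append("${" + expression + "}")
    }
    parts.append(quasis[expressions.count])
    return parts
}

let parts = interleave(quasis: ["Hello, ", "!"], expressions: ["name"])
print(parts.joined()) // Hello, ${name}!
```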
Strategies employed / planned:
- Operator trie for O(k) operator token recognition.
- Cached lookahead for disambiguation (regex vs division).
- Tight loops & minimal copies in hot tokenizer paths.
- Potential future: incremental diff-based reparse.
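
For readers unfamiliar with the operator trie idea, here is a minimal sketch of longest-match operator recognition, where matching costs at most one step per byte of the operator (k steps for an operator of length k). The OperatorTrie class and its method names are assumptions for illustration; Cubit's tokenizer may structure this differently.

```swift
// Hypothetical trie; not Cubit's actual tokenizer code.
final class OperatorTrie {
    private final class Node {
        var children: [UInt8: Node] = [:]
        var isTerminal = false
    }
    private let root = Node()

    init(operators: [String]) {
        for op in operators {
            var node = root
            for byte in op.utf8 {
                if let child = node.children[byte] {
                    node = child
                } else {
                    let child = Node()
                    node.children[byte] = child
                    node = child
                }
            }
            node.isTerminal = true
        }
    }

    /// Length in bytes of the longest operator starting at `start`, or nil if none matches.
    func longestMatch(in bytes: [UInt8], from start: Int) -> Int? {
        var node = root
        var best: Int? = nil
        var index = start
        while index < bytes.count, let next = node.children[bytes[index]] {
            node = next
            index += 1
            if node.isTerminal { best = index - start }
        }
        return best
    }
}

let trie = OperatorTrie(operators: ["=", "==", "===", "=>", ">", ">>", ">>>", ">>>="])
let source = Array("a >>>= b".utf8)
print(trie.longestMatch(in: source, from: 2) ?? 0) // 4
```

The walk never backtracks: it remembers the last terminal node it passed, so `>>>=` wins over its prefixes `>`, `>>`, and `>>>` in a single pass.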
- Not all ESNext proposals modeled yet.
- Error recovery is minimal; parser halts at first fatal error.
- Formatter not feature‑complete; output may shift between commits.
- No binary distribution, Homebrew formula, or Linux CI matrix yet.
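
The "halts at first fatal error" behavior can be pictured with a small sketch: a checker that throws a positional error as soon as it meets a problem and reports nothing after it. The README names ParserError but not its fields, so the shape below (a message plus a line/column position), the SourcePosition type, and the checkParens function are all assumptions.

```swift
// Hypothetical shape; the README only names ParserError, not its fields.
struct SourcePosition: Equatable {
    let line: Int
    let column: Int
}

struct ParserError: Error, Equatable {
    let message: String
    let position: SourcePosition
}

/// Checks that parentheses balance and throws at the first fatal problem.
func checkParens(_ source: String) throws {
    var depth = 0
    var line = 1
    var column = 1
    for character in source {
        if character == "(" { depth += 1 }
        if character == ")" {
            if depth == 0 {
                throw ParserError(message: "Unexpected ')'",
                                  position: SourcePosition(line: line, column: column))
            }
            depth -= 1
        }
        if character.isNewline {
            line += 1
            column = 1
        } else {
            column += 1
        }
    }
    if depth > 0 {
        throw ParserError(message: "Unclosed '('",
                          position: SourcePosition(line: line, column: column))
    }
}

do {
    // Two stray `)` characters: only the first one is reported.
    try checkParens("f(x))\n)")
} catch let error as ParserError {
    print("\(error.message) at \(error.position.line):\(error.position.column)")
} catch {
    print("unexpected error: \(error)")
}
```

Recovering after the first error (for example, by skipping to the next statement boundary) would let a parser report both problems, which is the kind of recovery the limitation above says is not there yet.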
Q: Can I use this in production?
A: Not yet. APIs, AST node shapes, and formatter output are still unstable.
Q: How do I add a new language?
A: Mirror the Languages/JavaScript/ structure: implement a tokenizer, a parser, and AST node classes inheriting from the core bases, then add tests.
Q: Will there be a stable plugin API?
A: Planned, once core parser and formatter semantics settle.
Please file issues for:
- Incorrect parsing / AST shape divergences.
- Crashes or performance regressions.
- Formatter output anomalies.
- Missing constructs you need (include a code sample).
Not yet finalized. Until a LICENSE file is added, usage is restricted—treat this as source for evaluation only. (A permissive license such as Apache 2.0 or MIT is under consideration.)
- Prettier (deterministic formatting philosophy)
- Babel / Acorn (JS parsing strategies)
- SwiftSyntax (token range mapping concepts)
This README describes an in-progress system; details may drift. Check commit history and tests for the definitive current behavior.