Multi-language, AST‑first parsing, analysis, and formatting toolkit written in Swift.

alexandrosraikos/cubit

Cubit (WIP)

Multi-language, AST‑first parsing, analysis, and formatting toolkit written in Swift. Cubit is under active development and is not yet published or packaged for installation. Expect rapid iteration and breaking changes.

Status

  • Swift Package exposes an executableTarget named cubit.
  • Only the JavaScript tokenizer and parser (ES2015–ES2024 plus select ESNext features) are being implemented right now.
  • Formatter architecture scaffolded; JavaScript formatter in progress.
  • No stable CLI UX, versioning, or release artifacts yet.
  • Public API & AST node shapes may change without deprecation windows.
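
For orientation, a package manifest matching the points above might look roughly like the following. This is a hypothetical sketch, not a copy of the repository's Package.swift: the dependency version, the target path, and the absence of other targets are all assumptions.

```swift
// swift-tools-version: 6.1
// Hypothetical manifest sketch; the repository's actual Package.swift may differ.
import PackageDescription

let package = Package(
    name: "cubit",
    // The README states that a macOS 15 target is currently specified.
    platforms: [.macOS(.v15)],
    dependencies: [
        // Assumed dependency and minimum version, inferred from the
        // ArgumentParser-based CLI scaffolding mentioned in this README.
        .package(url: "https://github.com/apple/swift-argument-parser", from: "1.3.0"),
    ],
    targets: [
        .executableTarget(
            name: "cubit",
            dependencies: [
                .product(name: "ArgumentParser", package: "swift-argument-parser"),
            ],
            // Assumed path, based on Sources/cubit.swift in the project layout.
            path: "Sources"
        ),
    ]
)
```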

Vision

Cubit aims to provide:

  • High‑fidelity, lossless tokenization & AST with precise source mapping.
  • Multi-language plugin model (JS today; Python, TypeScript, Rust, C++ later).
  • Declarative, AST‑owned formatting (Prettier‑style determinism: whitespace-agnostic input produces canonical output).
  • Analysis & transformation passes (linting hints, refactors, metrics) built atop shared core primitives.
  • Fast, memory‑aware design (preallocation, trie operator matching, minimal backtracking).

Key Architectural Principles

| Layer | Summary |
| --- | --- |
| Tokenizer | Language‑specific, optimized (tries, Unicode category checks), produces strongly typed tokens. |
| Parser | Context‑sensitive, precedence‑aware, builds rich AST with TokenRange for every node. |
| AST | Class hierarchy per language (JS nodes prefixed with JS*), all inherit from core ASTNode. |
| Formatting | Formattable protocol; spacing is determined "after" each token; nodes decide layout via localized rules and parent context. |
| Error Handling | Precise positional errors via ParserError / TokenizationError. |
| CLI | Swift ArgumentParser scaffolding for future parse, analyze, format, transform subcommands. |
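
To make the AST layer more concrete, here is a minimal sketch of how a core ASTNode, a TokenRange, and JS-prefixed subclasses could fit together. Only the names ASTNode, TokenRange, and the JS prefix come from this README; the fields, the half-open range convention, and the two example node classes are assumptions made for illustration.

```swift
// Hypothetical sketch: members and node classes are illustrative assumptions.

/// Half-open range of token indices covered by a node (assumed convention).
struct TokenRange: Equatable {
    let start: Int  // index of the first token
    let end: Int    // index one past the last token
}

/// Core base class that every language-specific node inherits from.
class ASTNode {
    let range: TokenRange
    init(range: TokenRange) { self.range = range }
}

/// A JavaScript identifier; language-specific classes carry the JS prefix.
final class JSIdentifier: ASTNode {
    let name: String
    init(name: String, range: TokenRange) {
        self.name = name
        super.init(range: range)
    }
}

/// A binary expression whose range is derived from its operands.
final class JSBinaryExpression: ASTNode {
    let left: ASTNode
    let op: String
    let right: ASTNode
    init(left: ASTNode, op: String, right: ASTNode) {
        self.left = left
        self.op = op
        self.right = right
        // The parent spans from its first operand's start to its last operand's end.
        super.init(range: TokenRange(start: left.range.start, end: right.range.end))
    }
}

// Token stream for `a + b` is [a, +, b], so the expression covers tokens 0..<3.
let a = JSIdentifier(name: "a", range: TokenRange(start: 0, end: 1))
let b = JSIdentifier(name: "b", range: TokenRange(start: 2, end: 3))
let sum = JSBinaryExpression(left: a, op: "+", right: b)
print(sum.range)
```

Deriving a parent's range from its children keeps source mapping consistent without storing positions twice, which is one plausible way to give every node a TokenRange.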

JavaScript Support Snapshot

  • ES2015 → ES2024 constructs (classes, private fields, static blocks, async/await, template literals, bigint, regex d/v flags, etc.).
  • Disambiguation logic (regex vs division) under development.
  • Private member access: this.#field parsing.
  • Unicode escapes & template literal interpolation.
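
The regex vs division problem arises because a `/` can start a regular expression literal or act as a division operator depending on what precedes it. The sketch below shows one common heuristic based on the previous significant token. It is not Cubit's implementation (that logic is still under development), and the PrevToken type and function name are hypothetical.

```swift
// Hypothetical token kinds; Cubit's real token types are not shown in this README.
enum PrevToken {
    case none                       // start of input
    case identifier, number, string
    case closeParen, closeBracket, closeBrace
    case keyword(String)
    case punctuator(String)         // operators, `(`, `,`, `;`, ...
}

/// Returns true when a `/` following `prev` should start a regex literal,
/// false when it should be read as a division operator.
func slashStartsRegex(after prev: PrevToken) -> Bool {
    switch prev {
    case .none, .punctuator:
        return true                 // e.g. `x = /re/`, `f(/re/)`
    case .identifier, .number, .string, .closeBracket:
        return false                // e.g. `a / b`, `xs[0] / 2`
    case .closeParen, .closeBrace:
        // Ambiguous without syntactic context: `(a + b) / 2` divides, but
        // `if (x) /re/.test(y)` starts a regex. This sketch assumes division;
        // a real parser has to consult its grammar state here.
        return false
    case .keyword(let word):
        // Value-like keywords are followed by division; others (`return`,
        // `typeof`, ...) are followed by an expression, so a regex.
        return !["this", "null", "true", "false"].contains(word)
    }
}

print(slashStartsRegex(after: .keyword("return")))
print(slashStartsRegex(after: .identifier))
```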

Formatter Design Highlights

  • AST‑First: Formatting rules colocated with node definitions.
  • Token-Aware: Spacing model attaches semantics to tokens instead of string post‑processing.
  • Declarative & Deterministic: Same AST → identical output, independent of original whitespace.
  • Extensible: Additional languages add their own Formattable conformances.
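
A minimal sketch of the token-aware spacing idea follows. Only the Formattable name comes from this README; the protocol requirement, the Spacing and FormattedToken types, and the DemoDeclaration node are assumptions, and the real nodes are classes rather than the struct used here for brevity.

```swift
// Hypothetical sketch of "spacing is decided after each token".

/// Whitespace to emit after a token.
enum Spacing {
    case nothing, space, newline
}

/// A token paired with the spacing that follows it.
struct FormattedToken {
    let text: String
    let after: Spacing
}

protocol Formattable {
    /// Each node describes its own layout as tokens plus trailing spacing.
    func formattedTokens() -> [FormattedToken]
}

/// Stand-in for an AST node such as a JavaScript variable declaration.
struct DemoDeclaration: Formattable {
    let kind: String         // "let", "const", or "var"
    let name: String
    let initializer: String

    func formattedTokens() -> [FormattedToken] {
        [
            FormattedToken(text: kind, after: .space),
            FormattedToken(text: name, after: .space),
            FormattedToken(text: "=", after: .space),
            FormattedToken(text: initializer, after: .nothing),
            FormattedToken(text: ";", after: .newline),
        ]
    }
}

/// Renders a node by appending each token's text and then its trailing spacing.
func render(_ node: some Formattable) -> String {
    var out = ""
    for token in node.formattedTokens() {
        out += token.text
        switch token.after {
        case .nothing: break
        case .space: out += " "
        case .newline: out += "\n"
        }
    }
    return out
}

let decl = DemoDeclaration(kind: "const", name: "x", initializer: "1")
print(render(decl), terminator: "")
```

Because the output depends only on the node's fields, inputs such as `const   x=1;` and `const x = 1;` would render identically once parsed into the same node, which is the determinism property described above.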

Project Layout

Sources/
  cubit.swift                # CLI entry (work in progress)
  Formatter/                 # Core formatting protocols & errors
  Languages/JavaScript/
    Tokenizer/               # JS tokenizer + token types
    Parser/                  # JS parser & AST node classes
    Formatter/               # JS formatter logic
Parser/ & Tokenizer/         # Shared base abstractions
Tests/JavaScript/            # Parser, tokenizer, formatter test suites

Getting Started (Development Only)

Prerequisites:

  • Swift 6.1 toolchain (macOS 15 target currently specified; Linux development may need adjustments if newer APIs are used).

Clone & build:

git clone https://github.com/alexandrosraikos/cubit.git cubit
cd cubit
swift build

Run tests:

swift test

Run the (unstable) executable:

swift run cubit --help

(Options and subcommands are provisional and may be empty or incomplete.)
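
As a rough idea of what the ArgumentParser scaffolding could look like, here is a hypothetical sketch with a single placeholder subcommand. The type names, help strings, and behavior are assumptions; the block needs the swift-argument-parser package as a dependency and a recent release in which CommandConfiguration can be stored in a `static let`.

```swift
// Hypothetical sketch; not the contents of Sources/cubit.swift.
import ArgumentParser

@main
struct Cubit: ParsableCommand {
    static let configuration = CommandConfiguration(
        commandName: "cubit",
        abstract: "Parse, analyze, and format source code.",
        subcommands: [Parse.self]
    )
}

/// Placeholder for a future `parse` subcommand.
struct Parse: ParsableCommand {
    static let configuration = CommandConfiguration(
        abstract: "Parse a source file and report errors."
    )

    @Argument(help: "Path to the source file.")
    var path: String

    func run() throws {
        // A real implementation would tokenize and parse the file here.
        print("parse is not implemented yet: \(path)")
    }
}
```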

Recommended Dev Loop

  1. Add/extend a failing test in Tests/JavaScript/....
  2. Implement tokenizer/parsing/formatting logic.
  3. Re-run swift test until green.
  4. Refactor for clarity & performance (keep tests passing).

Contributing (Early Phase)

Because APIs are volatile, large feature PRs may be reworked heavily. Suggested approach:

  • Open a brief issue / discussion with intent before major changes.
  • Keep PRs focused (one feature or fix + tests).
  • Include exhaustive parser/formatter tests for new syntax paths.
  • Maintain naming conventions (JS* prefixes, method styles like parseLeftHandSideExpression).

Roadmap (Indicative)

Short Term:

  • Flesh out remaining ES2024 edge cases & ESNext experiments.
  • Stabilize JS formatter spacing & breaking rules.
  • JSON / debug AST export.
  • CLI parse & format subcommands with basic output selection.

Medium Term:

  • Introduce analysis passes (dead code, complexity metrics).
  • Add transformation primitives (codemods, renaming).
  • Plugin loader for external analyzers.
  • Begin second language prototype (likely Python or TypeScript).

Long Term:

  • Language Server Protocol integration.
  • Incremental parsing & formatting for IDE latency reduction.
  • Rich diagnostics & autofix suggestions.
  • Fully documented public API & semantic versioning.

Design Notes & Best Practices

  • Always preserve original semantic info (no lossy transformations in AST).
  • Avoid speculative backtracking; use lookahead utilities judiciously.
  • Preallocate arrays where counts are predictable to reduce heap churn.
  • Keep formatting decisions local; avoid global mutable state.
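
The lookahead and preallocation points can be illustrated with a small sketch. It classifies `?` tokens using fixed lookahead rather than backtracking, following the ECMAScript rule that `?.` is optional chaining only when the next character is not a decimal digit. The Cursor type and function names are hypothetical, and `??` and `??=` are deliberately ignored to keep the example short.

```swift
/// Minimal cursor over UTF-8 bytes with non-consuming lookahead (hypothetical type).
struct Cursor {
    let bytes: [UInt8]
    var index = 0

    /// Looks `offset` bytes ahead without advancing; nil past the end.
    func peek(_ offset: Int = 0) -> UInt8? {
        let i = index + offset
        return i < bytes.count ? bytes[i] : nil
    }

    mutating func advance(by n: Int = 1) { index += n }
}

/// Classifies the `?` at the cursor as "?" or "?." using two bytes of lookahead.
func questionMarkToken(at cursor: Cursor) -> String {
    guard cursor.peek(1) == UInt8(ascii: ".") else { return "?" }
    if let after = cursor.peek(2),
       (UInt8(ascii: "0")...UInt8(ascii: "9")).contains(after) {
        return "?"   // in `a?.5:1` the `.5` is a number, so `?` is a conditional
    }
    return "?."
}

/// Collects every `?`-based token in `source`.
func questionMarkTokens(in source: String) -> [String] {
    var cursor = Cursor(bytes: Array(source.utf8))
    var tokens: [String] = []
    // At most one token per input byte, so reserve once instead of growing repeatedly.
    tokens.reserveCapacity(cursor.bytes.count)
    while let byte = cursor.peek() {
        if byte == UInt8(ascii: "?") {
            let token = questionMarkToken(at: cursor)
            tokens.append(token)
            cursor.advance(by: token.utf8.count)
        } else {
            cursor.advance()
        }
    }
    return tokens
}

print(questionMarkTokens(in: "a?.b"))
print(questionMarkTokens(in: "a?.5:1"))
```

The cursor never rewinds: each decision is made from a bounded peek, so the scan stays linear in the input length.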

Performance Considerations

Strategies employed / planned:

  • Operator trie for O(k) operator token recognition.
  • Cached lookahead for disambiguation (regex vs division).
  • Tight loops & minimal copies in hot tokenizer paths.
  • Potential future: incremental diff-based reparse.
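
The operator trie can be sketched as follows. This is a minimal illustration of longest-match operator recognition, not Cubit's tokenizer; the TrieNode and OperatorTrie names and the byte-keyed dictionary representation are assumptions.

```swift
/// Trie node keyed by UTF-8 bytes; terminal when the path to it spells an operator.
final class TrieNode {
    var children: [UInt8: TrieNode] = [:]
    var isTerminal = false
}

struct OperatorTrie {
    private let root = TrieNode()

    init(operators: [String]) {
        for op in operators {
            var node = root
            for byte in op.utf8 {
                if let next = node.children[byte] {
                    node = next
                } else {
                    let next = TrieNode()
                    node.children[byte] = next
                    node = next
                }
            }
            node.isTerminal = true
        }
    }

    /// Returns the byte length of the longest operator starting at `start`,
    /// or nil if no operator matches. The walk takes at most k steps, where k
    /// is the length of the longest operator in the trie.
    func longestMatch(in bytes: [UInt8], from start: Int) -> Int? {
        var node = root
        var best: Int? = nil
        var i = start
        while i < bytes.count, let next = node.children[bytes[i]] {
            node = next
            i += 1
            if node.isTerminal { best = i - start }
        }
        return best
    }
}

let trie = OperatorTrie(operators: ["=", "==", "===", "=>", ">", ">>", ">>>", ">>>="])
let source = Array("a >>>= b".utf8)
print(trie.longestMatch(in: source, from: 2) ?? 0)  // prints 4
```

Remembering the last terminal node seen gives maximal munch in a single forward pass: `>>>=` wins over `>`, `>>`, and `>>>` without trying each candidate separately.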

Limitations / Known Gaps (Current Phase)

  • Not all ESNext proposals modeled yet.
  • Error recovery is minimal; parser halts at first fatal error.
  • Formatter not feature‑complete; output may shift between commits.
  • No binary distribution, Homebrew formula, or Linux CI matrix yet.
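
The "halt at first fatal error" behavior, combined with positional errors, can be sketched like this. Only the ParserError name comes from this README; its fields, the SourcePosition type, and the parenthesis checker standing in for a parser are assumptions for illustration.

```swift
// Hypothetical error shape; the real ParserError may carry different data.
struct SourcePosition: Equatable {
    let line: Int
    let column: Int
}

struct ParserError: Error, CustomStringConvertible {
    let message: String
    let position: SourcePosition
    var description: String { "\(position.line):\(position.column): \(message)" }
}

/// Checks that parentheses are balanced, throwing at the first problem found
/// instead of attempting recovery.
func checkParentheses(in source: String) throws {
    var line = 1
    var column = 1
    var openStack: [SourcePosition] = []
    for ch in source {
        switch ch {
        case "(":
            openStack.append(SourcePosition(line: line, column: column))
        case ")":
            guard openStack.popLast() != nil else {
                throw ParserError(
                    message: "unexpected ')'",
                    position: SourcePosition(line: line, column: column)
                )
            }
        default:
            break
        }
        // Track 1-based line and column as characters are consumed.
        if ch == "\n" {
            line += 1
            column = 1
        } else {
            column += 1
        }
    }
    if let unclosed = openStack.last {
        throw ParserError(message: "unclosed '('", position: unclosed)
    }
}

do {
    try checkParentheses(in: "f(a,\n  b))")
} catch let error as ParserError {
    print(error)  // prints 2:5: unexpected ')'
} catch {
    print("unexpected error: \(error)")
}
```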

FAQ

Q: Can I use this in production? A: Not yet. The APIs, AST shapes, and formatter output are still unstable.

Q: How do I add a new language? A: Mirror Languages/JavaScript/ structure: implement tokenizer, parser, AST node classes inheriting from core bases, then add tests.

Q: Will there be a stable plugin API? A: Planned after core parser + formatter semantics settle.

Issues & Feedback

Please file issues for:

  • Incorrect parsing / AST shape divergences.
  • Crashes or performance regressions.
  • Formatter output anomalies.
  • Missing constructs you need (include a code sample).

License

Not yet finalized. Until a LICENSE file is added, usage is restricted—treat this as source for evaluation only. (A permissive license such as Apache 2.0 or MIT is under consideration.)

Attribution & Inspiration

  • Prettier (deterministic formatting philosophy)
  • Babel / Acorn (JS parsing strategies)
  • SwiftSyntax (token range mapping concepts)

This README describes an in-progress system; details may drift. Check commit history and tests for the definitive current behavior.
