Cubit (WIP)

Multi-language, AST‑first parsing, analysis, and formatting toolkit written in Swift. Currently under active development and not yet published/packaged for installation. Expect rapid iteration and breaking changes.

Status

The Swift package exposes an `executableTarget` named `cubit`.
Only the JavaScript tokenizer & parser (ES2015–ES2024 + select ESNext) are being implemented right now.
Formatter architecture scaffolded; JavaScript formatter in progress.
No stable CLI UX, versioning, or release artifacts yet.
Public API & AST node shapes may change without deprecation windows.

Vision

Cubit aims to provide:

High‑fidelity, lossless tokenization & AST with precise source mapping.
Multi-language plugin model (JS today; Python, TypeScript, Rust, C++ later).
Declarative, AST‑owned formatting (Prettier‑style determinism; whitespace-agnostic input -> canonical output).
Analysis & transformation passes (linting hints, refactors, metrics) built atop shared core primitives.
Fast, memory‑aware design (preallocation, trie operator matching, minimal backtracking).

Key Architectural Principles

Layer	Summary
Tokenizer	Language‑specific, optimized (tries, Unicode category checks), produces strongly typed tokens.
Parser	Context‑sensitive, precedence‑aware, builds rich AST with `TokenRange` for every node.
AST	Class hierarchy per language (JS nodes prefixed with `JS*`), all inherit from core `ASTNode`.
Formatting	`Formattable` protocol; spacing is determined "after" each token; nodes decide layout via localized rules and parent context.
Error Handling	Precise positional errors via `ParserError` / `TokenizationError`.
CLI	Swift ArgumentParser scaffolding for future `parse`, `analyze`, `format`, `transform` subcommands.
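To make the AST and `TokenRange` rows concrete, here is a minimal sketch of how a core `ASTNode` base class and `JS*` subclasses might fit together. It assumes `TokenRange` is a half-open range of token indices and that a parent node's range is the union of its children's ranges; the member layouts and the `JSIdentifier` / `JSBinaryExpression` shapes are hypothetical, not Cubit's actual definitions.

```swift
// Hypothetical sketch: TokenRange, ASTNode, JSIdentifier and JSBinaryExpression
// borrow the README's vocabulary, but their members are assumptions.

/// Half-open range [start, end) of token indices covered by a node.
struct TokenRange: Equatable {
    let start: Int
    let end: Int

    /// Smallest range covering both `self` and `other`.
    func union(_ other: TokenRange) -> TokenRange {
        TokenRange(start: min(start, other.start), end: max(end, other.end))
    }
}

/// Core base class shared by every language's AST.
class ASTNode {
    let range: TokenRange
    init(range: TokenRange) { self.range = range }
}

/// JavaScript nodes carry the `JS` prefix and inherit from `ASTNode`.
final class JSIdentifier: ASTNode {
    let name: String
    init(name: String, range: TokenRange) {
        self.name = name
        super.init(range: range)
    }
}

final class JSBinaryExpression: ASTNode {
    let op: String
    let left: ASTNode
    let right: ASTNode

    init(op: String, left: ASTNode, right: ASTNode) {
        self.op = op
        self.left = left
        self.right = right
        // The parent's range is derived from its children, so every node
        // still maps back to exact source tokens.
        super.init(range: left.range.union(right.range))
    }
}

// Tokens for `a + b`: [a] [+] [b] at indices 0, 1, 2.
let lhs = JSIdentifier(name: "a", range: TokenRange(start: 0, end: 1))
let rhs = JSIdentifier(name: "b", range: TokenRange(start: 2, end: 3))
let sum = JSBinaryExpression(op: "+", left: lhs, right: rhs)
print(sum.range.start, sum.range.end) // prints "0 3"
```

Deriving ranges from children keeps source mapping exact without re-scanning the text, which is one way to get the "`TokenRange` for every node" property cheaply.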

JavaScript Support Snapshot

ES2015 → ES2024 constructs (classes, private fields, static blocks, async/await, template literals, bigint, regex d/v flags, etc.).
Disambiguation logic (regex vs division) under development.
Private member access parsing (`this.#field`).
Unicode escapes & template literal interpolation.
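The ECMAScript specification resolves `/` by syntactic context (the `InputElementDiv` vs `InputElementRegExp` lexical goals). A common tokenizer-level approximation, shown here only as an illustration and not as Cubit's implementation, inspects the previous significant token: if it can end an operand, `/` is division; otherwise it starts a regex literal. The `PrevToken` enum and the keyword and punctuator sets are assumptions for the sketch.

```swift
// Hypothetical sketch of the "previous significant token" heuristic.
// Cases such as `if (x) /re/.test(s)` (a `)` that does not end an operand)
// need parser context, which this sketch deliberately ignores.

enum PrevToken {
    case startOfInput
    case identifier(String)   // keywords are lexed as identifiers here
    case number
    case string
    case punctuator(String)
}

/// Keywords after which an expression, and therefore a regex, may start.
let regexAllowedKeywords: Set<String> = [
    "return", "typeof", "instanceof", "in", "of", "new", "delete",
    "void", "throw", "case", "do", "else", "yield", "await",
]

/// Punctuators that end an operand, so a following `/` must be division.
let operandEndingPunctuators: Set<String> = [")", "]", "}", "++", "--"]

func slashStartsRegex(after prev: PrevToken) -> Bool {
    switch prev {
    case .startOfInput:
        return true
    case .number, .string:
        return false                                  // `1 / 2`
    case .identifier(let name):
        return regexAllowedKeywords.contains(name)    // `return /x/` vs `a / b`
    case .punctuator(let p):
        return !operandEndingPunctuators.contains(p)  // `= /x/` vs `) / 2`
    }
}

print(slashStartsRegex(after: .punctuator("=")))      // true
print(slashStartsRegex(after: .identifier("x")))      // false
print(slashStartsRegex(after: .identifier("return"))) // true
```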

Formatter Design Highlights

AST‑First: Formatting rules colocated with node definitions.
Token-Aware: Spacing model attaches semantics to tokens instead of string post‑processing.
Declarative & Deterministic: Same AST → identical output, independent of original whitespace.
Extensible: Additional languages add their own Formattable conformances.
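The token-aware spacing model can be illustrated with a small sketch in which each node emits tokens tagged with the spacing that follows them, and a printer renders only those tokens. `Spacing`, `FormatToken`, `JSNumericLiteral`, `JSVariableDeclaration` and `render` are hypothetical names, the `Formattable` requirement shown is an assumption about its shape, and parent context is reduced to a single indent level for brevity.

```swift
// Hypothetical sketch; none of these names or signatures are Cubit's API.

/// What the printer emits after a token.
enum Spacing {
    case none, space, newline
}

/// A token plus the spacing decided "after" it.
struct FormatToken {
    let text: String
    let after: Spacing
}

/// Each node owns its layout by producing spaced tokens.
protocol Formattable {
    func format() -> [FormatToken]
}

struct JSNumericLiteral: Formattable {
    let raw: String
    func format() -> [FormatToken] { [FormatToken(text: raw, after: .none)] }
}

struct JSVariableDeclaration: Formattable {
    let kind: String              // "let", "const", or "var"
    let name: String
    let initializer: any Formattable

    func format() -> [FormatToken] {
        var tokens = [
            FormatToken(text: kind, after: .space),
            FormatToken(text: name, after: .space),
            FormatToken(text: "=", after: .space),
        ]
        tokens += initializer.format()
        tokens.append(FormatToken(text: ";", after: .newline))
        return tokens
    }
}

/// Renders tokens at an indent level. The result depends only on the tokens,
/// never on the whitespace of the original source.
func render(_ tokens: [FormatToken], indent: Int = 0) -> String {
    let pad = String(repeating: "  ", count: indent)
    var out = ""
    var atLineStart = true
    for token in tokens {
        if atLineStart {
            out += pad
            atLineStart = false
        }
        out += token.text
        switch token.after {
        case .none: break
        case .space: out += " "
        case .newline:
            out += "\n"
            atLineStart = true
        }
    }
    return out
}

let decl = JSVariableDeclaration(kind: "const", name: "answer",
                                 initializer: JSNumericLiteral(raw: "42"))
print(render(decl.format()), terminator: "") // const answer = 42;
```

Because the output is built solely from AST-produced tokens, `const   answer=42;` and `const answer = 42;` would render identically once parsed, which is the determinism property the formatter aims for.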

Project Layout

Sources/
  cubit.swift                # CLI entry (work in progress)
  Formatter/                 # Core formatting protocols & errors
  Languages/JavaScript/
    Tokenizer/               # JS tokenizer + token types
    Parser/                  # JS parser & AST node classes
    Formatter/               # JS formatter logic
Parser/ & Tokenizer/         # Shared base abstractions
Tests/JavaScript/            # Parser, tokenizer, formatter test suites

Getting Started (Development Only)

Prerequisites:

Swift 6.1 toolchain (macOS 15 target currently specified; Linux development may need adjustments if newer APIs are used).

Clone & build:

git clone https://github.com/alexandrosraikos/cubit.git cubit
cd cubit
swift build

Run tests:

swift test

Run the (unstable) executable:

swift run cubit --help

(Options and subcommands are provisional and may be empty or incomplete.)

Recommended Dev Loop

Add/extend a failing test in Tests/JavaScript/....
Implement tokenizer/parsing/formatting logic.
Re-run swift test until green.
Refactor for clarity & performance (keep tests passing).

Contributing (Early Phase)

Because APIs are volatile, large feature PRs may be reworked heavily. Suggested approach:

Open a brief issue / discussion with intent before major changes.
Keep PRs focused (one feature or fix + tests).
Include exhaustive parser/formatter tests for new syntax paths.
Maintain naming conventions (JS* prefixes, method styles like parseLeftHandSideExpression).

Roadmap (Indicative)

Short Term:

Flesh out remaining ES2024 edge cases & ESNext experiments.
Stabilize JS formatter spacing & breaking rules.
JSON / debug AST export.
CLI parse & format subcommands with basic output selection.

Medium Term:

Introduce analysis passes (dead code, complexity metrics).
Add transformation primitives (codemods, renaming).
Plugin loader for external analyzers.
Begin second language prototype (likely Python or TypeScript).

Long Term:

Language Server Protocol integration.
Incremental parsing & formatting for IDE latency reduction.
Rich diagnostics & autofix suggestions.
Fully documented public API & semantic versioning.

Design Notes & Best Practices

Always preserve original semantic info (no lossy transformations in AST).
Avoid speculative backtracking; use lookahead utilities judiciously.
Preallocate arrays where counts are predictable to reduce heap churn.
Keep formatting decisions local; avoid global mutable state.
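As an illustration of lookahead without speculative backtracking, the sketch below decides whether `(` opens an arrow-function parameter list by peeking ahead to the matching `)` and checking for `=>`, without ever moving the cursor. `TokenCursor` and `isArrowFunctionAhead` are hypothetical names, tokens are plain strings for brevity, and the ECMAScript grammar itself handles this case with a cover grammar (`CoverParenthesizedExpressionAndArrowParameterList`), which a real parser may prefer.

```swift
// Hypothetical sketch: a read-only lookahead over a token array.

struct TokenCursor {
    private let tokens: [String]
    private(set) var position = 0

    init(tokens: [String]) { self.tokens = tokens }

    /// Returns the token `offset` positions ahead without consuming it.
    func peek(_ offset: Int = 0) -> String? {
        let index = position + offset
        return index < tokens.count ? tokens[index] : nil
    }

    /// Consumes one token.
    mutating func advance() { position += 1 }

    /// True if the `(` at the cursor is followed, after its matching `)`,
    /// by `=>`. Only `peek` is used, so there is no state to roll back.
    func isArrowFunctionAhead() -> Bool {
        guard peek() == "(" else { return false }
        var depth = 0
        var offset = 0
        while let token = peek(offset) {
            if token == "(" {
                depth += 1
            } else if token == ")" {
                depth -= 1
                if depth == 0 { return peek(offset + 1) == "=>" }
            }
            offset += 1
        }
        return false
    }
}

let arrowParams = TokenCursor(tokens: ["(", "a", ",", "b", ")", "=>", "a"])
let grouped = TokenCursor(tokens: ["(", "a", "+", "(", "b", ")", ")", "*", "c"])
print(arrowParams.isArrowFunctionAhead()) // true
print(grouped.isArrowFunctionAhead())     // false
```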

Performance Considerations

Strategies employed / planned:

Operator trie for O(k) operator token recognition (k = length of the operator being matched).
Cached lookahead for disambiguation (regex vs division).
Tight loops & minimal copies in hot tokenizer paths.
Potential future: incremental diff-based reparse.
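A trie gives longest-match ("maximal munch") operator recognition in time proportional to the operator's length rather than the number of operators defined. This is a generic sketch of the technique; `OperatorTrie`, `longestMatch`, and the operator subset are illustrative, not Cubit's code. JavaScript's `?.` has an extra rule (it must not be followed by a decimal digit) that is out of scope here.

```swift
// Hypothetical sketch of trie-based operator matching.

final class OperatorTrie {
    private final class Node {
        var children: [Character: Node] = [:]
        var isTerminal = false   // a complete operator ends at this node
    }

    private let root = Node()

    init(operators: [String]) {
        for op in operators { insert(op) }
    }

    private func insert(_ op: String) {
        var node = root
        for ch in op {
            if let next = node.children[ch] {
                node = next
            } else {
                let next = Node()
                node.children[ch] = next
                node = next
            }
        }
        node.isTerminal = true
    }

    /// Longest operator that is a prefix of `source[start...]`, or nil.
    func longestMatch(in source: [Character], from start: Int) -> String? {
        var node = root
        var index = start
        var bestEnd: Int? = nil
        // Walk down the trie, remembering the last position where an
        // operator was complete.
        while index < source.count, let next = node.children[source[index]] {
            node = next
            index += 1
            if node.isTerminal { bestEnd = index }
        }
        guard let end = bestEnd else { return nil }
        return String(source[start..<end])
    }
}

let trie = OperatorTrie(operators: [
    "=", "==", "===", "=>", "!", "!=", "!==",
    ">", ">>", ">>>", ">>>=", "?", "??", "??=", "?.",
])
print(trie.longestMatch(in: Array("a >>>= b"), from: 2) ?? "none") // >>>=
```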

Limitations / Known Gaps (Current Phase)

Not all ESNext proposals modeled yet.
Error recovery is minimal; parser halts at first fatal error.
Formatter not feature‑complete; output may shift between commits.
No binary distribution, Homebrew formula, or Linux CI matrix yet.

FAQ

Q: Can I use this in production? A: Not yet—APIs, AST, and output still unstable.

Q: How do I add a new language? A: Mirror the Languages/JavaScript/ structure: implement a tokenizer, a parser, and AST node classes inheriting from the core bases, then add tests.

Q: Will there be a stable plugin API? A: Planned after core parser + formatter semantics settle.

Issues & Feedback

Please file issues for:

Incorrect parsing / AST shape divergences.
Crashes or performance regressions.
Formatter output anomalies.
Missing constructs you need (include a code sample).

License

Not yet finalized. Until a LICENSE file is added, usage is restricted—treat this as source for evaluation only. (A permissive license such as Apache 2.0 or MIT is under consideration.)

Attribution & Inspiration

Prettier (deterministic formatting philosophy)
Babel / Acorn (JS parsing strategies)
SwiftSyntax (token range mapping concepts)

This README describes an in-progress system; details may drift. Check commit history and tests for the definitive current behavior.
