
v1.0 API Proposal #54

@polm

Cutlet has been out for a few years now. I consider it functionally complete, but the API is a little awkward because it evolved over time. Since it's stable, I'd also like to release a v1.0 to signal that the API will be reliable going forward. This issue is for my proposal and also to solicit feedback.

This is not a full API proposal: most of the changes will be iterative and minor, like cleaning up which functions are public and which are private. The main thing I want to do is make the treatment of the different output options clearer. To that end, I propose that the Cutlet object have the following main public methods:

  • __call__ / to_doc: returns a CutletDoc (see below)
  • to_romaji: returns a human-legible string, like romaji does now
  • to_slug: returns a machine-friendly string, like slug does now
  • to_nodes: returns a list of nodes, like romaji_tokens does now

A CutletDoc, inspired by spaCy's Doc object, contains:

  • the raw input text
  • the normalized input text
  • romaji, slug, and nodes (computed lazily where appropriate)
  • a reference to the generating Cutlet object (so you can check config options)

The CutletDoc object has two main advantages. First, if you need more than one of the output formats above, it avoids duplicate computation (MeCab calls) without you having to manage state yourself. Second, it can codify how MeCab tokens are linked to romaji tokens. The linking itself is very simple, but it's a commonly requested feature (#34, #37, #40, etc.), and users often find it confusing (partly because I haven't provided examples), so it would be good to provide a canonical way to do it.
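To make the proposal concrete, here is a minimal, self-contained sketch of how these pieces could fit together. It is not Cutlet's implementation: `TOY_ROMAJI` and `toy_tokenize` are hypothetical stand-ins for MeCab and Cutlet's romanization tables, `Node` is a stand-in for a MeCab node, normalization is assumed to be plain NFKC, the slug rule is invented for illustration, and `tokenizer_calls` is a counter added only to make the caching visible. What it shows is the shape of the API: `__call__` and `to_doc` return a `CutletDoc`, the convenience methods delegate to it, the doc tokenizes once and caches results with `functools.cached_property`, and each romaji token keeps a link to the node it came from.

```python
import unicodedata
from dataclasses import dataclass
from functools import cached_property

# Hypothetical toy lexicon: stands in for MeCab plus Cutlet's romanization tables.
TOY_ROMAJI = {"東京": "Tokyo", "に": "ni", "行く": "iku"}


def toy_tokenize(text):
    """Greedy longest-match segmentation over TOY_ROMAJI (a MeCab stand-in)."""
    keys = sorted(TOY_ROMAJI, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(key)
                i += len(key)
                break
        else:  # unknown character: emit it as its own token
            out.append(text[i])
            i += 1
    return out


@dataclass
class Node:
    surface: str  # stand-in for a MeCab node


@dataclass
class RomajiToken:
    node: Node  # link back to the source token
    romaji: str


class CutletDoc:
    def __init__(self, cutlet, text):
        self.cutlet = cutlet  # generating object, so config can be inspected
        self.raw = text
        self.normalized = unicodedata.normalize("NFKC", text)

    @cached_property
    def nodes(self):
        # The expensive step (a MeCab call in the real library) runs once per doc.
        self.cutlet.tokenizer_calls += 1
        return [Node(s) for s in toy_tokenize(self.normalized)]

    @cached_property
    def tokens(self):
        return [RomajiToken(n, TOY_ROMAJI.get(n.surface, n.surface)) for n in self.nodes]

    @cached_property
    def romaji(self):
        text = " ".join(t.romaji for t in self.tokens)
        return text[:1].upper() + text[1:]

    @cached_property
    def slug(self):
        return "-".join(t.romaji.lower() for t in self.tokens)


class Cutlet:
    def __init__(self):
        self.tokenizer_calls = 0  # instrumentation for this sketch only

    def __call__(self, text):
        return CutletDoc(self, text)

    to_doc = __call__

    # Convenience methods: each builds a fresh doc, so calling two of them
    # on the same text tokenizes twice.
    def to_romaji(self, text):
        return self(text).romaji

    def to_slug(self, text):
        return self(text).slug

    def to_nodes(self, text):
        return self(text).tokens


katsu = Cutlet()
doc = katsu("東京に行く")
print(doc.romaji)             # Tokyo ni iku
print(doc.slug)               # tokyo-ni-iku
print(katsu.tokenizer_calls)  # 1: both outputs came from one tokenization
for tok in doc.tokens:
    print(tok.node.surface, tok.romaji)
```

The trade-off in this design is that the `to_*` methods stay one-liners for simple use, while anyone who needs several outputs for the same text holds on to the doc instead.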

Separately, I will try making RomajiTokens proxy classes for MeCab tokens. I expect this to work without issue, but it's possible that MeCab nodes being Cython objects will cause problems.
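The proposal doesn't specify how the proxying would be done. One option, sketched here under that assumption, is composition: the token wraps the node and forwards unknown attributes via `__getattr__`, so it never needs to subclass the node type. Whether this sidesteps the Cython concern for real MeCab nodes is not tested here; `FakeNode` is a hypothetical plain-Python stand-in, and its `pos` attribute is only illustrative.

```python
class FakeNode:
    """Hypothetical stand-in for a MeCab node (the real one is a Cython object)."""

    def __init__(self, surface, pos):
        self.surface = surface
        self.pos = pos


class RomajiToken:
    """Proxy: stores its own romaji and forwards other attribute lookups to the node."""

    def __init__(self, node, romaji):
        self._node = node
        self.romaji = romaji

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for attributes not set above.
        if name == "_node":
            # Avoid infinite recursion if _node is not set yet (e.g. during copying).
            raise AttributeError(name)
        return getattr(self._node, name)

    def __repr__(self):
        return f"RomajiToken({self._node.surface!r} -> {self.romaji!r})"


tok = RomajiToken(FakeNode("東京", "名詞"), "Tokyo")
print(tok.romaji)   # Tokyo
print(tok.surface)  # 東京 (forwarded to the wrapped node)
print(tok.pos)      # 名詞 (forwarded to the wrapped node)
```

Because the proxy only forwards lookups, an attribute missing from both the token and the node still raises `AttributeError`, as a plain object would.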

While the API will change, the internal code will not change much as part of this process. The transition will take a few months at the fastest, and a new version with DeprecationWarnings will be released. If you have a stable application and are happy with the current API, please be sure to pin your dependency with a version guard (for example, cutlet<1.0).
