
Tokenizer #31

@dwhieb

Description

A tool that tokenizes a string based on a set of punctuation characters and a set of delimiters. Users should also be able to save tokenization schemas and export their data in different formats (.csv, .tsv, one token per line, JSON).
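A minimal sketch of the tokenization step, in Python. It assumes something this issue does not specify. Delimiters are single characters that separate tokens and are discarded. Punctuation characters are split off as tokens of their own, or dropped when `keep_punctuation` is false. The function name `tokenize` and its default character sets are hypothetical.

```python
def tokenize(
    text: str,
    delimiters: str = " \t\n",
    punctuation: str = ".,;:!?",
    keep_punctuation: bool = True,
) -> list[str]:
    """Split text on delimiter characters, separating punctuation characters.

    Assumption (hypothetical behavior): each character in `delimiters` ends
    the current token and is discarded; each character in `punctuation` ends
    the current token and is emitted as its own token, or dropped if
    keep_punctuation is False. A character in both sets acts as a delimiter.
    """
    tokens: list[str] = []
    current: list[str] = []

    def flush() -> None:
        # Emit the characters collected so far as one token, if any.
        if current:
            tokens.append("".join(current))
            current.clear()

    for ch in text:
        if ch in delimiters:
            flush()
        elif ch in punctuation:
            flush()
            if keep_punctuation:
                tokens.append(ch)
        else:
            current.append(ch)
    flush()
    return tokens
```

For example, `tokenize("Hello, world!")` returns `["Hello", ",", "world", "!"]`. This character-by-character approach only handles single-character delimiters. Multi-character delimiters (e.g. `--`) would need a regex-based splitter instead.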

What features should the suggested tool have?

  • punctuation presets
  • delimiter presets
  • copy and paste for input/output
  • file upload/download for input/output
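The punctuation/delimiter presets and the saved tokenization schemas from the description could be modelled as plain data. This sketch assumes a schema is just a name plus a delimiter set and a punctuation set, serialized as JSON. The class `TokenizationSchema`, the helper functions, the preset names, and their character sets are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TokenizationSchema:
    """Hypothetical saved schema: a name plus the character sets to use."""
    name: str
    delimiters: str
    punctuation: str


# Hypothetical presets; the names and character sets are illustrative only.
DELIMITER_PRESETS = {
    "whitespace": " \t\n",
    "whitespace-and-hyphen": " \t\n-",
}
PUNCTUATION_PRESETS = {
    "basic": ".,;:!?",
    "basic-and-quotes": ".,;:!?\"'",
}


def schema_from_presets(
    name: str, delimiter_preset: str, punctuation_preset: str
) -> TokenizationSchema:
    """Build a schema from preset names (raises KeyError for an unknown preset)."""
    return TokenizationSchema(
        name=name,
        delimiters=DELIMITER_PRESETS[delimiter_preset],
        punctuation=PUNCTUATION_PRESETS[punctuation_preset],
    )


def save_schema(schema: TokenizationSchema) -> str:
    """Serialize a schema to a JSON string."""
    return json.dumps(asdict(schema), ensure_ascii=False)


def load_schema(data: str) -> TokenizationSchema:
    """Rebuild a schema from a JSON string produced by save_schema."""
    return TokenizationSchema(**json.loads(data))
```

Keeping a schema as a small JSON string means the same value could be offered as a download or kept between sessions, but where it is stored is left open here.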

To Do

  • Purpose/Overview
  • Directions, hidden once dismissed
  • Underlying libraries, with links
  • Save work + settings to local storage
  • Domain redirects (e.g. transliterate.digitallinguistics.io → tools.digitallinguistics.io/transliterate)
  • Data Import/Export
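The export formats listed in the description (.csv, .tsv, one token per line, JSON) could share a single entry point. This sketch assumes the tokenized data is a list of rows (for example, one row per input line). The function name `export_tokens` and its format keys are hypothetical. It uses Python's standard `csv` module so that a token containing the field separator, such as a comma token in CSV, is quoted correctly.

```python
import csv
import io
import json


def export_tokens(rows: list[list[str]], fmt: str) -> str:
    """Serialize tokenized rows to one of the (hypothetical) format keys:
    "csv", "tsv", "lines" (one token per line), or "json"."""
    if fmt in ("csv", "tsv"):
        buffer = io.StringIO()
        # csv.writer quotes any field that contains the separator or quote character.
        writer = csv.writer(
            buffer,
            delimiter="," if fmt == "csv" else "\t",
            lineterminator="\n",
        )
        writer.writerows(rows)
        return buffer.getvalue()
    if fmt == "lines":
        # One token per line; row boundaries are not preserved in this format.
        return "".join(token + "\n" for row in rows for token in row)
    if fmt == "json":
        # ensure_ascii=False keeps non-ASCII characters readable in the output.
        return json.dumps(rows, ensure_ascii=False)
    raise ValueError(f"unsupported export format: {fmt!r}")
```

For example, `export_tokens([["Hello", ",", "world"]], "csv")` returns `'Hello,",",world\n'`, with the comma token quoted.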

Metadata

Assignees

No one assigned

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests