Releases: CopticScriptorium/tokenizers
v4.1.0 February 2016
This version integrates the DDGLC lemma list into the tokenization and morphological analysis.
Tokenizer v4.0.1
Adds support for morphological analysis and fixes some bugs.
Tokenizer release 3.1.0 (July 16, 2015)
New version of the tokenizer:
- Corrects a bug with the line break addition parameter -l
- Adds better support for constructions with je- and nominalized tre-f
Tokenizer v. 3.0.1 May 2015
This release is similar to the previous one, except that it provides additional instructions.
May 2015 release
Perl script that splits Coptic text, already segmented into bound groups, into constituent parts for further annotation. The tokenization is based on Layton's grammar. Also includes an Excel macro that merges cells based on certain conditions.
v3.0 improves tokenization accuracy.
March 2015 release
This release includes a Perl script to tokenize Coptic text segmented into bound groups and an Excel macro that merges cells based on certain conditions.
11 March 2015
v2.0.1
Adds more bound-group patterns to the tokenizer, plus a parameter to accommodate diplomatic transcriptions in which a line break interrupts a bound group.