Update the status

egli · egli · commit bb56cd467122 · 2025-11-07T17:34:47.000+01:00
diff --git a/README.org b/README.org
@@ -17,7 +17,7 @@ using only character and most translation opcodes basically works.
 The original YAML test suite is supported and can be used to test the
 re-implementation.
 
-Currently, the re-implementation passes 68% of the liblouis test suite
+Currently, the re-implementation passes 83% of the liblouis test suite
 successfully.
 
 * Relation to liblouis
@@ -111,22 +111,30 @@ The parser is built from the grammar used in [[https://github.com/liblouis/tree-
 which is a port of the [[https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form][EBNF grammar]] in [[https://github.com/liblouis/rewrite-louis][rewrite-louis]], which in turn is
 a just port of the [[https://en.wikipedia.org/wiki/Parsing_expression_grammar][Parsing expression grammar]] from [[https://github.com/liblouis/louis-parser][louis-parser]].
 
-* Todo [6/15]
-- [ ] Parse with context
+* Todo [7/15]
+- [X] Parse with context
   - currently tables are parsed line by line. Opcodes have no idea
     whether a character or a class has been defined before
   - Probably need to pass some context to the rule parser where
     character definitions and class names are kept
   - this is solved with a two-pass compilation now. The first pass
     collects all relevant information and the second pass consequently
     uses that.
-- [ ] (Emphasis and Caps) Indication
+- [-] Indication [2/3]
   - presumably this could be done independently of translation, i.e.
     find indication locations and put them in the typeform array
     before even translating.
+  - [X] Numeric indication
+  - [X] Caps indication
+  - [ ] Emphasis indication
 - [X] Add support for virtual dots
   - Virtual dots are supported and are converted to Unicode Supplementary Private Use Area-A
-- [ ] The correct, multipass and match opcodes
+- [-] The correct, multipass and match opcodes [1/3]
+  - [X] Match opcode
+    - A basic regexp engine has been implemented and, aside from
+      negation, the match opcode basically works
+  - [ ] Correct opcode
+  - [ ] Multipass opcode
 - [X] Currently the matching of input text against the rules is case
   sensitive.
   - [X] Make it case insensitive.
@@ -152,11 +160,8 @@ a just port of the [[https://en.wikipedia.org/wiki/Parsing_expression_grammar][P
   - However normal translation has currently no way to specify a
     display table
 - [X] Handle undefined characters similarly to liblouis
-- [ ] Use a well established FST or graph library as a bases
-  - currently regular expressions are implemented using a simple
-    directed acyclic graph. It would surely be better to use a well
-    established library for that task such as [[https://github.com/garvys-org/rustfst][rustfst]], [[https://crates.io/crates/petgraph][petgraph]] or
-    [[https://github.com/neo4j-labs/graph][graph]].
+- [ ] Instead of hand-rolling a finite state machine to implement
+  regular expressions, we should use [[https://docs.rs/regex-automata/latest/regex_automata/][regex_automata]].
 
 * License