README.org (15 additions & 10 deletions)
@@ -17,7 +17,7 @@ using only character and most translation opcodes basically works.
 The original YAML test suite is supported and can be used to test the
 re-implementation.
 
-Currently, the re-implementation passes 68% of the liblouis test suite
+Currently, the re-implementation passes 83% of the liblouis test suite
 successfully.
 
 * Relation to liblouis
@@ -111,22 +111,30 @@ The parser is built from the grammar used in [[https://github.com/liblouis/tree-
 which is a port of the [[https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form][EBNF grammar]] in [[https://github.com/liblouis/rewrite-louis][rewrite-louis]], which in turn is
 a just port of the [[https://en.wikipedia.org/wiki/Parsing_expression_grammar][Parsing expression grammar]] from [[https://github.com/liblouis/louis-parser][louis-parser]].
 
-* Todo [6/15]
-- [] Parse with context
+* Todo [7/15]
+- [X] Parse with context
   - currently tables are parsed line by line. Opcodes have no idea
     whether a character or a class has been defined before
   - Probably need to pass some context to the rule parser where
     character definitions and class names are kept
   - this is solved with a two-pass compilation now. The first pass
     collects all relevant information and the second pass consequently
     uses that.
-- [ ] (Emphasis and Caps) Indication
+- [-] Indication [2/3]
   - presumably this could be done independently of translation, i.e.
     find indication locations and put them in the typeform array
     before even translating.
+  - [X] Numeric indication
+  - [X] Caps indication
+  - [ ] Emphasis indication
 - [X] Add support for virtual dots
   - Virtual dots are supported and are converted to Unicode Supplementary Private Use Area-A
-- [ ] The correct, multipass and match opcodes
+- [-] The correct, multipass and match opcodes [1/3]
+  - [X] Match opcode
+    - A basic regexp engine has been implemented and aside from
+      negation the match opcode basically works
+  - [ ] Correct opcode
+  - [ ] Multipass opcode
 - [X] Currently the matching of input text against the rules is case
   sensitive.
   - [X] Make it case insensitive.
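
The "Parse with context" entry above describes a two-pass compilation: a first pass collects definitions, a second pass uses them. The following is a minimal sketch of that idea only, not the project's actual code. The `Rule` type, the two-directive mini syntax (`class` and `uses`) and all function names are hypothetical and were invented for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical, heavily simplified rule type (not the project's data model).
#[derive(Debug, PartialEq)]
enum Rule {
    /// `class <name> <chars>` defines a named character class.
    ClassDef { name: String, chars: String },
    /// `uses <name>` stands in for any rule that refers to a class by name.
    UsesClass { name: String },
}

/// Parse one line in isolation; at this point nothing is known about
/// what other lines define. Unrecognised lines are skipped.
fn parse_line(line: &str) -> Option<Rule> {
    let mut parts = line.split_whitespace();
    match parts.next()? {
        "class" => Some(Rule::ClassDef {
            name: parts.next()?.to_string(),
            chars: parts.next()?.to_string(),
        }),
        "uses" => Some(Rule::UsesClass {
            name: parts.next()?.to_string(),
        }),
        _ => None,
    }
}

/// Pass 1: collect the names of all classes defined anywhere in the table.
fn collect_classes(rules: &[Rule]) -> HashSet<String> {
    rules
        .iter()
        .filter_map(|rule| match rule {
            Rule::ClassDef { name, .. } => Some(name.clone()),
            Rule::UsesClass { .. } => None,
        })
        .collect()
}

/// Pass 2: check every class reference against the collected context.
fn compile(table: &str) -> Result<Vec<Rule>, String> {
    let rules: Vec<Rule> = table.lines().filter_map(parse_line).collect();
    let classes = collect_classes(&rules);
    for rule in &rules {
        if let Rule::UsesClass { name } = rule {
            if !classes.contains(name) {
                return Err(format!("undefined class: {name}"));
            }
        }
    }
    Ok(rules)
}
```

Calling `compile("class vowel aeiou\nuses vowel")` succeeds, while `compile("uses consonant")` reports an undefined class. A side effect of this particular sketch is that a class may be referenced before the line that defines it; whether the real implementation allows that is not stated in the README.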
@@ -152,11 +160,8 @@ a just port of the [[https://en.wikipedia.org/wiki/Parsing_expression_grammar][P
   - However normal translation has currently no way to specify a
     display table
 - [X] Handle undefined characters similarly to liblouis
-- [ ] Use a well established FST or graph library as a bases
-  - currently regular expressions are implemented using a simple
-    directed acyclic graph. It would surely be better to use a well
-    established library for that task such as [[https://github.com/garvys-org/rustfst][rustfst]], [[https://crates.io/crates/petgraph][petgraph]] or
-    [[https://github.com/neo4j-labs/graph][graph]].
+- [ ] Instead of hand-rolling a finite state machine to implement
+  regular expressions we should use [[https://docs.rs/regex-automata/latest/regex_automata/][regex_automata]].
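
The Todo list in this diff states that virtual dots are converted to Unicode Supplementary Private Use Area-A (U+F0000 to U+FFFFD). The README does not say how code points are assigned, so the sketch below makes an assumption: dots 1 to 8 map to the standard Braille Patterns block as a bitmask offset from U+2800, and any pattern containing a dot from 9 to 15 is placed at the same kind of bitmask offset from U+F0000. The function name and the layout are hypothetical.

```rust
/// Start of the Unicode Braille Patterns block (dots 1-8).
const BRAILLE_BASE: u32 = 0x2800;
/// Start of Supplementary Private Use Area-A.
const PUA_A_BASE: u32 = 0xF0000;

/// Convert a list of dot numbers (1..=15) to a single character.
/// Assumption: dot n sets bit n-1 of a mask; patterns that use only
/// dots 1-8 land in the Braille Patterns block, all others in PUA-A.
fn dots_to_char(dots: &[u8]) -> Option<char> {
    let mut mask: u32 = 0;
    for &dot in dots {
        if !(1..=15).contains(&dot) {
            return None; // not a dot number this sketch knows about
        }
        mask |= 1u32 << (dot - 1);
    }
    let base = if mask > 0xFF { PUA_A_BASE } else { BRAILLE_BASE };
    char::from_u32(base + mask)
}
```

With this layout `dots_to_char(&[1, 2])` yields U+2803, and `dots_to_char(&[9])` yields U+F0100. The largest possible mask is 0x7FFF, so every result stays inside the private use area.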