Commit 3e701e9

Merge pull request #102 from yuryleb/grammar
Grammar support for Russian way names
2 parents badb219 + 044836a commit 3e701e9

9 files changed: +1293 -82 lines changed

CHANGELOG.md

Lines changed: 5 additions & 1 deletion
@@ -1,6 +1,10 @@
 # Change Log
 All notable changes to this project will be documented in this file. For change log formatting, see http://keepachangelog.com/

+## master
+
+- Added grammatical cases support for Russian way names [#102](https://github.com/Project-OSRM/osrm-text-instructions/pull/102)
+
 ## 0.7.1 2017-09-26

 - Added Castilian Spanish localization. [#163](https://github.com/Project-OSRM/osrm-text-instructions/pull/163)
@@ -73,7 +77,7 @@ All notable changes to this project will be documented in this file. For change

 ## 0.1.0 2016-11-17

-- Improve chinese translation
+- Improve Chinese translation
 - Standardize capitalizeFirstLetter meta key
 - Change instructions object customization to options.hooks.tokenizedInstruction

Grammar.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
## Grammar support

Many languages, including most Slavic languages (Russian, Ukrainian, Polish, etc.), Finnic languages (Finnish, Estonian), and others, have [grammatical cases](https://en.wikipedia.org/wiki/Grammatical_case), which OSRM Text Instructions could support too.
Originally, street names are inserted into instructions exactly as they appear in the OSM map, that is, in the [nominative case](https://en.wikipedia.org/wiki/Nominative_case).
To be grammatically correct, a street name should be changed according to the target language's rules and the instruction context before insertion.

Applying grammatical cases is not a simple or obvious task, because real-life languages are complex.
It appears hard enough that, for example, none of the known native Russian navigation systems pronounce street names in their spoken route instructions.

Fortunately, street names have a restricted lexicon and follow naming rules, so the task can be solved relatively easily for this particular case.

### Implementation details

A fairly universal and simple solution is to change street names with a prepared set of regular expressions grouped by the required grammatical case.
The required grammatical case is specified directly in the instruction's substitution variables:

- The `{way_name}` and `{rotary_name}` variables in translated instructions should have the required grammatical case name appended after a colon, for example `{way_name:accusative}`.
- The [languages/grammar](languages/grammar/) folder should contain a language-specific JSON file with the regular expressions for each grammatical case:
```json
{
    "v5": {
        "accusative": [
            ["^ (\\S+)ая-(\\S+)ая [Уу]лица ", " $1ую-$2ую улицу "],
            ["^ (\\S+)ая [Уу]лица ", " $1ую улицу "],
            ...
```
- All such JSON files should be registered in the common [languages.js](languages.js).
- The instruction text formatter ([index.js](index.js) in this module) should:
  - check the `{way_name}` and `{rotary_name}` variables for an optional grammatical case after a colon, as in `{way_name:accusative}`
  - find the regular expressions block for the target language and the specified grammatical case
  - call the standard [string replace with regular expression](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace) for each expression in the block, passing the result of each call to the next; before the first call, the original street name should be enclosed in spaces to make parsing the words in names a bit simpler.
- String replacement with regular expressions is available in almost all other programming languages, so this should not be a problem for other code that uses only the data from OSRM Text Instructions.
- If no regular expression matches the source name (for example, a name from a foreign country), the original name is returned unchanged. This is also the expected behavior of the standard [string replace with regular expression](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace). The same behavior is expected when the grammar JSON file, or the grammatical case inside it, is missing.
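
The rule chain described in this list can be sketched as follows. This is a minimal illustration, not the module's code: `applyGrammarRules` is a hypothetical helper name, the rule list holds only the two accusative expressions quoted in the JSON fragment, and the sample street names are assumptions chosen to fit those two rules.

```javascript
// Hypothetical helper: applies an ordered list of [pattern, replacement] rules to a name.
function applyGrammarRules(name, rules, flags) {
    // Enclose the name in spaces so that rules can rely on spaces around every word
    var result = ' ' + name + ' ';
    rules.forEach(function(rule) {
        // Each call receives the result of the previous one
        result = result.replace(new RegExp(rule[0], flags || ''), rule[1]);
    });
    return result.trim();
}

// Only the two accusative expressions quoted in the JSON fragment; a real rule set is longer
var accusativeRules = [
    ['^ (\\S+)ая-(\\S+)ая [Уу]лица ', ' $1ую-$2ую улицу '],
    ['^ (\\S+)ая [Уу]лица ', ' $1ую улицу ']
];

console.log(applyGrammarRules('Садовая улица', accusativeRules));
console.log(applyGrammarRules('Main Street', accusativeRules));
```

The first call prints the accusative form "Садовую улицу". The second name matches no rule, so it comes back unchanged, which is the fallback behavior described in the last list item.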

### Example

The Russian street _"Большая Монетная улица"_ in St. Petersburg (roughly, _Big Monetary Street_), after processing with the [Russian grammar rules](languages/grammar/ru.json), appears in instructions as follows:
- _"Turn left onto `{way_name}`"_ => `ru`:_"Поверните налево на `{way_name:accusative}`"_ => _"Поверните налево на Большую Монетную улицу"_
- _"Continue onto `{way_name}`"_ => `ru`:_"Продолжите движение по `{way_name:dative}`"_ => _"Продолжите движение по Большой Монетной улице"_
- _"Make a U-turn onto `{way_name}` at the end of the road"_ => `ru`:_"Развернитесь в конце `{way_name:genitive}`"_ => _"Развернитесь в конце Большой Монетной улицы"_
- _"Make a U-turn onto `{way_name}`"_ => `ru`:_"Развернитесь на `{way_name:prepositional}`"_ => _"Развернитесь на Большой Монетной улице"_
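
A minimal sketch of how such a tagged variable could be expanded. The token pattern is the one used in `index.js` in this commit; `expandTokens`, `applyCase`, `sampleGrammar`, and the single accusative rule for names with two adjectives are hypothetical, written only for this illustration and not taken from `ru.json`.

```javascript
// Illustrative rules only: one assumed accusative rule for names with two adjectives
var sampleGrammar = {
    accusative: [
        ['^ (\\S+)ая (\\S+)ая [Уу]лица ', ' $1ую $2ую улицу ']
    ]
};

// Hypothetical helper: declines a name, or returns it unchanged if the case is unknown
function applyCase(name, grammarCase) {
    var rules = grammarCase && sampleGrammar[grammarCase];
    if (!rules) {
        return name;
    }
    var result = ' ' + name + ' ';
    rules.forEach(function(rule) {
        result = result.replace(new RegExp(rule[0]), rule[1]);
    });
    return result.trim();
}

// Hypothetical helper: replaces {token} and {token:case} with values from `tokens`
function expandTokens(instruction, tokens) {
    return instruction.replace(/\{(\w+):?(\w+)?\}/g, function(match, tag, grammarCase) {
        var value = tokens[tag];
        // Unknown tokens are left in place
        return typeof value === 'undefined' ? match : applyCase(value, grammarCase);
    });
}

var tokens = { way_name: 'Большая Монетная улица' };
console.log(expandTokens('Поверните налево на {way_name:accusative}', tokens));
console.log(expandTokens('Turn left onto {way_name}', tokens));
```

The first call reproduces the accusative instruction from the list above. The second shows that a variable without a case label is inserted in its original nominative form.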

### Design goals

- __Cross platform__ - uses the same data-driven approach as OSRM Text Instructions
- __Test suite__ - has a [prepared test](test/grammar_tests.js) that checks the available expressions automatically, and an easily extendable pattern for testing language-specific names
- __Customization__ - can be easily extended to other languages by adding new regular expression blocks to the [grammar support](languages/grammar/) folder and by adding the necessary grammatical case labels to `{way_name}` and other variables in translated instructions

### Notes

- The Russian regular expressions are based on the [Garmin Russian TTS voices update](https://github.com/yuryleb/garmin-russian-tts-voices) project; see the [file with regular expressions applied to source text before it is pronounced by TTS](https://github.com/yuryleb/garmin-russian-tts-voices/blob/master/src/Pycckuu__Milena%202.10/RULESET.TXT).
- There is another module with grammar support, [jquery.i18n](https://github.com/wikimedia/jquery.i18n), but unfortunately its implementation of grammatical cases is very poor and is intended to work with single words only.
- It would also be great to get street names in the target language, not only from the default OSM `name` tag: several multilingual countries provide several `name:<lang>` names for streets. But this should be addressed in the [OSRM engine](https://github.com/Project-OSRM/osrm-backend) first.

Readme.md

Lines changed: 4 additions & 0 deletions
@@ -8,6 +8,10 @@ OSRM Text Instructions transforms [OSRM](http://www.project-osrm.org/) route res

 OSRM Text Instructions has been translated into [several languages](https://github.com/Project-OSRM/osrm-text-instructions/tree/master/languages/translations/). Please help us add support for the languages you speak [using Transifex](https://www.transifex.com/project-osrm/osrm-text-instructions/).

+OSRM Text Instructions can support [grammatical cases](https://github.com/Project-OSRM/osrm-text-instructions/tree/master/Grammar.md) for street names in [some languages](https://github.com/Project-OSRM/osrm-text-instructions/tree/languages/grammar/).
+
+Customization of grammatical cases and other translated strings after [Transifex](https://www.transifex.com/project-osrm/osrm-text-instructions/) is handled by [override scripts](https://github.com/Project-OSRM/osrm-text-instructions/tree/master/languages/overrides/).
+
 [![NPM](https://nodei.co/npm/osrm-text-instructions.png)](https://npmjs.org/package/osrm-text-instructions/)

 ### Design goals

index.js

Lines changed: 31 additions & 4 deletions
@@ -1,5 +1,6 @@
 var languages = require('./languages');
 var instructions = languages.instructions;
+var grammars = languages.grammars;

 module.exports = function(version, _options) {
     var opts = {};
@@ -104,7 +105,6 @@ module.exports = function(version, _options) {
             switch (type) {
             case 'use lane':
                 laneInstruction = instructions[language][version].constants.lanes[this.laneConfig(step)];
-
                 if (!laneInstruction) {
                     // If the lane combination is not found, default to continue straight
                     instructionObject = instructions[language][version]['use lane'].no_lanes;
@@ -199,10 +199,37 @@ module.exports = function(version, _options) {

             return this.tokenize(language, instruction, replaceTokens);
         },
+        grammarize: function(language, name, grammar) {
+            // Process way/rotary name with applying grammar rules if any
+            if (name && grammar && grammars && grammars[language] && grammars[language][version]) {
+                var rules = grammars[language][version][grammar];
+                if (rules) {
+                    // Pass original name to rules' regular expressions enclosed with spaces for simpler parsing
+                    var n = ' ' + name + ' ';
+                    var flags = grammars[language].meta.regExpFlags || '';
+                    rules.forEach(function(rule) {
+                        var re = new RegExp(rule[0], flags);
+                        n = n.replace(re, rule[1]);
+                    });
+
+                    return n.trim();
+                }
+            }
+
+            return name;
+        },
         tokenize: function(language, instruction, tokens) {
-            var output = Object.keys(tokens).reduce(function(memo, token) {
-                return memo.replace('{' + token + '}', tokens[token]);
-            }, instruction)
+            // Keep this function context to use in inline function below (no arrow functions in ES5)
+            var that = this;
+            var output = instruction.replace(/\{(\w+):?(\w+)?\}/g, function(token, tag, grammar) {
+                var name = tokens[tag];
+                if (typeof name !== 'undefined') {
+                    return that.grammarize(language, name, grammar);
+                }
+
+                // Return unknown token unchanged
+                return token;
+            })
                 .replace(/ {2}/g, ' '); // remove excess spaces

             if (instructions[language].meta.capitalizeFirstLetter) {
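
The lookup and fallback logic of `grammarize` can be tried in isolation with a sketch like the one below. It is written under stated assumptions: the `grammars` object only mirrors the layout that `grammarize` reads (`meta.regExpFlags` plus a version key holding the cases), the `'i'` flag value and the single rule are invented for the illustration, and the version is passed as an argument instead of being captured from the enclosing module function.

```javascript
// Assumed data: mirrors the layout that grammarize() reads, not the real ru.json
var grammars = {
    ru: {
        meta: { regExpFlags: 'i' },
        v5: {
            accusative: [
                ['^ (\\S+)ая улица ', ' $1ую улицу ']
            ]
        }
    }
};

// Standalone variant of the lookup in grammarize(), with the version passed explicitly
function grammarize(language, version, name, grammar) {
    var languageGrammar = grammars[language];
    var rules = languageGrammar && languageGrammar[version] && languageGrammar[version][grammar];
    if (!name || !rules) {
        // Missing language, version or case: keep the original name
        return name;
    }
    var flags = (languageGrammar.meta && languageGrammar.meta.regExpFlags) || '';
    return rules.reduce(function(result, rule) {
        // Each rule works on the output of the previous one
        return result.replace(new RegExp(rule[0], flags), rule[1]);
    }, ' ' + name + ' ').trim();
}

console.log(grammarize('ru', 'v5', 'Садовая Улица', 'accusative'));
console.log(grammarize('ru', 'v5', 'Садовая улица', 'dative'));
console.log(grammarize('de', 'v5', 'Hauptstraße', 'accusative'));
```

With the case-insensitive flag, the rule matches the capitalized "Улица" without a `[Уу]` character class, and the first call prints "Садовую улицу". The other two calls return the name unchanged because the case or the language has no rules. Unlike the code in this commit, the sketch also tolerates a missing `meta` block.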

languages.js

Lines changed: 10 additions & 2 deletions
@@ -1,4 +1,4 @@
-// Load all language files excplicitely to allow integration
+// Load all language files explicitly to allow integration
 // with bundling tools like webpack and browserify
 var instructionsDe = require('./languages/translations/de.json');
 var instructionsEn = require('./languages/translations/en.json');
@@ -19,6 +19,8 @@ var instructionsUk = require('./languages/translations/uk.json');
 var instructionsVi = require('./languages/translations/vi.json');
 var instructionsZhHans = require('./languages/translations/zh-Hans.json');

+// Load all grammar files
+var grammarRu = require('./languages/grammar/ru.json');

 // Create a list of supported codes
 var instructions = {
@@ -42,7 +44,13 @@ var instructions = {
     'zh-Hans': instructionsZhHans
 };

+// Create list of supported grammar
+var grammars = {
+    'ru': grammarRu
+};
+
 module.exports = {
     supportedCodes: Object.keys(instructions),
-    instructions: instructions
+    instructions: instructions,
+    grammars: grammars
 };
