Arabic numerals (Draft)
Topics to talk about:
- Different families
- Origin
- Issues related to Bidi, numerals + signs, etc.
- Issues related to families (fonts, keyboards, etc.)
- What else...
There are mainly two families of numerals in the Arabic script. The first is known as European digits, also called Arabic digits (Unicode range U+0030-U+0039). The second is the Arabic-Indic digits (Unicode range U+0660-U+0669). The latter has a further variant, the Persian/Urdu digits, also known as Extended Arabic-Indic digits (Unicode range U+06F0-U+06F9), in which digits 4, 5 and 6 have different glyphs. The following table summarizes these families:
European digits (U+0030-U+0039): 0123456789
Arabic-Indic digits (U+0660-U+0669): ٠١٢٣٤٥٦٧٨٩
Extended Arabic-Indic digits (U+06F0-U+06F9): ۰۱۲۳۴۵۶۷۸۹
(TODO: maybe add a bigger table with the Unicode values/names of each character.)
TODO: a few words on the history of these three families and why they differ, although they share the same Indian origin.
The first line above lists the digits mostly used in North-West Africa, while the second lists the digits used in most Middle-Eastern countries. Persian (and Urdu) mostly use the third family.
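Because each family occupies ten consecutive code points, converting a number from one family to another amounts to a code-point offset. A minimal Python sketch, assuming text in which only the digits of the source family need converting; `FAMILY_ZERO` and `convert_digits` are hypothetical names chosen for illustration, not part of any library:

```python
# Hypothetical table: code point of digit zero in each family.
# Digits 0-9 of a family are consecutive code points.
FAMILY_ZERO = {
    "european": 0x0030,               # U+0030..U+0039
    "arabic_indic": 0x0660,           # U+0660..U+0669
    "extended_arabic_indic": 0x06F0,  # U+06F0..U+06F9 (Persian/Urdu)
}


def convert_digits(text: str, source: str, target: str) -> str:
    """Replace each digit of the `source` family with the same digit of
    the `target` family; every other character is left unchanged."""
    src, dst = FAMILY_ZERO[source], FAMILY_ZERO[target]
    table = {src + i: dst + i for i in range(10)}  # ordinal -> ordinal
    return text.translate(table)


# European "2024" written with Arabic-Indic digits (U+0662 U+0660 U+0662 U+0664).
print(convert_digits("2024", "european", "arabic_indic"))
```

Letters, signs and digits from other families pass through untouched, so mixed strings are only partly converted; that is a deliberate simplification here.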
An important point here is the bidirectional (bidi) category of these digits:
- European digits (U+0030..U+0039) are of category "EN - European number",
- Arabic-Indic digits (U+0660..U+0669) are "AN - Arabic number",
- Extended Arabic-Indic digits (U+06F0..U+06F9) are classified "EN - European number", unlike their counterparts above.
The difference in bidi category between Arabic-Indic digits and Extended Arabic-Indic digits is due to the different bidi behavior desired in Arabic vs. Persian.
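These categories can be checked directly against the Unicode Character Database. A short sketch using Python's standard `unicodedata.bidirectional`; the helper name `digit_bidi_classes` is hypothetical:

```python
import unicodedata

# Code point of digit zero in each family (digits 0-9 are consecutive).
FAMILY_ZERO = {
    "European": 0x0030,
    "Arabic-Indic": 0x0660,
    "Extended Arabic-Indic": 0x06F0,
}


def digit_bidi_classes(zero: int) -> set[str]:
    """Set of bidi classes of the ten digits starting at code point `zero`."""
    return {unicodedata.bidirectional(chr(zero + i)) for i in range(10)}


for family, zero in FAMILY_ZERO.items():
    print(f"{family} U+{zero:04X}..U+{zero + 9:04X}: {sorted(digit_bidi_classes(zero))}")
```

Each family yields a single class: EN, AN and EN respectively, matching the list above.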
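As a consequence, a sentence like "Five is written ۵ in Iran and ٥ in Egypt", displayed in a right-to-left paragraph, gives:
"in Egypt ٥ Five is written ۵ in Iran and"
which may seem weird or surprising: the EN digit ۵ is absorbed into the surrounding Latin run, whereas the AN digit ٥ is not, so the sentence is split into three visual runs.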
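The difference comes from two weak-type rules of the Unicode Bidirectional Algorithm (UAX #9): W2 turns an EN preceded by an Arabic letter (AL) into AN, and W7 turns an EN preceded by a strong L into L; AN digits are not affected by either rule. The sketch below is a deliberately partial illustration that applies only W2 and W7 (no embeddings, isolates or other rules); `resolve_digits` and its `sos` parameter are hypothetical names:

```python
import unicodedata

STRONG = {"L", "R", "AL"}


def resolve_digits(text: str, sos: str = "R") -> list[tuple[str, str]]:
    """Return (digit, resolved bidi class) for every digit in `text`.

    Simplified sketch covering only UBA rules W2 and W7. `sos` is the
    start-of-sequence type: "R" for an RTL paragraph, "L" for an LTR one.
    """
    resolved = []
    last_strong = sos
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in STRONG:
            last_strong = cls
        elif cls == "EN":
            if last_strong == "AL":   # W2: EN after an Arabic letter becomes AN
                cls = "AN"
            elif last_strong == "L":  # W7: EN after a strong L becomes L
                cls = "L"
            resolved.append((ch, cls))
        elif cls == "AN":             # AN is left unchanged by W2 and W7
            resolved.append((ch, cls))
    return resolved


print(resolve_digits("Five is written \u06F5 in Iran and \u0665 in Egypt"))
```

For the sentence above, ۵ resolves to L and therefore joins the Latin run, while ٥ stays AN. Note that after Arabic letters (e.g. "خمسة ۵"), W2 makes ۵ behave as AN too.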
Numerals do not always appear alone; they sometimes come with other characters such as currency symbols, fraction signs, and decimal and/or thousands separators (math expressions are excluded here). Note that there is an Arabic percent sign ٪ (U+066A), an Arabic decimal separator ٫ (U+066B) and an Arabic thousands separator ٬ (U+066C), mostly used with Arabic-Indic digits. Numerals can also be separated by (or mixed with) spaces or other signs, e.g. phone numbers (+12 34 56 78 89), car licence plates like 123 د 4, etc.
Particular attention is needed here. First, digits are only weakly directional in the Unicode Bidirectional Algorithm; second, the placement of the accompanying signs and symbols may depend on the region, generally the Middle East vs. North-West Africa, not to mention punctuation.
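None of the accompanying signs is strongly directional either. This sketch prints the Unicode name and bidi class of some common ones using Python's standard `unicodedata`; the `SIGNS` selection and the `describe` helper are illustrative choices:

```python
import unicodedata

# Illustrative selection: Latin and Arabic percent signs, Latin separators,
# Arabic decimal/thousands separators, plus, hyphen-minus, space.
SIGNS = "%\u066A.,\u066B\u066C+- "


def describe(ch: str) -> str:
    """One line: code point, Unicode name and bidi class of `ch`."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.bidirectional(ch)}"


for ch in SIGNS:
    print(describe(ch))
```

The Latin signs come out as weak types (ET for the percent signs, CS for . and ,, ES for + and -) or neutral (WS for the space), while the Arabic decimal and thousands separators are themselves classified AN.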
- The Arabic percent sign comes after the number in reading order, which places it on the left in right-to-left display (٪١٢ not ١٢٪), with no space in between (not ٪ ١٢). With European digits, the % percent sign is sometimes used. @@ images to put here @@
- Arabic decimal and thousands separators follow the same rule as for European numbers (١٬٢٣٤٫٥٦). With European digits, European signs are used, with either the dot . or the comma , as the decimal sign (1,234.56 or 1.234,56). @@images here @@
- Fractions, one half for instance, can be written 1/2, 1\2 or 2\1.
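The separator rules above can be sketched in Python. Assumptions: numbers are non-negative, Arabic-Indic formatting reuses Python's `,` grouping (groups of three), and the caller knows which sign is the decimal one. All names (`format_arabic_indic`, `format_arabic_percent`, `parse_number`, the tables) are hypothetical:

```python
# Digit tables: European -> Arabic-Indic, and both Arabic families -> European.
TO_ARABIC_INDIC = {ord("0") + i: 0x0660 + i for i in range(10)}
TO_EUROPEAN = {zero + i: ord("0") + i for zero in (0x0660, 0x06F0) for i in range(10)}

ARABIC_THOUSANDS_SEP = "\u066C"
ARABIC_DECIMAL_SEP = "\u066B"
ARABIC_PERCENT = "\u066A"


def format_arabic_indic(value: float, decimals: int = 2) -> str:
    """Arabic-Indic digits with the Arabic thousands and decimal separators."""
    western = f"{value:,.{decimals}f}"  # e.g. "1,234.56"
    table = {**TO_ARABIC_INDIC, ord(","): ARABIC_THOUSANDS_SEP, ord("."): ARABIC_DECIMAL_SEP}
    return western.translate(table)


def format_arabic_percent(value: int) -> str:
    """Arabic-Indic number followed (logical order, no space) by U+066A."""
    return str(value).translate(TO_ARABIC_INDIC) + ARABIC_PERCENT


def parse_number(text: str, decimal_sep: str) -> float:
    """Parse a non-negative number whose decimal separator is `decimal_sep`.

    "." for 1,234.56, "," for 1.234,56, ARABIC_DECIMAL_SEP for Arabic-Indic
    numbers. Every other non-digit character is treated as a thousands
    separator and dropped.
    """
    kept = []
    for ch in text.translate(TO_EUROPEAN):
        if "0" <= ch <= "9":
            kept.append(ch)
        elif ch == decimal_sep:
            kept.append(".")
    return float("".join(kept))


print(format_arabic_indic(1234.56), format_arabic_percent(12))
print(parse_number("1.234,56", ","), parse_number("1,234.56", "."))
```

The same string 1,234 parses as 1234.0 with `decimal_sep="."` but as 1.234 with `decimal_sep=","`, which is why the convention has to be known (or inferred from the region) rather than guessed from the string.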
Issues:
- How can one tell whether a sign (space, comma, ...) is a separator between numbers or a sign within a number? Is 12 34 56 78 90 a phone number or a sequence of separate numbers? In RTL context, the order of the groups may be inverted. A tip is to use a syntax like 12.34.56.78.90 or 12-34-56-78-90 for phone numbers.
- A string like the licence plate above, 123 د 4, would require a tag or a control character, but this is not always desirable.
- etc.
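The phone-number tip works because of UBA rule W4: a single ES (such as -) between two EN digits becomes EN, and a single CS (such as .) between two numbers of the same type becomes that type, so the whole string forms one numeric run; a space (WS) does not, and the groups end up as separate runs that an RTL paragraph displays in reverse order. The sketch below checks W4 only and assumes the digits are not preceded by Arabic letters (rule W2). It also shows one kind of control character the second item alludes to: the Unicode isolates LRI (U+2066) and PDI (U+2069), which make the enclosed text behave as a single neutral unit in the surrounding paragraph. For the plate 123 د 4, an isolate only fixes its placement relative to the surrounding text; its internal order still follows the UBA, since د is strongly right-to-left. `stays_one_number` and `isolate_ltr` are hypothetical names:

```python
import unicodedata

LRI, PDI = "\u2066", "\u2069"  # LEFT-TO-RIGHT ISOLATE, POP DIRECTIONAL ISOLATE


def stays_one_number(text: str) -> bool:
    """True if `text` (digits and separators only) forms one numeric run
    under UBA rule W4 alone.

    W4: a single ES between two ENs becomes EN; a single CS between two
    numbers of the same type (EN or AN) becomes that type.
    """
    classes = [unicodedata.bidirectional(ch) for ch in text]
    digit_types = {c for c in classes if c in ("EN", "AN")}
    if len(digit_types) != 1:  # no digits, or mixed EN/AN digits
        return False
    for i, cls in enumerate(classes):
        if cls in ("EN", "AN"):
            continue
        before = classes[i - 1] if i > 0 else None
        after = classes[i + 1] if i + 1 < len(classes) else None
        if cls == "ES" and before == "EN" and after == "EN":
            continue
        if cls == "CS" and before == after and before in ("EN", "AN"):
            continue
        return False  # e.g. a space, a leading sign, or two separators in a row
    return True


def isolate_ltr(text: str) -> str:
    """Wrap `text` in LRI ... PDI so it is laid out left to right and
    treated as a single neutral unit by the surrounding paragraph."""
    return f"{LRI}{text}{PDI}"


print(stays_one_number("12 34 56 78 90"), stays_one_number("12-34-56-78-90"))
```

The first call returns False and the second True. With Arabic-Indic digits the hyphen does not help (W4 only joins EN with ES), whereas the dot does.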
Note: math expressions are not covered here. See @@ elsewhere @@
Other topics to talk about: keyboard layouts by region, and which digits are used by default in different OSes/applications.
...