
Arabic numerals (Draft)

Najib Tounsi edited this page May 23, 2017 · 14 revisions

Topics to talk about

Different families
Origin
Issues related to Bidi, numeral+signs, etc.
Issues related to families (fonts, keyboards, etc.)
What else...

Arabic Numeral Writing

Two main families of numerals are used with the Arabic script. The first is the European digits, also known as Arabic digits (Unicode range U+0030-U+0039); the second is the Arabic-Indic digits (Unicode range U+0660-U+0669). The latter gave rise to a variant used for Persian and Urdu, the Extended (or Eastern) Arabic-Indic digits (Unicode range U+06F0-U+06F9), in which the digits 4, 5 and 6 have different glyphs. The following table summarizes these families.

Arabic Numerals

TODO: briefly cover the historical origin of these three families and why they differ, although they share the same Indian origin.

Digits on the first row are used in North-West Africa, while digits on the second row are used in most Middle East countries. Persian and Urdu mostly use the third row.
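As a rough illustration, the three families can be enumerated directly from the Unicode ranges quoted above. This is only a sketch: `FAMILIES` and `digits_of` are names made up for this example, and the character names come from Python's standard `unicodedata` module.

```python
import unicodedata

# First code point of each family, taken from the ranges quoted above.
# FAMILIES and digits_of are illustrative names, not part of any library.
FAMILIES = {
    "European": 0x0030,
    "Arabic-Indic": 0x0660,
    "Extended Arabic-Indic": 0x06F0,
}


def digits_of(start):
    """Return the ten digits of a family as one string, from 0 to 9."""
    return "".join(chr(start + value) for value in range(10))


for label, start in FAMILIES.items():
    # Print the whole row, then the Unicode name of digit 4, one of the
    # digits whose glyph differs between the second and third families.
    print(label, " ".join(digits_of(start)))
    print("  digit 4 is named:", unicodedata.name(chr(start + 4)))
```

All three rows carry the same decimal values 0 to 9; only the code points, and therefore the glyphs, differ.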

Arabic numbers are written with the least significant digit on the right and the most significant digit on the left. This arrangement is identical to the Western one, even though Arabic script is written from right to left. Numbers with many digits may use a decimal separator and a thousands separator (see below).
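In stored text this means that the most significant digit comes first in the character sequence, just as with European digits. A small sketch, assuming Python 3, whose built-in `int()` accepts any Unicode decimal digits; the variable names are illustrative only.

```python
import unicodedata

# The number 123 written with Arabic-Indic digits, in storage order.
number = "\u0661\u0662\u0663"

# The first stored character is the most significant digit (1),
# and the last stored character is the least significant one (3).
first_digit = unicodedata.digit(number[0])
last_digit = unicodedata.digit(number[-1])

# int() parses Arabic-Indic digits the same way it parses "123".
value = int(number)

print(first_digit, last_digit, value)
```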

An important fact to note here is the bidirectional category of these numbers.

  • European Digits (U+0030..U+0039) are of category "EN - European number",
  • Arabic-Indic Digits (U+0660..U+0669) are "AN - Arabic number",
  • Extended Arabic-Indic Digits (U+06F0..U+06F9) are classified "EN - European number", differently from their counterpart just above.
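The categories listed above can be checked against the Unicode Character Database, for instance with Python's standard `unicodedata` module. This is a minimal sketch; `bidi_classes` is a helper name invented for the example.

```python
import unicodedata


def bidi_classes(start):
    """Return the set of bidi categories of the ten digits starting at `start`."""
    return {unicodedata.bidirectional(chr(start + i)) for i in range(10)}


# Range starts are the ones given in the list above.
for label, start in [
    ("European", 0x0030),
    ("Arabic-Indic", 0x0660),
    ("Extended Arabic-Indic", 0x06F0),
]:
    print(label, sorted(bidi_classes(start)))
```

Each range yields a single category: `EN` for the first and third, `AN` for the second.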

The difference in bidi category between Arabic-Indic digits and Extended (Eastern) Arabic-Indic digits is due to the different bidi behavior desired in Arabic vs. Persian. (TODO: the origin of this decision.)

As a consequence, a sentence such as "Five is written ۵ in Iran and ٥ in Egypt" is displayed as follows in an RTL context:

"‫Five is written ۵ in Iran and ٥ in Egypt‬"

which may seem surprising.

Arabic Numerals in other usages

Numbers do not always appear alone; they may come with other characters such as currency symbols, fraction signs, and decimal or thousands separators (math expressions are excluded here). Note that there is an Arabic percent sign ٪ (U+066A), an Arabic decimal separator ٫ (U+066B) and an Arabic thousands separator ٬ (U+066C), mostly used with Arabic-Indic digits. Numerals can also be separated by (or mixed with) spaces or other signs, e.g. phone numbers such as +12 34 56 78 89 or car licence plates such as 123 د‎ 4.

Particular attention is needed here. First, digits have weak directionality with regard to the Bidi algorithm; second, the placement of the accompanying signs and symbols may depend on the region, generally Middle East vs. North-West Africa. This is without even considering punctuation signs.

  • The percent sign is to be placed on the left, after the number (٪١٢ not ١٢٪), without a space (٪ ١٢). With European numbers, the % percent sign is sometimes used. @@ images to put here @@

  • The Arabic decimal and thousands separators follow the same rules as for European numbers (١٬٢٣٤٫٥٦). European separators are used with European numbers (1.234,56 or 1,234.56).

  • Fractions could be written, for one half say, @@ 1/2 or 2/1 @@
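To illustrate the separator rule above, here is a minimal sketch that renders a number with Arabic-Indic digits together with the Arabic thousands and decimal separators. `ARABIC_INDIC` and `format_arabic_indic` are names invented for this example, and the grouping by thousands with two decimals is an assumption rather than a rule from this page. The percent sign and fractions are left out, since their placement is still an open point here.

```python
# Translation table: European digits and separators to their Arabic
# counterparts (U+0660..U+0669, U+066C thousands, U+066B decimal).
ARABIC_INDIC = str.maketrans({
    **{str(d): chr(0x0660 + d) for d in range(10)},
    ",": "\u066C",
    ".": "\u066B",
})


def format_arabic_indic(value, decimals=2):
    """Format `value` as 1,234.56, then swap in Arabic-Indic characters."""
    western = format(value, f",.{decimals}f")
    return western.translate(ARABIC_INDIC)


print(format_arabic_indic(1234.56))
```

The character order in the resulting string is the same as in 1,234.56; only the code points change.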

Issues :

  • How can one tell whether a sign (space, comma ...) is a separator between numbers or a sign within a number? Is 12 34 56 78 90 a phone number or a sequence of separate numbers? Such a sequence may have its groups inverted in an RTL context. A tip is to use a syntax like 12.34.56.78.90 or 12-34-56-78-90 for phone numbers.

  • A string like the licence plate above, 123 د‎ 4, would require a tag or a control character, but this is not always desirable.

  • etc.
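One possible form of the "control character" approach mentioned above is to wrap the string in Unicode directional isolates. The sketch below is only an illustration under that assumption: `isolate` is a hypothetical helper, and whether the result displays as intended depends on the rendering software supporting isolates.

```python
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text, direction="ltr"):
    """Wrap `text` in an isolate so it is ordered independently of its context.

    `direction` is "ltr", "rtl", or "auto" (first strong character decides).
    """
    opener = {"ltr": LRI, "rtl": RLI, "auto": FSI}[direction]
    return f"{opener}{text}{PDI}"


# A phone number written with European digits, kept as one left-to-right unit.
phone = isolate("+12 34 56 78 89", "ltr")
print(repr(phone))
```

The trade-off is the one noted in the list: the invisible characters become part of the stored text, which is not always desirable, for example when the string is later copied, searched, or compared.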

Note: Math expressions are not covered here. @@ See elsewhere @@

Other topics to talk about: keyboard layouts with respect to regions, and which digits are used by default in different operating systems and applications.

...
