Commit ab01694

Unicode:UCD Clarify pod for num()
Add example of how it handles numbers that aren't decimal positional. I thought it would clarify things to expand and correct the flawed example pointed out in GH #23003.
1 parent b20c83a commit ab01694

6 files changed: 30 additions, 6 deletions

charclass_invlists.inc

Lines changed: 1 addition & 1 deletion
@@ -456662,7 +456662,7 @@ static const U8 WB_dfa_table[] = {
 #endif /* defined(PERL_IN_REGEXEC_C) */
 
 /* Generated from:
- * 7229a97216f54f7d47d5cff56fc8dbc185dcfe40db20533f8034a1215af787fe lib/Unicode/UCD.pm
+ * b7f46fc1010fd83f5a678b268a23fef0142a18d0ab2a142edd0bb03328e667c3 lib/Unicode/UCD.pm
  * 764f420cedfc8b43d9fec251c957a5d55fc45d40f6573f162990ed1dce7e36e0 lib/unicore/ArabicShaping.txt
  * b8f32554c6f658821fb0ee742d21c5b1f2086b9bf13071fed04894b022f93d67 lib/unicore/BidiBrackets.txt
  * d7afdadd1bbd66f5a663ac0e8f7958f18fd9491fc0bc59ec5877cb82db71db7d lib/unicore/BidiMirroring.txt

lib/Unicode/UCD.pm

Lines changed: 25 additions & 1 deletion
@@ -5,7 +5,7 @@ use warnings;
 no warnings 'surrogate'; # surrogates can be inputs to this
 use charnames ();
 
-our $VERSION = '0.80';
+our $VERSION = '0.81';
 
 sub DEBUG () { 0 }
 $|=1 if DEBUG;
@@ -2496,6 +2496,30 @@ match them. A single-character string containing one of these digits will
 have its decimal value returned by C<num>, but any longer string containing
 only these digits will return C<undef>.
 
+To illustrate further, the Rumi numeric symbols were used in centuries past in
+and around North Africa and the Iberian peninsula. In order to be able to
+digitize the many historical documents that use them, Unicode has encoded the
+set. There is no character representing zero. There are characters for one
+through nine, ten, twenty, and so forth. C<num> correctly returns the values
+of these in isolation.
+
+ my $rumi_one = num("\N{RUMI DIGIT ONE}");
+ my $rumi_two = num("\N{RUMI DIGIT TWO}");
+ my $rumi_twenty = num("\N{RUMI NUMBER TWENTY}");
+ say "$rumi_one $rumi_two $rumi_twenty"; # 1 2 20
+
+Because these do not follow modern decimal positional notation, stringing more
+than one of these together doesn't mean what you likely would think it means.
+So, C<num> correctly returns C<undef> if you try. If you request the length
+of the valid initial substring in this case, that length would be one.
+
+ my $len;
+ my $value = num("\N{RUMI DIGIT ONE}\N{RUMI DIGIT TWO}", \$len);
+ say $len, " ", (defined $value) ? $value : "undef"; # 1 undef
+
+How to represent numbers like twelve gets complicated, and Unicode doesn't
+give any guidance, so C<num> can't either.
+
 Strings of multiple sub- and superscripts are not recognized as numbers. You
 can use either of the compatibility decompositions in Unicode::Normalize to
 change these into digits, and then call C<num> on the result.
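
The closing context lines of this hunk describe a workaround for superscripts that the pod states but does not illustrate. The following is a minimal sketch and not part of the commit; it assumes a Perl new enough to resolve `\N{...}` names without an explicit `use charnames`, and the variable names are purely illustrative. It uses `NFKD` from Unicode::Normalize as one of the two compatibility decompositions the pod mentions.

```perl
use strict;
use warnings;
use feature 'say';
use Unicode::UCD 'num';
use Unicode::Normalize 'NFKD';

# Two superscript digits in a row: SUPERSCRIPT TWO, then SUPERSCRIPT THREE.
my $superscripts = "\N{SUPERSCRIPT TWO}\N{SUPERSCRIPT THREE}";

# Passed directly, a multi-character superscript string is not a number.
my $direct = num($superscripts);

# NFKD replaces each superscript with its ordinary digit, giving "23",
# which num() then reads as a decimal positional number.
my $normalized = num(NFKD($superscripts));

say defined $direct ? $direct : "undef";    # undef
say $normalized;                            # 23
```

`NFKC` should work equally well here, since both forms apply the same compatibility mappings to these characters.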

lib/unicore/uni_keywords.pl

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

regcharclass.h

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

regexp_constants.h

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@
 #define MAX_FOLD_FROMS 3
 
 /* Generated from:
- * 7229a97216f54f7d47d5cff56fc8dbc185dcfe40db20533f8034a1215af787fe lib/Unicode/UCD.pm
+ * b7f46fc1010fd83f5a678b268a23fef0142a18d0ab2a142edd0bb03328e667c3 lib/Unicode/UCD.pm
  * 764f420cedfc8b43d9fec251c957a5d55fc45d40f6573f162990ed1dce7e36e0 lib/unicore/ArabicShaping.txt
  * b8f32554c6f658821fb0ee742d21c5b1f2086b9bf13071fed04894b022f93d67 lib/unicore/BidiBrackets.txt
  * d7afdadd1bbd66f5a663ac0e8f7958f18fd9491fc0bc59ec5877cb82db71db7d lib/unicore/BidiMirroring.txt

uni_keywords.h

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.
