Commit 25edbe8

utf8_to_uv_msgs: Move premature setting to later
This commit moves the final two cases of setting up the return to be the REPLACEMENT_CHARACTER to later in the code, where all such malformations are handled. This makes the handling uniform for a bunch of cases, which will enable a future commit to combine them.
1 parent c2009f1 commit 25edbe8

1 file changed, 7 additions and 7 deletions

utf8.c

Lines changed: 7 additions & 7 deletions
@@ -1686,7 +1686,6 @@ Perl_utf8_to_uv_msgs_helper_(const U8 * const s0,
     if (UNLIKELY(curlen <= 0)) {
         possible_problems |= UTF8_GOT_EMPTY;
         curlen = 0;
-        uv = UNICODE_REPLACEMENT;
         goto ready_to_handle_errors;
     }
 
@@ -1709,7 +1708,6 @@ Perl_utf8_to_uv_msgs_helper_(const U8 * const s0,
     if (UNLIKELY(UTF8_IS_CONTINUATION(*s0))) {
         possible_problems |= UTF8_GOT_CONTINUATION;
         curlen = 1;
-        uv = UNICODE_REPLACEMENT;
         goto ready_to_handle_errors;
     }
 
@@ -1897,11 +1895,11 @@ Perl_utf8_to_uv_msgs_helper_(const U8 * const s0,
      *                   input by for the next call to this function.
      * possible_problems is 0 if there weren't any problems; otherwise a bit
      *                   is set in it for each potential problem found.
-     * uv                contains the code point the input sequence
-     *                   represents; or if there is a problem that prevents
-     *                   a well-defined value from being computed, it is
-     *                   some substitute value, typically the REPLACEMENT
-     *                   CHARACTER.
+     * uv                contains the value of the code point the input
+     *                   sequence represents, as far as we were able to
+     *                   determine.  This is the correct translation of the
+     *                   input bytes if and only if no malformations were
+     *                   encountered.
      * s                 points to just after where we left off processing
      *                   the character
      * send              points to just after where that character should
@@ -2056,6 +2054,7 @@ Perl_utf8_to_uv_msgs_helper_(const U8 * const s0,
             break;
 
           case UTF8_GOT_EMPTY:
+            uv = UNICODE_REPLACEMENT;
             if (! (flags & UTF8_ALLOW_EMPTY)) {
 
                 /* This so-called malformation is now treated as a bug in
@@ -2073,6 +2072,7 @@ Perl_utf8_to_uv_msgs_helper_(const U8 * const s0,
             break;
 
           case UTF8_GOT_CONTINUATION:
+            uv = UNICODE_REPLACEMENT;
             if (! (flags & UTF8_ALLOW_CONTINUATION)) {
                 disallowed = TRUE;
                 if (NEED_MESSAGE(WARN_UTF8,,)) {
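
The pattern the hunks above apply can be sketched in isolation. The following is a minimal illustration, not Perl's code: every name prefixed with `sketch_` or `SKETCH_` is hypothetical, and the decoder only understands one- and two-byte sequences and does not check for overlongs. It shows the detection sites recording only the problem bit and the consumed length, while the substitute value is assigned in the single place where malformations are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not Perl's API. */
#define SKETCH_REPLACEMENT 0xFFFDu

enum {
    SKETCH_GOT_EMPTY        = 1u << 0,
    SKETCH_GOT_CONTINUATION = 1u << 1,
    SKETCH_GOT_SHORT        = 1u << 2,
    SKETCH_GOT_UNSUPPORTED  = 1u << 3   /* start byte this sketch ignores */
};

/* Decode one character from s[0..len).  Returns a bitmask of problems
 * (0 if none).  *out_uv is the decoded code point only when the return
 * value is 0; otherwise it is the substitute value. */
static unsigned
sketch_decode(const uint8_t *s, size_t len, uint32_t *out_uv, size_t *out_len)
{
    unsigned problems = 0;
    uint32_t uv = 0;
    size_t curlen = 0;

    assert(out_uv != NULL && out_len != NULL);

    /* Detection: record the problem and the length, nothing else.
     * Note that uv is deliberately not assigned at these sites. */
    if (len == 0) {
        problems |= SKETCH_GOT_EMPTY;
        curlen = 0;
        goto ready_to_handle_errors;
    }
    if ((s[0] & 0xC0) == 0x80) {        /* continuation byte at the start */
        problems |= SKETCH_GOT_CONTINUATION;
        curlen = 1;
        goto ready_to_handle_errors;
    }

    if (s[0] < 0x80) {                  /* one-byte (ASCII) sequence */
        uv = s[0];
        curlen = 1;
    }
    else if ((s[0] & 0xE0) == 0xC0) {   /* two-byte sequence */
        uv = s[0] & 0x1F;
        curlen = 1;
        if (len < 2 || (s[1] & 0xC0) != 0x80) {
            problems |= SKETCH_GOT_SHORT;
            goto ready_to_handle_errors;
        }
        uv = (uv << 6) | (uint32_t)(s[1] & 0x3F);
        curlen = 2;
    }
    else {
        problems |= SKETCH_GOT_UNSUPPORTED;
        curlen = 1;
    }

  ready_to_handle_errors:
    /* Handling: walk the set bits, lowest first.  Every malformation
     * receives its substitute value here, in one uniform place. */
    for (unsigned pending = problems; pending != 0; pending &= pending - 1) {
        switch (pending & (0u - pending)) {   /* isolate lowest set bit */
          case SKETCH_GOT_EMPTY:
            uv = SKETCH_REPLACEMENT;
            break;
          case SKETCH_GOT_CONTINUATION:
            uv = SKETCH_REPLACEMENT;
            break;
          case SKETCH_GOT_SHORT:
            uv = SKETCH_REPLACEMENT;
            break;
          case SKETCH_GOT_UNSUPPORTED:
            uv = SKETCH_REPLACEMENT;
            break;
        }
    }

    *out_uv = uv;
    *out_len = curlen;
    return problems;
}
```

Because each case now begins with the same assignment, the cases have an identical shape, which is what would allow a later change to merge them. A caller of this sketch should treat `*out_uv` as the real translation only when the function returns 0.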
