
Extra compatibility handling for decoding #261

@gghh0408


I noticed that when decoding real-world messages, some traditional Chinese characters encoded in GBK came out garbled.

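For example: `=CD=A8=D6=AA=EAP=EC=B6` or `=BE=A7=D4=AA=8FS=B7=BF=EAP=E9]=CA=C2=ED=97`.

Not every byte in these sequences is escaped: `=EAP` stands for the two bytes 0xEA 0x50, where the literal `P` (0x50) is the trail byte of a GBK double-byte character. GBK characters outside the GB2312 range, which include many traditional characters, can have a trail byte between 0x40 and 0x7E. Such a byte is printable ASCII, so a quoted-printable encoder may write it as a literal character. The decoder currently decodes each run of `=XX` escapes on its own, which splits these characters in two.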
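To make the problem concrete, here is a minimal sketch that turns quoted-printable text into the raw bytes it stands for. `qpToBytes` is a hypothetical helper written only for this illustration, not an existing library function, and it assumes soft line breaks have already been removed and that the text is plain ASCII. For the first sample it yields 0xCD 0xA8 0xD6 0xAA 0xEA 0x50 0xEC 0xB6: the literal `P` is simply the byte 0x50, the second half of the pair that starts with 0xEA.

```dart
/// Hypothetical helper for illustration only (not part of the library):
/// converts quoted-printable text into the raw bytes it represents.
/// Each valid =XX escape becomes one byte; every other character stands
/// for its own byte value (assumes ASCII input, soft breaks already removed).
List<int> qpToBytes(String qp) {
  final bytes = <int>[];
  var i = 0;
  while (i < qp.length) {
    if (qp[i] == '=' && i + 3 <= qp.length) {
      final value = int.tryParse(qp.substring(i + 1, i + 3), radix: 16);
      if (value != null) {
        // =XX escape: one byte with the given hex value
        bytes.add(value);
        i += 3;
        continue;
      }
    }
    // literal character: its code unit is the byte value
    bytes.add(qp.codeUnitAt(i));
    i++;
  }
  return bytes;
}
```

Handing this whole byte list to a GBK decoder in one call keeps every double-byte character intact, whereas decoding `=CD=A8=D6=AA=EA` and `=EC=B6` as separate runs leaves 0xEA without its trail byte.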

I have resolved this issue with the modification below. If the maintainer finds it acceptable, it could be merged into the main branch.

In `quoted_printable_mail_codec.dart`:

```dart
/// Decodes the specified text
///
/// [part] the text part that should be decoded
/// [codec] the character encoding (charset)
/// Set [isHeader] to true to decode header text using the Q-Encoding scheme,
/// compare https://tools.ietf.org/html/rfc2047#section-4.2
@override
String decodeText(
  final String part,
  final Encoding codec, {
  bool isHeader = false,
}) {
  final buffer = StringBuffer();
  // remove all soft-breaks:
  final cleaned = part.replaceAll('=\r\n', '');
  for (var i = 0; i < cleaned.length; i++) {
    final char = cleaned[i];
    if (char == '=') {
      if (i + 3 > cleaned.length) {
        // truncated escape at the end of the text: keep it as-is
        buffer.write(cleaned.substring(i));
        break;
      }
      final hexText = cleaned.substring(i + 1, i + 3);
      final charCode = int.tryParse(hexText, radix: 16);
      if (charCode == null) {
        buffer.write(hexText);
      } else {
        // collect consecutive =XX escapes into one byte sequence
        final charCodes = [charCode];
        while (cleaned.length >= (i + 6) && cleaned[i + 3] == '=') {
          final nextCode =
              int.tryParse(cleaned.substring(i + 4, i + 6), radix: 16);
          if (nextCode == null) {
            break;
          }
          i += 3;
          charCodes.add(nextCode);
        }
        // Some encoders write the trail byte of a GBK double-byte character
        // as a literal ASCII character instead of an =XX escape, e.g. the
        // "P" in =EAP (0xEA 0x50) or the "]" in =E9] (0xE9 0x5D).
        // Decoding the escaped bytes on their own would split the character,
        // so append the literal character when it completes a GBK pair.
        // Note: hex letters such as 'A' (0x41) are valid trail bytes too.
        if (cleaned.length >= (i + 4) && cleaned[i + 3] != '=') {
          final nextChar = cleaned[i + 3];
          final asciiValue = nextChar.codeUnitAt(0);
          // only an ASCII character can stand in for a byte; in headers
          // (Q-encoding) a literal '_' always means a space
          if (asciiValue < 0x80 &&
              !(isHeader && nextChar == '_') &&
              isGBK([charCodes.last, asciiValue])) {
            charCodes.add(asciiValue);
            i += 1;
          }
        }
        try {
          final decoded = codec.decode(charCodes);
          buffer.write(decoded);
        } on FormatException catch (err) {
          print('unable to decode quotedPrintable buffer: ${err.message}');
          buffer.write(String.fromCharCodes(charCodes));
        }
      }
      i += 2;
    } else if (isHeader && char == '_') {
      buffer.write(' ');
    } else {
      buffer.write(char);
    }
  }

  return buffer.toString();
}

/// Returns true if [bytes] form a valid GBK byte sequence: single bytes
/// 0x00-0x7F, or a lead byte 0x81-0xFE followed by a trail byte in
/// 0x40-0x7E or 0x80-0xFE.
bool isGBK(List<int> bytes) {
  var i = 0;
  while (i < bytes.length) {
    final byte = bytes[i] & 0xFF;
    if (byte <= 0x7F) {
      i++;
    } else {
      if (byte < 0x81 || byte > 0xFE) {
        return false;
      }
      i++;
      if (i >= bytes.length) {
        return false;
      }
      final secondByte = bytes[i] & 0xFF;
      if (!((secondByte >= 0x40 && secondByte <= 0x7E) ||
          (secondByte >= 0x80 && secondByte <= 0xFE))) {
        return false;
      }
      i++;
    }
  }
  return true;
}
```

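The main idea: when a run of `=XX` escapes is directly followed by a literal character, take that character's byte value and check whether it forms valid GBK together with the last escaped byte. If it does, append that byte to the byte array so the double-byte character is decoded as a whole, instead of decoding the escaped bytes on their own, which results in garbled characters.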
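The pairwise check looks only at the last escaped byte, so a run such as `=CD=A8=D6=AA` followed by `PDF` also passes: 0xAA is in the lead-byte range even though it is already the trail byte of the pair 0xD6 0xAA. Appending the `P` there is harmless as long as the codec decodes the whole array from its start, because an ASCII byte then decodes to itself. A stricter variant, sketched below under that same GBK byte-range assumption, tracks lead/trail state across the whole run. `literalGbkTrailChars` is a hypothetical helper for illustration, not part of the library; it assumes soft line breaks are already removed and only reports which literal characters would be merged.

```dart
/// Hypothetical helper for illustration only (not part of the library):
/// scans quoted-printable text and returns every literal character that
/// serves as the trail byte of a GBK double-byte character.
List<String> literalGbkTrailChars(String qp) {
  final found = <String>[];
  // true when the previous byte was a GBK lead byte still waiting for its trail
  var pendingLead = false;
  var i = 0;
  while (i < qp.length) {
    final escaped = (qp[i] == '=' && i + 3 <= qp.length)
        ? int.tryParse(qp.substring(i + 1, i + 3), radix: 16)
        : null;
    if (escaped != null) {
      // an =XX escape either completes a pending pair or may start a new one
      pendingLead = !pendingLead && escaped >= 0x81 && escaped <= 0xFE;
      i += 3;
    } else {
      // a literal character; only printable ASCII 0x40-0x7E can be a trail byte
      final code = qp.codeUnitAt(i);
      if (pendingLead && code >= 0x40 && code <= 0x7E) {
        found.add(qp[i]);
      }
      pendingLead = false;
      i++;
    }
  }
  return found;
}
```

For the second sample above it returns `S`, `P` and `]`, and for `=CD=A8=D6=AAPDF` it returns nothing, because that run ends on a complete pair.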

Thanks to the author for their selfless dedication.
