
Commit 28c1dd0

[DomCrawler] Default to UTF-8 when possible
1 parent 407a9b9 commit 28c1dd0

1 file changed: +3 -3 lines changed

Crawler.php

Lines changed: 3 additions & 3 deletions
@@ -127,8 +127,8 @@ public function add($node)
     /**
      * Adds HTML/XML content.
      *
-     * If the charset is not set via the content type, it is assumed
-     * to be ISO-8859-1, which is the default charset defined by the
+     * If the charset is not set via the content type, it is assumed to be UTF-8,
+     * or ISO-8859-1 as a fallback, which is the default charset defined by the
      * HTTP 1.1 specification.
      *
      * @param string $content A string to parse as HTML/XML
@@ -161,7 +161,7 @@ public function addContent($content, $type = null)
         }

         if (null === $charset) {
-            $charset = 'ISO-8859-1';
+            $charset = preg_match('//u', $content) ? 'UTF-8' : 'ISO-8859-1';
         }

         if ('x' === $xmlMatches[1]) {
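
The charset check in the second hunk relies on the PCRE `u` modifier: an empty pattern matches any string, but with `/u` `preg_match()` returns `false` when the subject is not valid UTF-8. The following is a minimal standalone sketch of that behavior, not code from this commit; the helper name `guessCharset` is hypothetical and exists only for illustration.

```php
<?php

/**
 * Hypothetical helper (not part of Crawler.php) that mirrors the
 * fallback expression used in the diff above.
 */
function guessCharset(string $content): string
{
    // With the "u" modifier, preg_match() returns 1 for a valid UTF-8
    // subject and false for an invalid one, so the ternary falls back
    // to ISO-8859-1 only when the bytes cannot be UTF-8.
    return preg_match('//u', $content) ? 'UTF-8' : 'ISO-8859-1';
}

$utf8   = "caf\xC3\xA9"; // "café" encoded as UTF-8
$latin1 = "caf\xE9";     // "café" encoded as ISO-8859-1 (invalid as UTF-8)

echo guessCharset($utf8), "\n";
echo guessCharset($latin1), "\n";
echo guessCharset('plain ascii'), "\n";
```

Pure ASCII input is valid UTF-8, so it is reported as UTF-8 under this check; a lone `0xE9` byte is not, so the second call falls back to ISO-8859-1.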
