Open
Description
This is more of a pyquery bug, but I found it while using tapas-dl.
The comments on the first installment of
https://tapas.io/series/talesofthehangman
contain a "🤩" character, and something about it makes pyquery fail while parsing the page:
  File "/Users/jake/Library/Caches/pypoetry/virtualenvs/tapas-comic-downloader-Iag5BTTj-py3.9/lib/python3.9/site-packages/pyquery/pyquery.py", line 57, in fromstring
    result = getattr(etree, meth)(context)
  File "src/lxml/etree.pyx", line 3254, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1913, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1793, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1082, in lxml.etree._BaseParser._parseUnicodeDoc
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "<string>", line 2
lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 2, column 1
The workaround I found is to replace pq(pageReqest.text) with the following, which drops every non-ASCII character before parsing:
prt = "".join([x for x in pageReqest.text if ord(x) < 128])
page = pq(prt)
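One drawback of the workaround above is that it also removes accented letters and other legitimate non-ASCII text. A narrower variant might drop only characters above U+FFFF, which is where "🤩" (U+1F929) lives. This is only a sketch: it assumes the astral-plane emoji is what triggers the failure, which has not been verified against pyquery or lxml, and `strip_non_bmp` is a hypothetical helper name, not part of tapas-dl.

```python
def strip_non_bmp(text: str) -> str:
    """Drop characters outside the Basic Multilingual Plane (code points > U+FFFF).

    Assumption: the astral-plane emoji is what upsets the parser; this is
    untested against pyquery itself.
    """
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


# Accented letters survive, the emoji does not.
sample = "caf\u00e9 \U0001F929 ok"
cleaned = strip_non_bmp(sample)

# Hypothetical use in tapas-dl, mirroring the workaround above:
#   page = pq(strip_non_bmp(pageReqest.text))
```

If the narrower filter does not help, the original ASCII-only filter remains the known-working fallback.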