This repository was archived by the owner on Aug 10, 2024. It is now read-only.

RSS feed broken if CDATA contains low ASCII control characters #33

@relikd

Hi there,

I just stumbled upon a feed whose CDATA description contains control characters in the range 0x01 to 0x1F.
XML 1.0 does not allow most of these characters, so libxml2 isn't supposed to handle them, but RSParser breaks early and drops the remaining feed articles. When parsing the RSS below, only the first two items are returned.

Replacing these characters with a regex should be enough; however, I was wondering whether there is a libxml2 flag that could be used instead…

<?xml version="1.0" encoding="UTF-8"?><rss version="2.0">
<channel>
	<title>Feed Title</title>
<item>
		<title>1</title>
		<link>http://someurl.com/1/</link>
		<description><![CDATA[Description of first]]></description>
</item>
<item>
		<title>2</title>
		<link>http://someurl.com/2/</link>
		<description><![CDATA[Description with � \0x04 values]]></description>
</item>
<item>
		<title>3</title>
		<link>http://someurl.com/3/</link>
		<description><![CDATA[Description of third]]></description>
</item>
<item>
		<title>4</title>
		<link>http://someurl.com/4/</link>
		<description><![CDATA[Description of fourth]]></description>
</item>
<item>
		<title>5</title>
		<link>http://someurl.com/5/</link>
		<description><![CDATA[Description of fifth]]></description>
</item>
	</channel>
</rss>
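
For illustration, here is a minimal sketch of the regex workaround mentioned above. It is written in Python with the standard-library `xml.etree.ElementTree` parser (expat) rather than RSParser and libxml2, so it only demonstrates the idea and is not RSParser's code. The names `strip_illegal_xml_chars`, `item_titles`, and `SAMPLE_FEED` are hypothetical, and the feed is a shortened stand-in for the one above, with an assumed literal 0x04 byte in the second description.

```python
import re
import xml.etree.ElementTree as ET

# XML 1.0 forbids every character below 0x20 except tab (0x09),
# line feed (0x0A) and carriage return (0x0D).
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")


def strip_illegal_xml_chars(text, replacement=""):
    """Hypothetical helper: remove control characters that XML 1.0 rejects."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


def item_titles(feed_xml):
    """Hypothetical helper: parse the feed and return the <title> of each <item>."""
    root = ET.fromstring(feed_xml)
    return [item.findtext("title") for item in root.iter("item")]


# Shortened stand-in for the feed in this issue; item 2 carries a raw 0x04.
SAMPLE_FEED = (
    '<rss version="2.0"><channel><title>Feed Title</title>'
    "<item><title>1</title>"
    "<description><![CDATA[Description of first]]></description></item>"
    "<item><title>2</title>"
    "<description><![CDATA[Description with \x04 values]]></description></item>"
    "<item><title>3</title>"
    "<description><![CDATA[Description of third]]></description></item>"
    "</channel></rss>"
)

if __name__ == "__main__":
    try:
        item_titles(SAMPLE_FEED)
    except ET.ParseError as err:
        # A conforming parser refuses the raw control character.
        print("raw feed rejected:", err)

    # After sanitizing, every item is reachable again.
    print(item_titles(strip_illegal_xml_chars(SAMPLE_FEED)))
```

The character class deliberately leaves tab, line feed, and carriage return alone, since those are legal in XML 1.0. Whether a libxml2 parser option such as `XML_PARSE_RECOVER` would get past these characters is not something this sketch tests.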
