RSS format issue #28

@H4lo

In yarb.py, when parseThread parses the RSS XML, some feeds put the updated_parsed field in the feed block instead of in the entries, which causes an error:


'entries': [
]

...


'feed': {
        'title': 'Talkback Tech',
        'title_detail': {'type': 'text/plain', 'language': None, 'base': '', 'value': 'Talkback Tech'},
        'links': [
            {'rel': 'alternate', 'type': 'text/html', 'href': 'https://talkback.sh/tech/feed/'},
            {'href': 'https://talkback.sh/tech/feed/', 'rel': 'self', 'type': 'application/atom+xml'}
        ],
        'link': 'https://talkback.sh/tech/feed/',
        'subtitle': 'Latest technical resources on Talkback',
        'subtitle_detail': {'type': 'text/html', 'language': None, 'base': '', 'value': 'Latest technical resources on Talkback'},
        'language': 'en-us',
        'updated': 'Mon, 05 Aug 2024 03:08:08 +0000',
        'updated_parsed': time.struct_time(tm_year=2024, tm_mon=8, tm_mday=5, tm_hour=3, tm_min=8, tm_sec=8, tm_wday=0, tm_yday=218, tm_isdst=0)
    },
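The failure can be reproduced without a network call. This is a minimal sketch, assuming the entry carries neither published_parsed nor updated_parsed; the plain dicts and the example.com link are hypothetical stand-ins for the objects feedparser returns, not data from the real feed.

```python
import datetime
import time

# Hypothetical stand-ins: an entry with no date fields, and a feed block
# that carries the timestamp instead (values taken from the dump above).
entry = {'title': 'Example post', 'link': 'https://example.com/post'}
feed = {'updated_parsed': time.struct_time((2024, 8, 5, 3, 8, 8, 0, 218, 0))}

# Same lookup as in parseThread: both keys are missing, so d is None.
d = entry.get('published_parsed') or entry.get('updated_parsed')

try:
    pubday = datetime.date(d[0], d[1], d[2])
    error = None
except TypeError as exc:
    # Indexing None raises TypeError, which is the likely cause of the crash.
    error = exc

# The feed-level timestamp is still usable as a date.
feed_day = datetime.date(*feed['updated_parsed'][:3])
```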

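The fix adds a check on the variable d: when the entry has no date, d is taken from the feed block instead.
In addition, some RSS feeds only contain links published on the current day, so entries published today and yesterday are both accepted, to avoid missing the current day's content: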

...
        for entry in r.entries:
            d = entry.get('published_parsed') or entry.get('updated_parsed')

+            if not d:
+                d = r.feed.updated_parsed
            yesterday = datetime.date.today()# + datetime.timedelta(-1)
            pubday = datetime.date(d[0], d[1], d[2])
-            if (pubday == yesterday) and filter(entry.title):
+            if (pubday == yesterday or datetime.date.today() + datetime.timedelta(-1) == pubday) and filter(entry.title):
                item = {entry.title: entry.link}
                # print(item)
                result |= item

...
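The same logic can be written as two small helpers, which also covers the case where the feed block has no updated_parsed either. This is only a sketch, not the code in yarb.py: the names entry_date and is_recent are hypothetical, and plain dicts stand in for feedparser objects (which also support .get).

```python
import datetime
import time


def entry_date(entry, feed):
    """Return the entry's date, falling back to the feed-level timestamp."""
    d = (entry.get('published_parsed')
         or entry.get('updated_parsed')
         or feed.get('updated_parsed'))
    if not d:
        return None  # no usable date anywhere; the caller should skip the entry
    return datetime.date(d[0], d[1], d[2])


def is_recent(pubday, today=None):
    """Return True if pubday is today or yesterday."""
    today = today or datetime.date.today()
    return pubday in (today, today - datetime.timedelta(days=1))


# Hypothetical sample data mirroring the feed block shown in this issue.
feed = {'updated_parsed': time.struct_time((2024, 8, 5, 3, 8, 8, 0, 218, 0))}
entry = {'title': 'Example post', 'link': 'https://example.com/post'}

pubday = entry_date(entry, feed)
keep = pubday is not None and is_recent(pubday, today=datetime.date(2024, 8, 5))
```

Passing today explicitly keeps the check testable; inside the loop it could be computed once before iterating over r.entries.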
