Investigate rss-fetcher returning non-news URLs #44

@philbudne

rss-fetcher output includes URLs that story-indexer regards as "non-news", both simple domain names (archive.org) and subdomains (xyz.iheart.com):

2024-08-16 18:17:28,180 c9a6a33e93c1 rss-puller INFO: non-news: http://archive.org/details/dlibra.bibliotekaelblaska.pl.92649-2.30732645
2024-08-16 18:17:26,732 c9a6a33e93c1 rss-puller INFO: non-news: https://kentuckynewsnetwork.iheart.com/content/2024-08-16-18-year-old-teen-cowboy-ace-patton-ashford-killed-in-freak-accident/
2024-08-16 18:17:24,066 c9a6a33e93c1 rss-puller INFO: non-news: https://knrs.iheart.com/content/2024-08-16-new-poll-shows-where-harris-trump-stand-in-crucial-swing-state/
2024-08-16 18:17:23,563 c9a6a33e93c1 rss-puller INFO: non-news: https://buckeyecountry105.iheart.com/content/2024-08-16-new-poll-shows-where-harris-trump-stand-in-crucial-swing-state/
2024-08-16 18:17:19,856 c9a6a33e93c1 rss-puller INFO: non-news: https://wgy.iheart.com/content/2024-08-16-boebert-bikini-photo-supporting-colleague-reveals-massive-secret-tattoo/
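
For anyone reproducing this on the rss-fetcher side, here is a minimal sketch of the kind of check that would flag both cases in the log above: an exact domain match (archive.org) and any subdomain of a listed domain (xyz.iheart.com). The `NON_NEWS_DOMAINS` set and the `is_non_news` helper are hypothetical names made up for illustration; the actual list and matching logic used by story-indexer are not shown in this issue and may differ.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist for illustration only; the real list that
# story-indexer consults is not shown in this issue.
NON_NEWS_DOMAINS = {"archive.org", "iheart.com"}


def is_non_news(url: str, blocked: set = NON_NEWS_DOMAINS) -> bool:
    """Return True if the URL's host, or any parent domain of it, is blocked."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    labels = host.split(".")
    # Test the full host first, then each shorter suffix:
    # "knrs.iheart.com" -> "knrs.iheart.com", "iheart.com", "com"
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


if __name__ == "__main__":
    sample_urls = [
        "http://archive.org/details/dlibra.bibliotekaelblaska.pl.92649-2.30732645",
        "https://knrs.iheart.com/content/2024-08-16-new-poll-shows-where-harris-trump-stand-in-crucial-swing-state/",
    ]
    for u in sample_urls:
        print(is_non_news(u), u)
```

Matching on whole dot-separated labels rather than with a plain `endswith` avoids false positives on unrelated hosts that merely end in the same characters as a listed domain.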
