save_stories_from_feed performance improvement #22

Description

@opme

I was reading the save_stories_from_feed code in tasks.py, and it looks to be making one database call per feed entry to check for duplicates.

The per-entry normalized_url_exists check could be replaced by a single database call that checks all feed entries at once.

There could be a function, getValidFeedEntries, that applies the logic already in save_stories_from_feed for skipping invalid entries.

Then a single database call would identify the duplicates, followed by a bulk insert and a single commit (see the sketch below).

If that sounds reasonable I can give it a try. Is this likely to be the eventual bottleneck of this implementation?
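Roughly what I have in mind, as a minimal sketch. It assumes SQLAlchemy and a Story model with a normalized_url column; the model, the normalize_url helper, and the validity predicate are placeholders for whatever the project actually uses, and get_valid_feed_entries is the getValidFeedEntries idea from above:

```python
from sqlalchemy import select


def get_valid_feed_entries(entries):
    """Apply the skip-invalid-entry logic that save_stories_from_feed
    currently applies one entry at a time (placeholder predicate)."""
    return [e for e in entries if e.get("link") and e.get("title")]


def save_stories_from_feed_batched(session, feed, entries):
    valid = get_valid_feed_entries(entries)
    urls = [normalize_url(e["link"]) for e in valid]  # assumed helper

    # One query to find which normalized URLs already exist, replacing
    # the per-entry normalized_url_exists() calls.
    existing = set(
        session.scalars(
            select(Story.normalized_url).where(Story.normalized_url.in_(urls))
        )
    )

    # Bulk insert only the entries that are not duplicates, commit once.
    new_stories = [
        Story(feed=feed, title=e["title"], normalized_url=url)
        for e, url in zip(valid, urls)
        if url not in existing
    ]
    session.add_all(new_stories)
    session.commit()
```

One detail worth checking: two concurrent fetches of the same feed could both pass the duplicate query before either commits, so a unique constraint on normalized_url (or an insert with ON CONFLICT DO NOTHING) might be needed to keep the batch safe.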
