
santhoshse7en/news-fetch


📰 news-fetch

news-fetch is an open-source, easy-to-use news crawler that extracts structured information from almost any news website 🌐. It can recursively follow internal hyperlinks and read RSS feeds to fetch both recent and archived articles 📚. You only need to provide the root URL of a news website to crawl the whole site 🔍. news-fetch builds on two established libraries, news-please by Felix Hamborg and Newspaper3k by Lucas (欧阳象) Ou-Yang, and combines features from both 🤖.
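The two crawling ideas mentioned above, following only internal hyperlinks from a root URL and reading article links from an RSS feed, can be illustrated with the Python standard library alone. This is a minimal sketch, not news-fetch's own implementation: `internal_links` and `rss_item_links` are hypothetical helpers, and the feed parser assumes the RSS 2.0 `<item><link>` layout.

```python
from urllib.parse import urljoin, urldefrag, urlparse
import xml.etree.ElementTree as ET


def internal_links(root_url, hrefs):
    """Hypothetical helper: resolve hrefs against root_url and keep same-host links only."""
    root_host = urlparse(root_url).netloc
    seen = set()
    result = []
    for href in hrefs:
        # Make relative links absolute, then drop any #fragment.
        absolute, _ = urldefrag(urljoin(root_url, href))
        parsed = urlparse(absolute)
        # Skip other hosts and non-web schemes such as mailto:.
        if parsed.scheme in ("http", "https") and parsed.netloc == root_host:
            if absolute not in seen:  # de-duplicate while keeping order
                seen.add(absolute)
                result.append(absolute)
    return result


def rss_item_links(rss_xml):
    """Hypothetical helper: return the <link> text of every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    return [
        item.findtext("link").strip()
        for item in root.iter("item")
        if item.findtext("link")  # skip items without a link
    ]
```

A real crawler would fetch each page, extract its `href` values, and feed the result of `internal_links` back into a queue; the sketch only shows the filtering step.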

I built this tool to minimize NaN and empty values when scraping data from different news websites 🚀. It is platform-independent and written in Python 3, so programmers and developers can easily use news data in their applications 💻.
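One way to reduce empty values is to take each field from whichever extractor (for example news-please or Newspaper3k) actually returned something. The sketch below illustrates that idea with plain dictionaries; `merge_fields` and `is_empty` are hypothetical helpers rather than part of the news-fetch API, and the "first non-empty value wins" precedence is an assumption.

```python
import math


def is_empty(value):
    """Hypothetical helper: treat None, NaN, blank strings and empty containers as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def merge_fields(*sources):
    """Hypothetical helper: merge dicts of extracted fields.

    For each key, the first non-empty value across `sources` wins;
    keys that are empty everywhere are normalized to None.
    """
    merged = {}
    for source in sources:
        for key, value in source.items():
            if is_empty(merged.get(key)):
                merged[key] = None if is_empty(value) else value
    return merged
```

Passing the preferred extractor's output first gives it priority; later sources only fill the gaps it left.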


🔗 Project Links

PyPI: https://pypi.org/project/news-fetch/
Repository: https://santhoshse7en.github.io/news-fetch/
Documentation: https://santhoshse7en.github.io/news-fetch_doc/ (not yet available)

📦 Dependencies

📝 Extracted Information

news-fetch extracts the following attributes from news articles. For a sample of the output format, see the example JSON file generated by news-please.

  • 📰 Headline
  • ✍️ Author(s)
  • 📅 Publication date
  • 🗞️ Publication
  • 📂 Category
  • 🌍 Source domain
  • 📑 Article content
  • 📝 Summary
  • 🔑 Keywords
  • 🌐 URL
  • 🌐 Language
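If you want to store these attributes in a consistent shape, a small record type is one option. This is only a sketch: `ArticleRecord` and its field names are assumptions chosen to mirror the list above, not necessarily the attribute names news-fetch exposes, and the publication date is kept as an ISO 8601 string so the record serializes cleanly to JSON.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class ArticleRecord:
    """Hypothetical container mirroring the extracted attributes listed above."""
    headline: Optional[str] = None
    authors: List[str] = field(default_factory=list)
    publish_date: Optional[str] = None  # ISO 8601 string, e.g. "2023-12-01"
    publication: Optional[str] = None
    category: Optional[str] = None
    source_domain: Optional[str] = None
    article: Optional[str] = None  # full article text
    summary: Optional[str] = None
    keywords: List[str] = field(default_factory=list)
    url: Optional[str] = None
    language: Optional[str] = None

    def to_json(self) -> str:
        # Serialize every field; missing values become JSON null.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)
```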

🔧 Dependency Installation

Use the package manager pip to install the required dependencies:

```bash
pip install -r requirements.txt
```

🚀 Usage

You can download the source code with the green Code button on GitHub, or install the package from PyPI with `pip install news-fetch`.

To scrape all the details of a single article, pass its URL to the Newspaper class:

```python
from newsfetch.news import Newspaper

news = Newspaper(url='https://www.thehindu.com/news/cities/Madurai/aa-plays-a-pivotal-role-in-helping-people-escape-from-the-grip-of-alcoholism/article67716206.ece')
print(news.headline)
# Output: 'AA plays a pivotal role in helping people escape from the grip of alcoholism'
```

To collect article URLs from a specific news site, create a GoogleSearchNewsURLExtractor with a search keyword and the site's domain:

```python
from newsfetch.google import GoogleSearchNewsURLExtractor

google = GoogleSearchNewsURLExtractor(keyword='Alcoholics Anonymous', news_domain='https://timesofindia.indiatimes.com/')
print(google.urls)
# Output:
# ['https://timesofindia.indiatimes.com/city/pune/pune-takes-a-stand-against-alcoholism-experts-collaborate-with-alcoholics-anonymous/articleshow/114438466.cms',
#  'https://timesofindia.indiatimes.com/city/mumbai/we-have-lost-jobs-homes-alcoholics-anonymous/articleshow/96824383.cms',
#  'https://timesofindia.indiatimes.com/city/gurgaon/gurgaons-alcoholics-open-up-about-their-road-to-recovery/articleshow/45080744.cms',
#  'https://timesofindia.indiatimes.com/city/goa/alcoholism-is-illness-not-issue-of-weak-willpower-say-experts/articleshow/105320008.cms',
#  'https://timesofindia.indiatimes.com/city/bhopal/alcoholism-is-an-illness-bhopal-aa-silver-jubilee-celebration/articleshow/106849014.cms',
#  'https://timesofindia.indiatimes.com/city/ahmedabad/alcoholics-anonymous-switches-to-online-sessions/articleshow/76144639.cms',
#  'https://timesofindia.indiatimes.com/city/kochi/keralites-trying-to-kick-alcoholism-alcoholics-anonymous/articleshow/13977818.cms',
#  'https://timesofindia.indiatimes.com/city/chandigarh/alcoholics-anonymous-turned-their-lives-around/articleshow/18239.cms',
#  'https://timesofindia.indiatimes.com/city/mumbai/like-air-india-flyer-alcoholics-anonymous-members-reap-whirlwind-of-job-loss-broken-homes/articleshow/96820403.cms',
#  'https://timesofindia.indiatimes.com/city/nagpur/alcoholics-anonymous-meet-promotes-one-day-at-a-time/articleshow/50538092.cms']
```

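The URL search and the article scraper can be chained: search a site for a keyword, then scrape each result. The helper below, `scrape_headlines`, is hypothetical and not part of news-fetch. It takes the article class as a parameter so it can be tried without network access, and it assumes, as in the Newspaper example, that the class accepts a `url=` keyword and exposes a `headline` attribute.

```python
from typing import Callable, Dict, Iterable, Optional


def scrape_headlines(urls: Iterable[str], make_article: Callable) -> Dict[str, Optional[str]]:
    """Hypothetical helper: map each URL to its scraped headline.

    `make_article` is called as make_article(url=...) and should return an
    object with a `headline` attribute. Any exception raised while building
    the article is recorded as None so one bad page does not stop the batch.
    """
    headlines = {}
    for url in urls:
        try:
            headlines[url] = getattr(make_article(url=url), "headline", None)
        except Exception:  # e.g. network or parsing errors
            headlines[url] = None
    return headlines
```

With news-fetch installed, `scrape_headlines(google.urls, Newspaper)` would map each URL returned by the search to its headline, or to None if that page could not be scraped.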
🤝 Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.

Make sure to update tests as appropriate.

📄 License

This project is licensed under the MIT License.
