This project fetches the latest world news articles from Yahoo! Japan, translates their content from Japanese to English using the Google Cloud Translation API, and summarizes each article by extracting its key entities with the Google Cloud Natural Language API. The results can be saved as a JSON file or printed to the console.
- Scrapes latest world news topics from Yahoo! Japan
- Extracts article content from each news article URL
- Translates content from Japanese to English using Google Cloud Translation API
- Extracts key entities for summarization using Google Cloud Natural Language API
- Supports output to a JSON file or standard output (console) using the --output option
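The translation feature could be sketched as follows. This is a minimal sketch, not the project's actual translator.py: the function name translate_ja_to_en is hypothetical, and the client is passed in as a parameter so the function can be exercised without network access. It assumes the translate_v2 client from google-cloud-translate, whose translate() method returns a dict containing a translatedText key.

```python
import html


def translate_ja_to_en(text, client):
    """Translate Japanese text to English with a Cloud Translation v2 client.

    `client` is assumed to be a google.cloud.translate_v2.Client, or any
    object exposing a compatible translate() method (hypothetical wiring).
    """
    if not text.strip():
        return ""  # nothing to translate, so skip the API call
    result = client.translate(
        text,
        source_language="ja",
        target_language="en",
        format_="text",  # treat the input as plain text rather than HTML
    )
    # Unescape defensively in case the response contains HTML entities.
    return html.unescape(result["translatedText"])


# Usage (requires credentials and network access):
#   from google.cloud import translate_v2
#   print(translate_ja_to_en("こんにちは", translate_v2.Client()))
```

Passing the client in, rather than creating it inside the function, keeps the API dependency at the edge of the program and makes the function easy to test with a stub.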
.
├── summaries.json (optional)
├── scripts/
│   ├── main.py
│   ├── news_scraper.py
│   ├── summarizer.py
│   └── translator.py
- Python 3.8+
- google-cloud-translate
- google-cloud-language
- beautifulsoup4
- requests
Install dependencies:
pip install -r requirements.txt
Set up Google Cloud credentials:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account.json"
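A script can fail early with a clear message when the credentials variable is missing, instead of failing later inside a Google Cloud client. The check below is a sketch under assumptions: the function name credentials_path is hypothetical and is not part of the project's scripts.

```python
import os


def credentials_path(env=None):
    """Return the service-account path from the environment, or raise.

    `env` defaults to os.environ; passing a plain dict makes testing easy.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"credentials file not found: {path}")
    return path
```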
python scripts/main.py --output json # Saves summaries to summaries.json
python scripts/main.py --output console # Prints summaries to standard output
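The two invocations above suggest a command-line parser along these lines. This is only a sketch of how main.py might declare the flag: the parse_args name, the help text, and the default of console are assumptions, since the actual default is not documented here.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(
        description="Fetch, translate, and summarize world news."
    )
    parser.add_argument(
        "--output",
        choices=["json", "console"],  # the two modes shown in the usage lines
        default="console",            # assumed default, not confirmed
        help="write summaries.json or print to standard output",
    )
    return parser.parse_args(argv)
```

Restricting the flag with choices makes argparse reject any other value with a usage error before any network call is made.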
It will:
- Fetch recent article titles and URLs
- Extract their content
- Translate to English
- Summarize key entities
- Output results based on the --output selection
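The steps listed above can be sketched as one loop that chains the stages together. The function run_pipeline and its four callable parameters are hypothetical stand-ins for whatever news_scraper.py, translator.py, and summarizer.py actually expose; the record keys follow the sample output in this document.

```python
def run_pipeline(fetch_articles, extract_content, translate, summarize):
    """Run fetch -> extract -> translate -> summarize and collect records.

    Each argument is a callable (hypothetical signatures):
      fetch_articles() -> iterable of (title, url)
      extract_content(url) -> str
      translate(text) -> str
      summarize(text) -> str
    """
    records = []
    for title, url in fetch_articles():
        original = extract_content(url)
        if not original:
            continue  # skip articles whose body could not be extracted
        translated = translate(original)
        records.append(
            {
                "title": title,
                "original": original,
                "translated": translated,
                "summary": summarize(translated),
            }
        )
    return records
```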
[
  {
    "title": "Some Japanese News Title",
    "original": "元の文章...",
    "translated": "The original article translated to English...",
    "summary": "Keywords: Prime Minister, Japan, Trade, Policy"
  }
]
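A summary string in the form shown above could be built from the entities returned by the Natural Language API, each of which carries a name and a salience score. The sketch below assumes that ranking by salience and keeping the top few names is how the keywords are chosen; the function names and the top_n default are hypothetical. It also shows serializing with ensure_ascii=False so the Japanese text stays readable in the file.

```python
import json


def keywords_summary(entities, top_n=4):
    """Build a 'Keywords: ...' string from objects with .name and .salience."""
    ranked = sorted(entities, key=lambda e: e.salience, reverse=True)
    names = []
    for entity in ranked:
        if entity.name not in names:  # drop repeated entity names
            names.append(entity.name)
    return "Keywords: " + ", ".join(names[:top_n])


def to_json(records):
    """Serialize records, keeping non-ASCII characters unescaped."""
    return json.dumps(records, ensure_ascii=False, indent=2)
```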
- The scraper may need adjustments when Yahoo! Japan changes its page layout.
- You can switch sources or languages by modifying news_scraper.py and translator.py.
- The --output flag selects between file output and terminal display.
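One way to reduce breakage from layout changes is to match article links by a URL fragment rather than by CSS classes, which tend to change more often. The sketch below illustrates the idea with the standard library's html.parser instead of BeautifulSoup, purely to stay dependency-free; it is not how news_scraper.py is known to work, and the url_marker value in the usage comment is a hypothetical example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect unique href values of <a> tags that contain a marker string."""

    def __init__(self, url_marker):
        super().__init__()
        self.url_marker = url_marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if self.url_marker in href and href not in self.links:
            self.links.append(href)


def collect_links(page_html, url_marker):
    """Return matching links in document order, without duplicates."""
    collector = LinkCollector(url_marker)
    collector.feed(page_html)
    return collector.links


# Usage with a hypothetical marker:
#   collect_links(page_html, "/articles/")
```

Keeping the marker as a parameter means that switching to another news source only requires a different marker string, not a rewrite of the parser.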