Translating HuggingFace Daily Papers with InternLM
This project automatically downloads and processes HuggingFace daily paper data and translates it into multiple languages using the InternLM large language model. The project runs automatically every day to ensure timely retrieval and translation of the latest papers.
- Translation Model: InternLM-3
- Developer: Shanghai AI Laboratory
- Version: internlm3-latest
- Features:
- Powerful multilingual translation capabilities
- Accurate understanding and translation of academic texts
- Real-time translation via API
- Automatic download of HuggingFace daily paper data
- Support for downloading historical data from specific dates
- Beijing time (UTC+8) as the default timezone
- Complete activity logging
- JSON format paper metadata storage
- Translation of English papers to multiple languages using InternLM-3:
- Japanese
- Korean
- Spanish
- French
- Automated workflow:
- Daily automatic download of latest papers
- Automatic multilingual translation
- Automatic repository updates
- Clone the repository:
git clone https://github.com/yourusername/hf-daily-paper-newsletter-multilingual.git
cd hf-daily-paper-newsletter-multilingual
- Install dependencies:
pip install -r requirements.txt
- Download the latest papers:
python download_papers.py
- Download papers for a specific date:
python download_papers.py --date 2024-03-20
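The date handling behind these two commands can be sketched as follows. This is a minimal illustration, not the actual contents of download_papers.py: the helper names `parse_date`, `build_parser`, and `resolve_date` are hypothetical, and the only behavior taken from this README is the `--date` flag and the Beijing-time default.

```python
import argparse
from datetime import datetime, timedelta, timezone

# Beijing time is UTC+8 with no daylight saving, so a fixed offset suffices.
BEIJING = timezone(timedelta(hours=8))


def parse_date(text):
    """Validate a YYYY-MM-DD string and return it in zero-padded form."""
    try:
        parsed = datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected YYYY-MM-DD, got {text!r}")
    return parsed.strftime("%Y-%m-%d")


def build_parser():
    """Hypothetical CLI: only the --date flag is documented in the README."""
    parser = argparse.ArgumentParser(description="Download HuggingFace daily papers")
    parser.add_argument("--date", type=parse_date, default=None,
                        help="date to download (YYYY-MM-DD); defaults to today in Beijing time")
    return parser


def resolve_date(args, now=None):
    """Return the requested date, or today's date in Beijing time if none was given."""
    if args.date:
        return args.date
    now = now or datetime.now(BEIJING)
    return now.astimezone(BEIJING).strftime("%Y-%m-%d")
```

Resolving the default in Beijing time matters late in the UTC day: at 20:00 UTC on 2024-03-20 it is already 2024-03-21 in Beijing, so the two timezones would pick different files.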
First obtain an InternLM API key, then run:
python translate_papers.py --date 2024-03-20 --api_key your_api_key_here
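The translation step can be pictured as a loop over the four target languages. The sketch below is an assumption-laden illustration rather than the code in translate_papers.py: the field names `title` and `summary` are guesses at the paper metadata schema, and the `translate` callable stands in for whatever call the script makes to the InternLM API, which this README does not document.

```python
import copy

# Language codes and names as listed in this README.
LANGUAGES = {"ja": "Japanese", "ko": "Korean", "es": "Spanish", "fr": "French"}

# Assumed field names; the real metadata schema may differ.
FIELDS = ("title", "summary")


def translate_papers(papers, translate, languages=LANGUAGES, fields=FIELDS):
    """Return {language code: translated copy of papers}.

    `translate(text, language_name)` is a caller-supplied function, for
    example a thin wrapper around an InternLM API client.
    """
    results = {}
    for code, name in languages.items():
        translated = []
        for paper in papers:
            item = copy.deepcopy(paper)  # leave the English original untouched
            for field in fields:
                if isinstance(item.get(field), str):
                    item[field] = translate(item[field], name)
            translated.append(item)
        results[code] = translated
    return results
```

Passing the translator in as a function keeps the loop testable without an API key: a stub such as `lambda text, lang: f"[{lang}] {text}"` is enough to check the structure of the output.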
The project is configured with two GitHub Actions workflows:
- daily-paper-download.yml: Automatically downloads the latest papers at 9:00 AM Beijing time
- daily-paper-translate.yml: Automatic translation after download
To enable automatic translation, set INTERNLM_API_KEY in the repository's Secrets.
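GitHub Actions evaluates `schedule` cron expressions in UTC, so the 9:00 AM Beijing time trigger has to be written as a UTC time. The helper below, a hypothetical `beijing_to_utc_cron`, shows the conversion; the exact cron line used in daily-paper-download.yml is not shown in this README, so treat the result as what the schedule would look like rather than a quote from the workflow file.

```python
from datetime import datetime, timedelta, timezone

# Beijing time is a fixed UTC+8 offset.
BEIJING = timezone(timedelta(hours=8))


def beijing_to_utc_cron(hour, minute=0):
    """Return a daily cron expression (in UTC) for a given Beijing wall-clock time."""
    # The calendar date is arbitrary: only the hour and minute end up in the cron line.
    local = datetime(2024, 1, 1, hour, minute, tzinfo=BEIJING)
    utc = local.astimezone(timezone.utc)
    return f"{utc.minute} {utc.hour} * * *"


print(beijing_to_utc_cron(9))  # 0 1 * * *
```

In other words, 9:00 AM in Beijing corresponds to 01:00 UTC on the same day.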
- Original English paper data is stored in the Paper_metadata_download directory
- Translated papers are stored in the Translated_papers directory, organized by language code:
- ja/: Japanese translations
- ko/: Korean translations
- es/: Spanish translations
- fr/: French translations
- All files are saved in JSON format, with names in YYYY-MM-DD.json format
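The layout above maps directly onto a small path helper. The following is a sketch under the assumption that files are written with the standard `json` module; the function names `translated_path` and `save_translation` are illustrative and do not come from the project.

```python
import json
from pathlib import Path


def translated_path(root, lang, date):
    """Build <root>/Translated_papers/<lang>/<YYYY-MM-DD>.json."""
    return Path(root) / "Translated_papers" / lang / f"{date}.json"


def save_translation(root, lang, date, papers):
    """Write one day's translated papers for one language and return the path."""
    path = translated_path(root, lang, date)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps Japanese and Korean text readable in the file
        # instead of escaping every character as \uXXXX.
        json.dump(papers, handle, ensure_ascii=False, indent=2)
    return path
```

For example, `save_translation(".", "ja", "2024-03-20", papers)` would write to `Translated_papers/ja/2024-03-20.json` under the current directory.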
- Success: exit code 0
- Error: exit code 1
- No data: exit code 0 (with warning in log)
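One way to realize these exit codes is to have the script's main routine return an integer status. The sketch below assumes a caller-supplied `fetch` function (hypothetical) that returns a list of papers and raises on failure; the logger name is also illustrative.

```python
import logging

logger = logging.getLogger("daily_papers")


def run(fetch):
    """Run one download and return the process exit code described above."""
    try:
        papers = fetch()
    except Exception:
        # Any failure is logged with its traceback and reported as exit code 1.
        logger.exception("Download failed")
        return 1
    if not papers:
        # No data is not treated as an error, but it is recorded in the log.
        logger.warning("No papers found for the requested date")
        return 0
    logger.info("Downloaded %d papers", len(papers))
    return 0
```

A script would typically finish with `sys.exit(run(fetch))`. Returning 0 for the no-data case means a day without papers does not mark the scheduled workflow as failed.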
- InternLM - For providing powerful translation capabilities
- HuggingFace - For providing daily paper data