The Text Summarization Google Chrome extension simplifies online reading by condensing articles into concise summaries. Its interface consists of a text area and a 'Summary' button, and it is aimed at news articles, giving quick access to their key information. The extension reads the current page's URL, extracts the page content via a server, and summarizes it with the PEGASUS-X model.

In my work, I implemented abstractive text summarization based on two papers: "PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization" and "Investigating Efficiently Extending Transformers for Long Input Summarization". Specifically, I use the Gap Sentences Generation (GSG) pre-training objective proposed by PEGASUS for the pre-training phase, and I incorporate the network architecture and training strategy improvements from the second paper for long-text summarization.
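To make the GSG objective more concrete, here is a minimal sketch of how one pre-training example could be built: each sentence is scored by unigram-overlap F1 (a simplified ROUGE-1) against the rest of the document, the top-scoring sentences are replaced by a mask token in the source, and the removed sentences become the target. This is an illustration only, not the code in `model/prepare_data.py`; the function names, the naive sentence splitter, the `<mask_1>` token string, and the default `gap_ratio` are all assumptions.

```python
from __future__ import annotations

import re
from collections import Counter

# Assumed sentence-mask token; the real string depends on the tokenizer in use.
MASK_SENT = "<mask_1>"


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two strings (lowercased word tokens)."""
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def gsg_example(document: str, gap_ratio: float = 0.3) -> tuple[str, str]:
    """Return (masked source, target) for one document.

    Sentences are scored independently against the remaining sentences,
    and the highest-scoring ones are masked out.
    """
    sentences = split_sentences(document)
    n_gaps = max(1, round(len(sentences) * gap_ratio))

    scores = []
    for i, sent in enumerate(sentences):
        rest = " ".join(s for j, s in enumerate(sentences) if j != i)
        scores.append(rouge1_f1(sent, rest))

    # Indices of the n_gaps highest-scoring sentences.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:n_gaps])

    source = " ".join(
        MASK_SENT if i in selected else s for i, s in enumerate(sentences)
    )
    # Gap sentences are emitted in their original document order.
    target = " ".join(sentences[i] for i in sorted(selected))
    return source, target
```

Calling `gsg_example(text)` on a raw article returns the pair that a sequence-to-sequence model would be trained on: the masked document as input and the removed sentences as output.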
- CUDA/cuDNN
- Python 3
- Packages found in requirements.txt
- git clone https://gitlab.com/DuongKien2001/summary_page
- cd summary_page
- pip install -r requirements.txt
python model/prepare_data.py
python model/train.py --src_len 512 --tgt_len 256 --epochs 6
python model/train.py --src_len 6400 --tgt_len 256 --epochs 3 --no-pretrain -r checkpoint.pth
python model/evaluatePegasusX.py --start_idx 0 --end_idx 7000
python backend/main.py
A pretrained model for PubMed can be downloaded at: (Link). Unzip it into the 'model/checkpoint/finetune' folder.
- Go to a webpage that contains an article you want to summarize.
- Click the extension icon or use the keyboard shortcut to open the extension.
- In the extension, click the 'Summary' button.
- Read the generated summary.