This code is entirely based on the work of @nlpyang (https://github.com/nlpyang/PreSumm), which accompanies the EMNLP 2019 paper *Text Summarization with Pretrained Encoders*.
PreSumm, the original project, builds summarization models on top of a pretrained encoder (BERT). I was interested in this work as a basis for building lighter versions of those models using DistilBERT.
My first experiment was to build an extractive summarization model half the size of the original BertSumExt. Despite the drastic reduction in size, it reaches results very close to BertSumExt, losing at most 1 ROUGE point.
My next experiments will be to build an abstractive summarization model and try to achieve state-of-the-art results.
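
For illustration, the core change behind DistilBertSumExt is swapping the 12-layer BERT encoder for the 6-layer DistilBERT while keeping the BertSum recipe of representing each sentence by the [CLS] token inserted in front of it. The sketch below is a simplified, hypothetical version, not this repository's exact code (the real model, like PreSumm, also stacks an inter-sentence Transformer on top of the [CLS] vectors); the class and argument names are illustrative, and it assumes the `transformers` successor of `pytorch_transformers`, where `DistilBertModel` is available.

```python
import torch
from transformers import DistilBertModel


class DistilBertSumExtSketch(torch.nn.Module):
    """Simplified sketch: DistilBERT encoder + per-sentence extraction scores."""

    def __init__(self, model_name="distilbert-base-uncased"):
        super().__init__()
        # 6 Transformer layers instead of BERT-base's 12 -> roughly half the encoder size
        self.encoder = DistilBertModel.from_pretrained(model_name)
        hidden = self.encoder.config.dim  # 768 for distilbert-base
        # The real model uses an inter-sentence Transformer here; a linear scorer keeps the sketch short
        self.scorer = torch.nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask, cls_positions):
        # Encode the whole document; a [CLS] token is inserted before every sentence,
        # so its hidden state acts as that sentence's representation (as in BertSumExt).
        top_vecs = self.encoder(input_ids, attention_mask=attention_mask)[0]
        batch_idx = torch.arange(top_vecs.size(0)).unsqueeze(1)
        sent_vecs = top_vecs[batch_idx, cls_positions]             # (batch, n_sents, hidden)
        return torch.sigmoid(self.scorer(sent_vecs)).squeeze(-1)   # extraction probabilities
```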
Results on CNN/DailyMail (20/8/2019):

| Models | ROUGE-1 | ROUGE-2 | ROUGE-L |
| --- | --- | --- | --- |
| **Extractive** | | | |
| TransformerExt | 40.90 | 18.02 | 37.17 |
| BertSumExt | 43.23 | 20.24 | 39.63 |
| BertSumExt (large) | 43.85 | 20.34 | 39.90 |
| **Abstractive** | | | |
| TransformerAbs | 40.21 | 17.76 | 37.09 |
| BertSumAbs | 41.72 | 19.39 | 38.76 |
| BertSumExtAbs | 42.13 | 19.60 | 39.18 |
| **My experiments** | | | |
| DistilBertSumExt (mine) | 42.74 | 19.98 | 39.22 |
| DistilBertSumExtAbs (mine) | -- | -- | -- |
| DistilBertSumAbs (mine) | -- | -- | -- |
Python version: This code is written in Python 3.6.
Package Requirements: torch==1.1.0 pytorch_transformers tensorboardX multiprocess pyrouge
Updates: To encode texts longer than 512 tokens (for example 800 tokens), set max_pos to that length during both preprocessing and training.
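
Conceptually this follows what PreSumm does for long inputs: the encoder's learned position embeddings only cover 512 positions, so they are extended by keeping the 512 learned vectors and initialising the extra positions from the last one. The helper below is an illustrative sketch, not this repository's exact code; the function name and the direct attribute access are assumptions, written against a BERT-style encoder in the pytorch_transformers era (where position ids are computed on the fly).

```python
import torch


def extend_position_embeddings(model, max_pos=800):
    """Hypothetical helper: stretch a BERT-style encoder's position embeddings to max_pos."""
    old = model.embeddings.position_embeddings          # nn.Embedding(512, hidden)
    if max_pos <= old.num_embeddings:
        return model
    new = torch.nn.Embedding(max_pos, old.embedding_dim)
    # Keep the 512 learned position vectors ...
    new.weight.data[: old.num_embeddings] = old.weight.data
    # ... and initialise positions 512..max_pos from the last learned one.
    new.weight.data[old.num_embeddings:] = old.weight.data[-1][None, :].repeat(
        max_pos - old.num_embeddings, 1
    )
    model.embeddings.position_embeddings = new
    model.config.max_position_embeddings = max_pos
    return model
```

The same max_pos value is used at preprocessing time so that documents are truncated to the same length the model is trained on.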
Some code is borrowed from OpenNMT (https://github.com/OpenNMT/OpenNMT-py).