This code is entirely based on the work of @nlpyang (https://github.com/nlpyang/PreSumm), which accompanies the EMNLP 2019 paper *Text Summarization with Pretrained Encoders*.
PreSumm, the original project, builds summarization models on top of a pretrained encoder (BERT). I was interested in this work as a basis for building lighter versions of those models using DistilBERT.
My first experiment was to build an extractive summarization model half the size of the original BertSumExt. Despite the drastic reduction in size, it reaches results very close to BertSumExt, losing at most 1 ROUGE point.
My next experiments will be to build an abstractive summarization model and try to achieve state-of-the-art results.
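
For illustration, the core change behind DistilBertSumExt is swapping the 12-layer BERT encoder for the 6-layer DistilBERT while keeping the BertSum recipe of representing each sentence by the [CLS] token inserted in front of it. The sketch below is a simplified, hypothetical version, not this repository's exact code (the real model, like PreSumm, also stacks an inter-sentence Transformer on top of the [CLS] vectors); the class and argument names are illustrative, and it assumes the `transformers` successor of `pytorch_transformers`, where `DistilBertModel` is available.

```python
import torch
from transformers import DistilBertModel


class DistilBertSumExtSketch(torch.nn.Module):
    """Simplified sketch: DistilBERT encoder + per-sentence extraction scores."""

    def __init__(self, model_name="distilbert-base-uncased"):
        super().__init__()
        # 6 Transformer layers instead of BERT-base's 12 -> roughly half the encoder size
        self.encoder = DistilBertModel.from_pretrained(model_name)
        hidden = self.encoder.config.dim  # 768 for distilbert-base
        # The real model uses an inter-sentence Transformer here; a linear scorer keeps the sketch short
        self.scorer = torch.nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask, cls_positions):
        # Encode the whole document; a [CLS] token is inserted before every sentence,
        # so its hidden state acts as that sentence's representation (as in BertSumExt).
        top_vecs = self.encoder(input_ids, attention_mask=attention_mask)[0]
        batch_idx = torch.arange(top_vecs.size(0)).unsqueeze(1)
        sent_vecs = top_vecs[batch_idx, cls_positions]             # (batch, n_sents, hidden)
        return torch.sigmoid(self.scorer(sent_vecs)).squeeze(-1)   # extraction probabilities
```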
Results on CNN/DailyMail (20/8/2019):

| Models | ROUGE-1 | ROUGE-2 | ROUGE-L |
| --- | --- | --- | --- |
| **Extractive** | | | |
| TransformerExt | 40.90 | 18.02 | 37.17 |
| BertSumExt | 43.23 | 20.24 | 39.63 |
| BertSumExt (large) | 43.85 | 20.34 | 39.90 |
| **Abstractive** | | | |
| TransformerAbs | 40.21 | 17.76 | 37.09 |
| BertSumAbs | 41.72 | 19.39 | 38.76 |
| BertSumExtAbs | 42.13 | 19.60 | 39.18 |
| **My experiments** | | | |
| DistilBertSumExt (mine) | 42.74 | 19.98 | 39.22 |
| DistilBertSumExtAbs (mine) | -- | -- | -- |
| DistilBertSumAbs (mine) | -- | -- | -- |
Python version: This code is written in Python 3.6.
Package Requirements: torch==1.1.0 pytorch_transformers tensorboardX multiprocess pyrouge
Updates: To encode texts longer than 512 tokens (for example 800 tokens), set max_pos to that length during both preprocessing and training.
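
Conceptually this follows what PreSumm does for long inputs: the encoder's learned position embeddings only cover 512 positions, so they are extended by keeping the 512 learned vectors and initialising the extra positions from the last one. The helper below is an illustrative sketch, not this repository's exact code; the function name and the direct attribute access are assumptions, written against a BERT-style encoder in the pytorch_transformers era (where position ids are computed on the fly).

```python
import torch


def extend_position_embeddings(model, max_pos=800):
    """Hypothetical helper: stretch a BERT-style encoder's position embeddings to max_pos."""
    old = model.embeddings.position_embeddings          # nn.Embedding(512, hidden)
    if max_pos <= old.num_embeddings:
        return model
    new = torch.nn.Embedding(max_pos, old.embedding_dim)
    # Keep the 512 learned position vectors ...
    new.weight.data[: old.num_embeddings] = old.weight.data
    # ... and initialise positions 512..max_pos from the last learned one.
    new.weight.data[old.num_embeddings:] = old.weight.data[-1][None, :].repeat(
        max_pos - old.num_embeddings, 1
    )
    model.embeddings.position_embeddings = new
    model.config.max_position_embeddings = max_pos
    return model
```

The same max_pos value is used at preprocessing time so that documents are truncated to the same length the model is trained on.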
Some code is borrowed from OpenNMT (https://github.com/OpenNMT/OpenNMT-py).