This project contributes to automatic summarization of single documents using extractive methods. The backbone of the project is the use of pre-defined sentence features, which it emphasizes more than word weights.
Java v8+
MySQL Server v5.2+
The Java source files and the required external JAR files are in the "src" folder. Database tables and routines can be found under the "sql" folder.
During the experiments, a dataset of BBC news articles and their paired reference summaries is used for comparison with the generated summaries.
The selected articles cover business, entertainment, politics, sport and technology.
https://www.kaggle.com/pariza/bbc-news-summary
The interface of the project supports loading selected texts and reference summaries. The generated summaries and their evaluation results are presented as the output of the project. The texts, their reference summaries and the generated summaries are also included in the "experiment" folder.
The results are evaluated using ROUGE metrics. The ROUGE 2.0 toolkit, a Java package for this evaluation task, is integrated into the project as an external component.
http://rxnlp.com/rouge-2-0-usage-documentation/#.XvdylCgzY2x
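To illustrate what a ROUGE score measures, here is a minimal sketch of ROUGE-1 (unigram overlap) between a reference summary and a generated summary. It is not the ROUGE 2.0 toolkit's API and is not taken from this project: the class name `Rouge1Sketch`, its method names and the simple tokenizer are assumptions made for illustration, and the toolkit itself offers further options that this sketch ignores.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only (not part of the ROUGE 2.0 toolkit).
class Rouge1Sketch {

    // Lowercase the text and split on anything that is not an ASCII letter or digit.
    // Assumption: English text; no stemming or stop-word removal is applied.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase().split("[^a-z0-9]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Count how often each token occurs.
    static Map<String, Integer> counts(List<String> tokens) {
        Map<String, Integer> result = new HashMap<>();
        for (String t : tokens) {
            result.merge(t, 1, Integer::sum);
        }
        return result;
    }

    // Clipped unigram overlap: each token is matched at most as often
    // as it appears in both texts.
    static int overlap(List<String> reference, List<String> candidate) {
        Map<String, Integer> refCounts = counts(reference);
        Map<String, Integer> candCounts = counts(candidate);
        int matched = 0;
        for (Map.Entry<String, Integer> e : refCounts.entrySet()) {
            matched += Math.min(e.getValue(), candCounts.getOrDefault(e.getKey(), 0));
        }
        return matched;
    }

    // Recall: share of reference unigrams that the candidate covers.
    static double recall(String reference, String candidate) {
        List<String> ref = tokenize(reference);
        if (ref.isEmpty()) return 0.0;
        return (double) overlap(ref, tokenize(candidate)) / ref.size();
    }

    // Precision: share of candidate unigrams that also occur in the reference.
    static double precision(String reference, String candidate) {
        List<String> cand = tokenize(candidate);
        if (cand.isEmpty()) return 0.0;
        return (double) overlap(tokenize(reference), cand) / cand.size();
    }

    // F1: harmonic mean of precision and recall.
    static double f1(String reference, String candidate) {
        double p = precision(reference, candidate);
        double r = recall(reference, candidate);
        return (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
    }
}
```

For the reference "the cat sat on the mat" and the candidate "the cat lay on the mat", five of the six unigrams match on each side, so recall, precision and F1 are all 5/6.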
- No need for semantic consideration of words or titles, or for calculating sentence similarity
- Simplified calculations
- Evaluates some stop-words as bonus words
- Takes advantage of some punctuation marks
- Good at extracting short but valuable sentences
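
The points above can be sketched as a small feature-based sentence scorer. This is only an illustration under stated assumptions, not the project's actual implementation: the class `FeatureScoringSketch`, the bonus-word list, the punctuation cues and all weights are hypothetical values chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; the features and weights are assumptions, not the project's values.
class FeatureScoringSketch {

    // Assumed bonus words: stop-words that often signal an important statement.
    static final Set<String> BONUS_WORDS =
            new HashSet<>(Arrays.asList("however", "because", "therefore", "but"));

    // Score a sentence from surface features only (no word weights, no semantics).
    static double score(String sentence) {
        int wordCount = 0;
        int bonusHits = 0;
        for (String w : sentence.toLowerCase().split("[^a-z0-9]+")) {
            if (w.isEmpty()) continue;
            wordCount++;
            if (BONUS_WORDS.contains(w)) bonusHits++;
        }
        if (wordCount == 0) return 0.0;

        double raw = bonusHits;                          // one point per bonus word
        if (sentence.indexOf('"') >= 0) raw += 1.0;      // assumed cue: quotation marks
        if (sentence.indexOf(':') >= 0) raw += 0.5;      // assumed cue: colon

        // Dividing by sqrt(length) keeps short sentences competitive.
        return raw / Math.sqrt(wordCount);
    }

    // Pick the k highest-scoring sentences and return them in document order.
    static List<String> summarize(List<String> sentences, int k) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) indices.add(i);

        // Stable sort: sentences with equal scores keep their original order.
        indices.sort(Comparator.comparingDouble(
                (Integer i) -> score(sentences.get(i))).reversed());

        List<Integer> top = new ArrayList<>(indices.subList(0, Math.min(k, indices.size())));
        Collections.sort(top);

        List<String> summary = new ArrayList<>();
        for (int i : top) summary.add(sentences.get(i));
        return summary;
    }
}
```

Calling `FeatureScoringSketch.summarize(sentences, 2)` on a list of sentences returns the two sentences with the highest feature scores, kept in their original order. Normalizing by the square root of the sentence length is one possible way to favor short but valuable sentences; other normalizations would work as well.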