Hacettepe-University-CMP681-2020-Spring/ir-project-ir-term-project-pelin-kocyigit

Extractive-Based Text Summarization Using Sentence Features

Aim:

This project aims to contribute to automatic single-document summarization using extractive methods. Its core idea is to score sentences by their predefined sentence-level features rather than relying mainly on word weights.

Prerequisites:

Java v8+
MySQL Server v5.2+

Installation:

The Java source files and the required external JAR files are in the "src" folder. The database tables and routines are in the "sql" folder.

Dataset:

The experiments use a dataset of BBC news articles paired with reference summaries, which serve as the baseline for comparing the generated summaries. The selected articles cover business, entertainment, politics, sport, and technology.
https://www.kaggle.com/pariza/bbc-news-summary

Running The Tests:

The project's interface lets you load selected texts and their reference summaries. The generated summaries and their evaluation results are presented as the output. The texts, their reference summaries, and the generated summaries are also included in the "experiment" folder.

The results are evaluated with ROUGE metrics. The ROUGE 2.0 toolkit, a Java package for evaluating summarization output, is included in the project as an external dependency.
http://rxnlp.com/rouge-2-0-usage-documentation/#.XvdylCgzY2x
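
To make the metric concrete, here is a minimal, standalone sketch of ROUGE-1 (clipped unigram overlap between a reference summary and a generated one). It is only an illustration of the idea and not the ROUGE 2.0 toolkit's API or the code used in this project. The class name `Rouge1Sketch`, its method names, and the simple lowercase tokenization are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a hand-rolled ROUGE-1, not the ROUGE 2.0 toolkit.
public class Rouge1Sketch {

    // Count lowercase word tokens. Splitting on non-alphanumeric
    // characters is an assumption made for this sketch.
    static Map<String, Integer> counts(String text) {
        Map<String, Integer> m = new HashMap<>();
        for (String tok : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!tok.isEmpty()) {
                m.merge(tok, 1, Integer::sum);
            }
        }
        return m;
    }

    // Total number of tokens represented by a count map.
    static int total(Map<String, Integer> m) {
        int sum = 0;
        for (int c : m.values()) {
            sum += c;
        }
        return sum;
    }

    // Clipped overlap: each reference word is matched at most as many
    // times as it occurs in the candidate.
    static int overlap(Map<String, Integer> ref, Map<String, Integer> cand) {
        int o = 0;
        for (Map.Entry<String, Integer> e : ref.entrySet()) {
            o += Math.min(e.getValue(), cand.getOrDefault(e.getKey(), 0));
        }
        return o;
    }

    // Recall: overlapping unigrams divided by unigrams in the reference.
    static double recall(String reference, String candidate) {
        Map<String, Integer> ref = counts(reference);
        int n = total(ref);
        return n == 0 ? 0.0 : (double) overlap(ref, counts(candidate)) / n;
    }

    // Precision: overlapping unigrams divided by unigrams in the candidate.
    static double precision(String reference, String candidate) {
        Map<String, Integer> cand = counts(candidate);
        int n = total(cand);
        return n == 0 ? 0.0 : (double) overlap(counts(reference), cand) / n;
    }

    // F1: harmonic mean of precision and recall.
    static double f1(String reference, String candidate) {
        double p = precision(reference, candidate);
        double r = recall(reference, candidate);
        return (p + r) == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        String reference = "the cat sat on the mat";
        String generated = "the cat lay on the mat";
        System.out.println("recall    = " + recall(reference, generated));
        System.out.println("precision = " + precision(reference, generated));
        System.out.println("f1        = " + f1(reference, generated));
    }
}
```

In this example five of the six reference unigrams also appear in the generated text, so recall and precision are both 5/6.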

Contribution:

  • No semantic analysis of words or titles and no sentence-similarity calculation is needed
  • Simplified calculations
  • Some stop-words are treated as bonus-words
  • Certain punctuation marks are used to advantage
  • Good at extracting short but valuable sentences
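
The points above can be illustrated with a small sketch of feature-based sentence scoring. This is not the project's actual scoring code. The class name `FeatureScoringSketch`, the bonus-word list, the position and punctuation features, and every weight are hypothetical values chosen only to show the general shape of the approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: features and weights are assumptions, not the project's.
public class FeatureScoringSketch {

    // Hypothetical stop-words treated as bonus-words.
    static final Set<String> BONUS_WORDS =
            new HashSet<>(Arrays.asList("because", "however", "therefore"));

    // Score a sentence from simple surface features, with no word
    // weighting and no sentence-similarity computation.
    static double score(String sentence, int index) {
        double s = 0.0;
        if (index == 0) {
            s += 1.0; // assumed position feature: the opening sentence
        }
        for (String tok : sentence.toLowerCase().split("[^a-z]+")) {
            if (BONUS_WORDS.contains(tok)) {
                s += 0.5; // assumed bonus-word weight
            }
        }
        if (sentence.indexOf('"') >= 0 || sentence.indexOf(':') >= 0) {
            s += 0.5; // assumed punctuation feature: quotes or colons
        }
        return s;
    }

    // Pick the k highest-scoring sentences and return them in their
    // original document order.
    static List<String> summarize(List<String> sentences, int k) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            idx.add(i);
        }
        // Stable sort by descending score; ties keep document order.
        idx.sort(Comparator.comparingDouble(
                (Integer i) -> -score(sentences.get(i), i)));
        List<Integer> chosen =
                new ArrayList<>(idx.subList(0, Math.min(k, idx.size())));
        Collections.sort(chosen);
        List<String> out = new ArrayList<>();
        for (int i : chosen) {
            out.add(sentences.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList(
                "Shares rose sharply.",
                "The weather was mild.",
                "However, analysts said: \"caution\".",
                "It rained.");
        for (String s : summarize(doc, 2)) {
            System.out.println(s);
        }
    }
}
```

Because the score depends only on surface cues, a short sentence can outrank a longer one, which matches the last point in the list.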

About

ir-project-ir-term-project-pelin-kocyigit created by GitHub Classroom
