Commit 71ae995

Merge pull request #18 from tagucci/refactoring
Refactoring codes
2 parents 653f717 + fd6a31d commit 71ae995

12 files changed: +404 −307 lines

README.md

Lines changed: 32 additions & 30 deletions
@@ -1,7 +1,7 @@
 # pythonrouge
-This is the python script to use ROUGE, summarization evaluation toolkit.
+This is a python wrapper to use ROUGE, a summarization evaluation toolkit.
 
-In this implementation, you can evaluate various types of ROUGE metrics. You can evaluate your system summaries with reference summaries right now. It's not necessary to make an xml file as in the general ROUGE package. However, you can evaluate ROUGE scores in a standard way if you saved system summaries and reference summaries in specific directories. In document summarization research, recall or F-measure of ROUGE metrics is used in most cases. So you can choose only recall or F-measure of ROUGE evaluation result for convenience.
+In this implementation, you can evaluate various types of ROUGE metrics. You can evaluate your system summaries against reference summaries directly; it is not necessary to make an XML file as in the general ROUGE package. However, you can also evaluate ROUGE scores in the standard way if you have saved system and reference summaries in specific directories. In document summarization research, the recall or F-measure of ROUGE metrics is used in most cases, so you can choose either recall or F-measure (or both) of the ROUGE evaluation results for convenience.
 
 Any feedback or comments are welcome.
 
@@ -10,12 +10,13 @@ You can install pythonrouge in both ways
 
 ```
 # not using pip
+git clone https://github.com/tagucci/pythonrouge.git
 python setup.py install
 
 # using pip
 pip install git+https://github.com/tagucci/pythonrouge.git
 ```
-Then, you can use pythonrouge. If you don't have ROUGE package, I recommend you clone this repository to your local, and do "python setup.py install".
+Then, you can use pythonrouge.
 
 # Usage
 
@@ -24,28 +25,28 @@ The only things you need to evaluate ROUGE score is to specify the paths of ROUG
 ```
 from pythonrouge.pythonrouge import Pythonrouge
 
-ROUGE_path = sys.argv[1] #ROUGE-1.5.5.pl
-data_path = sys.argv[2] #data folder in RELEASE-1.5.5
-
-# initialize setting of ROUGE, eval ROUGE-1, 2, SU4, L
-rouge = Pythonrouge(n_gram=2, ROUGE_SU4=True, ROUGE_L=True, stemming=True, stopwords=True, word_level=True, length_limit=True, length=50, use_cf=False, cf=95, scoring_formula="average", resampling=True, samples=1000, favor=True, p=0.5)
-
 # system summary & reference summary
 summary = [[" Tokyo is the one of the biggest city in the world."]]
 reference = [[["The capital of Japan, Tokyo, is the center of Japanese economy."]]]
 
-# If you evaluate ROUGE by sentence list as above, set files=False
-setting_file = rouge.setting(files=False, summary=summary, reference=reference)
-
-# If you need only recall of ROUGE metrics, set recall_only=True
-result = rouge.eval_rouge(setting_file, recall_only=True, ROUGE_path=ROUGE_path, data_path=data_path)
-print(result)
+# initialize setting of ROUGE to eval ROUGE-1, 2 and SU4
+# if you evaluate ROUGE from sentence lists as above, set summary_file_exist=False
+# if recall_only=True, you get only the recall scores of ROUGE
+rouge = Pythonrouge(summary_file_exist=False,
+                    summary=summary, reference=reference,
+                    n_gram=2, ROUGE_SU4=True, ROUGE_L=False,
+                    recall_only=True, stemming=True, stopwords=True,
+                    word_level=True, length_limit=True, length=50,
+                    use_cf=False, cf=95, scoring_formula='average',
+                    resampling=True, samples=1000, favor=True, p=0.5)
+score = rouge.calc_score()
+print(score)
 ```
 
 The output will be as below; in this case, only the recall metrics of ROUGE are printed.
 
 ```
-{'ROUGE-1': 0.16667, 'ROUGE-2': 0.0, 'ROUGE-L': 0.16667, 'ROUGE-SU4': 0.05}
+{'ROUGE-1': 0.16667, 'ROUGE-2': 0.0, 'ROUGE-SU4': 0.05}
 ```
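For intuition about what these recall numbers measure: ROUGE-N recall is the fraction of reference n-grams that also occur in the system summary. The following is a minimal, hypothetical sketch in plain Python (whitespace tokenization only; the real ROUGE-1.5.5 script adds stemming, stopword removal, length limits and bootstrap resampling, so its scores will differ):

```python
from collections import Counter

def rouge_n_recall(peer, model, n=1):
    """Simplified ROUGE-N recall: clipped n-gram overlap / reference n-gram count."""
    def ngram_counts(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    peer_counts = ngram_counts(peer)
    model_counts = ngram_counts(model)
    # each reference n-gram is matched at most as often as it appears in the peer
    overlap = sum(min(count, peer_counts[gram])
                  for gram, count in model_counts.items())
    return overlap / sum(model_counts.values())

print(rouge_n_recall("the cat sat on the mat", "the cat lay on the mat"))  # 5 of 6 reference unigrams matched
```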
 
 You can also evaluate ROUGE scripts in a standard way.
@@ -72,23 +73,19 @@ After putting system/reference files as above, you can evaluate ROUGE metrics as
 ```
 from pythonrouge.pythonrouge import Pythonrouge
 
-ROUGE_path = sys.argv[1] #ROUGE-1.5.5.pl
-data_path = sys.argv[2] #data folder in RELEASE-1.5.5
-
-# initialize setting of ROUGE, eval ROUGE-1~4, SU4
-rouge = Pythonrouge(n_gram=4, ROUGE_SU4=True, ROUGE_L=True, stemming=True, stopwords=True, word_level=True, length_limit=True, length=50, use_cf=False, cf=95, scoring_formula="average", resampling=True, samples=1000, favor=True, p=0.5)
-
-# make a setting file, set files=True because you've already save files in specific directories
-setting_file = rouge.setting(files=True, summary_path=summary_dir, reference_path=reference_dir)
-
-# If you need only F-measure of ROUGE metrics, set f_measure_only=True
-result = rouge.eval_rouge(setting_file, ROUGE_path=ROUGE_path, data_path=data_path)
-print(result)
-> {ROUGE-1': 0.29836, 'ROUGE-2': 0.07059, 'ROUGE-3': 0.03896, ', 'ROUGE-4': 0.02899, 'ROUGE-SU4': 0.12444}
+# initialize setting of ROUGE, eval ROUGE-1, 2 and SU4
+# if summary_file_exist=True, specify the system summary (peer_path) and reference summary (model_path) directories
+rouge = Pythonrouge(summary_file_exist=True,
+                    peer_path=summary, model_path=reference,
+                    n_gram=2, ROUGE_SU4=True, ROUGE_L=False,
+                    recall_only=True,
+                    stemming=True, stopwords=True,
+                    word_level=True, length_limit=True, length=50,
+                    use_cf=False, cf=95, scoring_formula='average',
+                    resampling=True, samples=1000, favor=True, p=0.5)
 ```
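The exact directory layout referenced above is not shown in this hunk. As a rough, hypothetical sketch of the expected setup (the file names and the one-plain-text-file-per-document convention are assumptions; check the sample/ directory of this repository for the actual convention):

```shell
# hypothetical layout sketch: one plain-text summary per document,
# with matching names between the system and reference directories
mkdir -p sample/summary sample/reference
echo "Tokyo is the one of the biggest city in the world." > sample/summary/doc1.txt
echo "The capital of Japan, Tokyo, is the center of Japanese economy." > sample/reference/doc1.txt
```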
 
 
-
 # Error Handling
 If you encounter following error message when you use pythonrouge
 
@@ -103,3 +100,8 @@ cd pythonrouge/RELEASE-1.5.5/data/
 rm WordNet-2.0.exc.db
 ./WordNet-2.0-Exceptions/buildExeptionDB.pl ./WordNet-2.0-Exceptions ./smart_common_words.txt ./WordNet-2.0.exc.db
 ```
+
+# TODO
+
+- [ ] support non-alphabetic languages such as Japanese and Chinese
+- [ ] add automated testing

example.py

Lines changed: 64 additions & 46 deletions
@@ -1,55 +1,73 @@
 # -*- coding: utf-8 -*-
 from __future__ import print_function
-import sys
 from pythonrouge.pythonrouge import Pythonrouge
-
+from pprint import pprint
 
 if __name__ == '__main__':
-    ROUGE_path = "./pythonrouge/RELEASE-1.5.5/ROUGE-1.5.5.pl"
-    data_path = "./pythonrouge/RELEASE-1.5.5/data"
-    summary_dir = "./sample/summary/"
-    reference_dir = "./sample/reference/"
-    # setting rouge options
-    rouge = Pythonrouge(n_gram=2, ROUGE_SU4=True, ROUGE_L=True, stemming=True, stopwords=True, word_level=True, length_limit=True, length=50, use_cf=False, cf=95, scoring_formula="average", resampling=True, samples=1000, favor=True, p=0.5)
-    print("evaluate sumamry & reference in these dir\nsummary: {}\nreference: {}".format(summary_dir, reference_dir))
-    print("\nAll metric")
-    setting_file = rouge.setting(files=True, summary_path=summary_dir, reference_path=reference_dir)
-    print(rouge.eval_rouge(setting_file, ROUGE_path=ROUGE_path, data_path=data_path))
-    print("\nRecall Only and save setting.xml")
-    setting_file = rouge.setting(files=True, summary_path=summary_dir, reference_path=reference_dir, delete=False)
-    print(rouge.eval_rouge(setting_file, recall_only=True, ROUGE_path=ROUGE_path, data_path=data_path))
-    print("\nEvaluate ROUGE based on sentecnce lists")
-    summary = [["Great location, very good selection of food for breakfast buffet.",
+    summary = './sample/summary/'
+    reference = './sample/reference/'
+    ROUGE_dir = './pythonrouge/RELEASE-1.5.5/ROUGE-1.5.5.pl'
+    data_dir = './pythonrouge/RELEASE-1.5.5/data/'
+    print('evaluate summary & reference in these dirs')
+    print('summary:\t{}\nreference:\t{}'.format(summary, reference))
+    rouge = Pythonrouge(summary_file_exist=True,
+                        peer_path=summary, model_path=reference,
+                        n_gram=2, ROUGE_SU4=True, ROUGE_L=False,
+                        recall_only=True,
+                        stemming=True, stopwords=True,
+                        word_level=True, length_limit=True, length=50,
+                        use_cf=False, cf=95, scoring_formula='average',
+                        resampling=True, samples=1000, favor=True, p=0.5)
+    score = rouge.calc_score()
+    print('ROUGE-N(1-2) & SU4 recall only')
+    pprint(score)
+    print('Evaluate ROUGE based on sentence lists')
+    """
+    ROUGE evaluates all system summaries and their corresponding
+    reference summaries at once.
+    A system summary must be a double list: each inner list holds
+    the sentences of one summary.
+    A reference must be a triple list, because a document may have
+    multiple gold summaries.
+    """
+    summary = [["Great location, very good selection of food for "
+                "breakfast buffet.",
                 "Stunning food, amazing service.",
                 "The food is excellent and the service great."],
-               ["The keyboard, more than 90% standard size, is just large enough .",
+               ["The keyboard, more than 90% standard size, is just "
+                "large enough .",
                 "Surprisingly readable screen for the size .",
                 "Smaller size videos play even smoother ."]]
-    reference = [[["Food was excellent with a wide range of choices and good services.", "It was a bit expensive though."],
-                  ["Food can be a little bit overpriced, but is good for a hotel."],
-                  ["The food in the hotel was a little over priced but excellent in taste and choice.",
-                   "There were also many choices to eat in the near vicinity of the hotel."],
-                  ["The food is good, the service great.",
-                   "Very good selection of food for breakfast buffet."]
-                  ],
-                 [
-                  ["The size is great and allows for excellent portability.",
-                   "Makes it exceptionally easy to tote around, and the keyboard is fairly big considering the size of this netbook."],
-                  ["Size is small and manageable.",
-                   "Perfect size and weight.",
-                   "Great size for travel."],
-                  ["The keyboard is a decent size, a bit smaller then average but good.",
-                   "The laptop itself is small but big enough do do things on it."],
-                  ["In spite of being small it is still comfortable.",
-                   "The screen and keyboard are well sized for use"]
-                  ]
-                 ]
-    doc_id = 1
-    for s, r in zip(summary, reference):
-        print("sytem summary_{}: {}".format(doc_id, " ".join(s)))
-        for i, doc in enumerate(r):
-            print("reference summary_{}_{}: {}".format(doc_id, i+1, " ".join(doc)))
-        doc_id += 1
-    setting_file2 = rouge.setting(files=False, summary=summary, reference=reference)
-    print("\nF-measure Only")
-    print(rouge.eval_rouge(setting_file2, f_measure_only=True, ROUGE_path=ROUGE_path, data_path=data_path))
+    reference = [
+        [["Food was excellent with a wide range of choices and "
+          "good services.", "It was a bit expensive though."],
+         ["Food can be a little bit overpriced, but is good for "
+          "a hotel."],
+         ["The food in the hotel was a little over priced but "
+          "excellent in taste and choice.",
+          "There were also many choices to eat in the near "
+          "vicinity of the hotel."]],
+        [["The size is great and allows for excellent "
+          "portability.",
+          "Makes it exceptionally easy to tote around, and the "
+          "keyboard is fairly big considering the size of this "
+          "netbook."],
+         ["Size is small and manageable.",
+          "Perfect size and weight.",
+          "Great size for travel."],
+         ["The keyboard is a decent size, a bit smaller then "
+          "average but good.",
+          "The laptop itself is small but big enough do do "
+          "things on it."],
+         ["In spite of being small it is still comfortable.",
+          "The screen and keyboard are well sized for use"]]
+    ]
+    rouge = Pythonrouge(summary_file_exist=False,
+                        summary=summary, reference=reference,
+                        n_gram=2, ROUGE_SU4=True, ROUGE_L=False,
+                        recall_only=True, stemming=True, stopwords=True,
+                        word_level=True, length_limit=True, length=50,
+                        use_cf=False, cf=95, scoring_formula='average',
+                        resampling=True, samples=1000, favor=True, p=0.5)
+    score = rouge.calc_score()
+    print('ROUGE-N(1-2) & SU4 recall only')
+    pprint(score)
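As a side note, the double/triple nesting convention used above can be sanity-checked with plain Python before handing the lists to pythonrouge (a hypothetical shape check, independent of the library):

```python
# summary: documents -> sentences (double list)
summary = [["Tokyo is the one of the biggest city in the world."]]
# reference: documents -> gold summaries -> sentences (triple list)
reference = [[["The capital of Japan, Tokyo, is the center of Japanese economy."]]]

assert len(summary) == len(reference), "one entry per document on both sides"
for doc_sents, gold_summaries in zip(summary, reference):
    assert all(isinstance(sent, str) for sent in doc_sents)
    for gold in gold_summaries:  # a document may have several gold summaries
        assert all(isinstance(sent, str) for sent in gold)
print("nesting ok")
```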
