Given a product containing
where
The SOCP Quadruple Extraction task aims to extract quadruples, each comprising a subject, an object, a category, and a preference. We built the SOCP-Phone dataset for this task. It was collected from the JD platform and consists of mobile phone product reviews posted between November 1, 2021, and January 15, 2024. The table below summarizes the statistics of SOCP-Phone:

| | | Train | Dev | Test | Total |
|---|---|---|---|---|---|
| #Categories | | 61 | 38 | 38 | 61 |
| #Sentences | #Comparative | 1680 | 210 | 210 | 2100 |
| | #Non-Comparative | 1680 | 210 | 210 | 2100 |
| | #Multi-Comparative | 559 | 79 | 69 | 707 |
| | Total | 3360 | 420 | 420 | 4200 |
| #Elements | Subject | 1680 | 210 | 210 | 2100 |
| | Object | 1857 | 230 | 225 | 2309 |
| | Category | 2475 | 323 | 309 | 3108 |
| | Preference | 1876 | 241 | 235 | 2352 |
| #Quadruples | BETTER | 1963 | 265 | 257 | 2485 |
| | WORSE | 400 | 24 | 37 | 482 |
| | EQUAL | 200 | 44 | 24 | 248 |
| | Total | 2563 | 333 | 318 | 3215 |
| #Quadruples/#Comparative | | 1.53 | 1.59 | 1.51 | 1.53 |

\#Categories is the number of aspect categories. \#Sentences is the total number of annotated sentences in the dataset, where \#Comparative, \#Non-Comparative, and \#Multi-Comparative are the numbers of comparative sentences, non-comparative sentences, and comparative sentences containing multiple comparisons, respectively. \#Elements is the total number of annotations for each comparative element (Subject, Object, Category, and Preference). \#Quadruples is the number of comparative quadruples constructed by combining comparative elements, counted by comparative preference (BETTER, WORSE, or EQUAL). \#Quadruples/\#Comparative is the average number of quadruples per comparative sentence.
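
The last row of the table can be reproduced from two other rows: the Total row under \#Quadruples and the \#Comparative row under \#Sentences. The short sketch below does that arithmetic using counts copied from the table. The helper name `quads_per_comparative` is illustrative and is not part of any released code.

```python
# Counts copied from the statistics table:
# split -> (total quadruples, comparative sentences)
counts = {
    "Train": (2563, 1680),
    "Dev": (333, 210),
    "Test": (318, 210),
    "Total": (3215, 2100),
}


def quads_per_comparative(num_quadruples: int, num_comparative: int) -> float:
    """Average number of quadruples per comparative sentence, to 2 decimals."""
    return round(num_quadruples / num_comparative, 2)


ratios = {
    split: quads_per_comparative(num_quads, num_comp)
    for split, (num_quads, num_comp) in counts.items()
}
print(ratios)
```

The computed ratios match the values reported in the table (1.53, 1.59, 1.51, and 1.53).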
Here is a sample data instance:

    {
      "_id": "0a171a04782299f20620dfe744aaabb9",
      "creationTime": "2023-10-19 17:16:21",
      "phoneName": "Apple iPhone 15",
      "phoneBrand": "Apple",
      "comment": "美滋滋,超便宜买到的才发布没多久的苹果15,体验了灵动岛,手感超级棒,比我的13流畅一些,可能是内存多的缘故,颜色很漂亮,很喜欢,感谢百亿补贴",
      "quadList": [
        {
          "subject": "Apple iPhone 15",
          "object": "AppleiPhone 13",
          "preference": "更好",
          "gold_category": "OS#PERFORMANCE"
        }
      ]
    }

Fifty samples from SOCP-Phone are provided in `data/sample_50.json`. The full dataset will be released after acceptance.
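
As a rough sketch of how the sample file might be consumed, the snippet below flattens instances into (subject, object, category, preference) tuples and counts the preference labels. It assumes that `data/sample_50.json` holds a single JSON array of instances shaped like the sample instance above. That layout is not confirmed here, and the helper names `load_instances` and `iter_quadruples` are hypothetical.

```python
import json
from collections import Counter
from typing import Iterator


def load_instances(path: str) -> list[dict]:
    """Read instances from disk.

    Assumption: the file is one JSON array of instance objects.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_quadruples(instances: list[dict]) -> Iterator[tuple[str, str, str, str]]:
    """Yield (subject, object, category, preference) for every quadruple."""
    for instance in instances:
        # Non-comparative sentences may carry an empty or missing quadList.
        for quad in instance.get("quadList", []):
            yield (
                quad["subject"],
                quad["object"],
                quad["gold_category"],
                quad["preference"],
            )


# In-memory example mirroring the sample instance (comment text omitted).
example = [
    {
        "_id": "0a171a04782299f20620dfe744aaabb9",
        "phoneName": "Apple iPhone 15",
        "quadList": [
            {
                "subject": "Apple iPhone 15",
                "object": "AppleiPhone 13",
                "preference": "更好",
                "gold_category": "OS#PERFORMANCE",
            }
        ],
    }
]

quads = list(iter_quadruples(example))
preference_counts = Counter(preference for *_, preference in quads)
print(quads)
print(preference_counts)
```

To run the same code on the released samples, replace `example` with `load_instances("data/sample_50.json")`, provided the file layout matches the assumption above.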
The annotation details are described in `Annotation.md`.
We first introduce the construction of the category system, followed by the annotation schema with illustrative examples. Finally, we present the automatic annotation approach using large language models, along with the prompt design strategy.