yuuume/DCRA-ICL

Task

Given a product containing $m$ tokens $P=\{p_1, p_2, ..., p_m\}$ and a corresponding review sentence containing $n$ tokens $R=\{r_1, r_2, ..., r_n\}$, the two are combined into a single sentence $S=\text{``This is a review of \textit{P}: \textit{R}''}$. The Subject-Object-Category-Preference (SOCP) Quadruple Extraction task first determines whether $S$ is a comparative sentence and, if so, extracts the set of comparative quadruples in $S$:

$$ {\cal{S}}_{SOCP} = \{..., (sub, obj, cc, cp)_i, ...\}, $$

where $sub$ denotes the subject entity, corresponding to $P$; $obj$ denotes the object entity being compared with $sub$; $cc \in \cal{C}$ denotes the comparative category, i.e. the category of the aspect on which $sub$ and $obj$ are compared, where $\cal{C}$ is a predefined set of categories; and $cp \in \{\text{BETTER, WORSE, EQUAL}\}$ denotes the comparative preference, indicating whether $sub$ is better than, worse than, or equal to $obj$.
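The definition above can be sketched as a small data structure. This is only an illustrative sketch, not the repository's code: the names `Quadruple`, `build_input`, and `PREFERENCES` are hypothetical, and the English example review is made up for illustration.

```python
from dataclasses import dataclass
from typing import List

# The three comparative preferences named in the task definition.
PREFERENCES = ("BETTER", "WORSE", "EQUAL")


@dataclass(frozen=True)
class Quadruple:
    """One (sub, obj, cc, cp) comparative quadruple (hypothetical helper type)."""
    sub: str  # subject entity, corresponds to the product P
    obj: str  # object entity compared with sub
    cc: str   # comparative category from the predefined set C
    cp: str   # comparative preference

    def __post_init__(self):
        # Reject preferences outside {BETTER, WORSE, EQUAL}.
        if self.cp not in PREFERENCES:
            raise ValueError(f"unknown preference: {self.cp!r}")


def build_input(product: str, review: str) -> str:
    """Combine product P and review R into the single sentence S."""
    return f"This is a review of {product}: {review}"


# Illustrative usage; a non-comparative sentence would yield an empty list.
s = build_input("Apple iPhone 15", "a bit smoother than my 13")
quads: List[Quadruple] = [
    Quadruple("Apple iPhone 15", "Apple iPhone 13", "OS#PERFORMANCE", "BETTER")
]
```

Modelling the output as a list keeps the two stages of the task in one type: an empty list stands for a non-comparative sentence, and a list with more than one element for a sentence with multiple comparisons.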

Dataset

The SOCP Quadruple Extraction task aims to extract quadruples, each comprising a subject, an object, a category, and a preference. We built the SOCP-Phone dataset for this task. It was collected from the JD platform and consists of mobile phone product reviews posted between November 1, 2021, and January 15, 2024. Statistics for SOCP-Phone are shown in the table below:

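| | | Train | Dev | Test | Total |
|---|---|---|---|---|---|
| #Categories | | 61 | 38 | 38 | 61 |
| #Sentences | #Comparative | 1680 | 210 | 210 | 2100 |
| | #Non-Comparative | 1680 | 210 | 210 | 2100 |
| | #Multi-Comparative | 559 | 79 | 69 | 707 |
| | Total | 3360 | 420 | 420 | 4200 |
| #Elements | Subject | 1680 | 210 | 210 | 2100 |
| | Object | 1857 | 230 | 225 | 2309 |
| | Category | 2475 | 323 | 309 | 3108 |
| | Preference | 1876 | 241 | 235 | 2352 |
| #Quadruples | BETTER | 1963 | 265 | 257 | 2485 |
| | WORSE | 400 | 24 | 37 | 482 |
| | EQUAL | 200 | 44 | 24 | 248 |
| | Total | 2563 | 333 | 318 | 3215 |
| #Quadruples/#Comparative | | 1.53 | 1.59 | 1.51 | 1.53 |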

\#Categories is the number of aspect categories. \#Sentences is the total number of annotated sentences, where \#Comparative, \#Non-Comparative and \#Multi-Comparative are the numbers of comparative sentences, non-comparative sentences, and comparative sentences containing multiple comparisons, respectively. \#Elements is the total number of annotations for each comparative element (Subject, Object, Category, and Preference). \#Quadruples is the number of comparative quadruples formed by combining comparative elements, broken down by comparative preference (BETTER, WORSE, or EQUAL). \#Quadruples/\#Comparative is the average number of quadruples per comparative sentence.
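As a quick check of the last row, the ratio can be recomputed from the \#Quadruples totals and the \#Comparative counts in the table. The sketch below only uses numbers that appear in the table; the variable names are arbitrary.

```python
# Total quadruples and comparative sentences per split, copied from the table.
quadruples = {"Train": 2563, "Dev": 333, "Test": 318, "Total": 3215}
comparative = {"Train": 1680, "Dev": 210, "Test": 210, "Total": 2100}

# Average number of quadruples per comparative sentence, rounded to 2 decimals.
ratios = {
    split: round(quadruples[split] / comparative[split], 2)
    for split in quadruples
}

print(ratios)
# {'Train': 1.53, 'Dev': 1.59, 'Test': 1.51, 'Total': 1.53}
```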

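Here is a sample data instance. The review text is in Chinese: the "comment" field reads roughly "Delighted, I got the recently released iPhone 15 at a very low price; I tried the Dynamic Island, it feels great in the hand and is a bit smoother than my 13, probably because it has more memory; the color is beautiful, I love it, thanks to the ten-billion subsidy", and the "preference" value "更好" means "better".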

{
    "_id": "0a171a04782299f20620dfe744aaabb9",
    "creationTime": "2023-10-19 17:16:21",
    "phoneName": "Apple iPhone 15",
    "phoneBrand": "Apple",
    "comment": "美滋滋,超便宜买到的才发布没多久的苹果15,体验了灵动岛,手感超级棒,比我的13流畅一些,可能是内存多的缘故,颜色很漂亮,很喜欢,感谢百亿补贴",
    "quadList": [
        {
            "subject": "Apple iPhone 15",
            "object": "AppleiPhone 13",
            "preference": "更好",
            "gold_category": "OS#PERFORMANCE"
        }
    ]
}

A sample of 50 instances from SOCP-Phone is provided in "data/sample_50.json". The full dataset will be released after acceptance.
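A minimal sketch of reading the sample file and turning each record into (sub, obj, cc, cp) tuples follows. It assumes, without confirmation from the repository, that "data/sample_50.json" holds a JSON array of records shaped like the instance above. The helpers `load_records` and `extract_quads` are hypothetical, and only the "更好" label is mapped because it is the only preference value shown in the sample.

```python
import json

# Only "更好" (BETTER) is attested in the sample instance; any other
# label is passed through unchanged rather than guessed.
PREFERENCE_LABELS = {"更好": "BETTER"}


def load_records(path="data/sample_50.json"):
    """Load records, assuming the file is a JSON array of instances."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def extract_quads(record):
    """Return the (sub, obj, cc, cp) tuples annotated on one record."""
    quads = []
    for q in record.get("quadList", []):
        cp = PREFERENCE_LABELS.get(q["preference"], q["preference"])
        quads.append((q["subject"], q["object"], q["gold_category"], cp))
    return quads


# The annotated part of the sample instance shown above.
record = {
    "phoneName": "Apple iPhone 15",
    "quadList": [
        {
            "subject": "Apple iPhone 15",
            "object": "AppleiPhone 13",
            "preference": "更好",
            "gold_category": "OS#PERFORMANCE",
        }
    ],
}

print(extract_quads(record))
```

With the file present, `[extract_quads(r) for r in load_records()]` would give one list of tuples per review, under the same assumption about the file layout.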

Annotation

The annotation details are described in Annotation.md. We first introduce the construction of the category system, followed by the annotation schema with illustrative examples. Finally, we present the automatic annotation approach using large language models, along with the prompt design strategy.
