Commit 1c9cb8e

Create readme.md
1 parent 59dfdbf commit 1c9cb8e

readme.md

Lines changed: 190 additions & 0 deletions
@@ -0,0 +1,190 @@
<h1 align="center">
  <b>datasets-knowledge-embedding</b>
</h1>
<p align="center">
  <!-- License -->
  <a href="https://github.com/simonepri/datasets-knowledge-embedding/tree/master/license">
    <img src="https://img.shields.io/github/license/simonepri/datasets-knowledge-embedding.svg" alt="Project license" />
  </a>
</p>
<p align="center">
  📝 A collection of common datasets used in knowledge embedding
</p>


## Datasets

This project collects datasets used across knowledge embedding papers.
It also standardizes their format, making them easier to use when evaluating new work.

The datasets can be downloaded from the [release page][release].
For licensing information, refer to each original dataset's license file.
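
The table columns reported for each dataset can be recomputed from the split files. The sketch below is only an illustration under stated assumptions: it assumes an extracted archive contains `train.tsv`, `valid.tsv`, and `test.tsv` holding tab-separated `(head, relation, tail)` triples, which the release archives may or may not follow exactly. The function names `load_split` and `dataset_stats` are hypothetical and not part of this project.

```python
from pathlib import Path


def load_split(path):
    """Read one split file of tab-separated (head, relation, tail) triples."""
    with open(path, encoding="utf-8") as handle:
        # Skip blank lines; every remaining line is split on tabs.
        return [tuple(line.rstrip("\r\n").split("\t")) for line in handle if line.strip()]


def dataset_stats(directory):
    """Recompute the table columns for a dataset stored in `directory`.

    Assumption: the directory holds train.tsv, valid.tsv and test.tsv.
    """
    splits = {
        name: load_split(Path(directory) / f"{name}.tsv")
        for name in ("train", "valid", "test")
    }
    entities, relations = set(), set()
    for triples in splits.values():
        for head, relation, tail in triples:
            entities.update((head, tail))
            relations.add(relation)
    return {
        "entities": len(entities),
        "relation_types": len(relations),
        # Total edges is the sum of the three split sizes.
        "edges": sum(len(triples) for triples in splits.values()),
        "train_edges": len(splits["train"]),
        "validation_edges": len(splits["valid"]),
        "test_edges": len(splits["test"]),
    }
```

Calling `dataset_stats` on an extracted dataset directory returns a dictionary whose values can be compared with the corresponding table row. In every table in this section, the Edges column equals the sum of the train, validation, and test counts.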


### COUNTRIES-S1
This dataset was introduced in [On Approximate Reasoning Capabilities of Low-Rank Vector Spaces](https://www.aaai.org/ocs/index.php/SSS/SSS15/paper/view/10257).
The authors' original release link is unknown; this copy was taken from the [ConvE repository](https://github.com/TimDettmers/ConvE/tree/master/countries).

| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
|----------|----------------|-------|-------------|------------------|------------|
| 271 | 2 | 1159 | 1111 | 24 | 24 |

[![Download COUNTRIES-S1.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/COUNTRIES-S1.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/COUNTRIES-S1.tgz)
[![Download COUNTRIES-S1-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/COUNTRIES-S1-ID.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/COUNTRIES-S1-ID.tgz)


### COUNTRIES-S2
This dataset was introduced in [On Approximate Reasoning Capabilities of Low-Rank Vector Spaces](https://www.aaai.org/ocs/index.php/SSS/SSS15/paper/view/10257).
The authors' original release link is unknown; this copy was taken from the [ConvE repository](https://github.com/TimDettmers/ConvE/tree/master/countries).

| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
|----------|----------------|-------|-------------|------------------|------------|
| 271 | 2 | 1111 | 1063 | 24 | 24 |

[![Download COUNTRIES-S2.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/COUNTRIES-S2.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/COUNTRIES-S2.tgz)
[![Download COUNTRIES-S2-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/COUNTRIES-S2-ID.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/COUNTRIES-S2-ID.tgz)

### COUNTRIES-S3
This dataset was introduced in [On Approximate Reasoning Capabilities of Low-Rank Vector Spaces](https://www.aaai.org/ocs/index.php/SSS/SSS15/paper/view/10257).
The authors' original release link is unknown; this copy was taken from the [ConvE repository](https://github.com/TimDettmers/ConvE/tree/master/countries).

| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
|----------|----------------|-------|-------------|------------------|------------|
| 271 | 2 | 1033 | 985 | 24 | 24 |

[![Download COUNTRIES-S3.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/COUNTRIES-S3.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/COUNTRIES-S3.tgz)
[![Download COUNTRIES-S3-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/COUNTRIES-S3-ID.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/COUNTRIES-S3-ID.tgz)

### FB15K
This dataset was introduced in [Translating Embeddings for Modeling Multi-relational Data](https://dl.acm.org/doi/10.5555/2999792.2999923).
The original dataset as released by the authors is available [here](https://everest.hds.utc.fr/doku.php?id=en:transe).

| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
|----------|----------------|-------|-------------|------------------|------------|
| 14951 | 1345 | 592213 | 483142 | 50000 | 59071 |

[![Download FB15K.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/FB15K.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/FB15K.tgz)
[![Download FB15K-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/FB15K-ID.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/FB15K-ID.tgz)

### FB15K-237
This dataset was introduced in [Observed versus latent features for knowledge base and text inference](https://www.aclweb.org/anthology/W15-4007/).
The original dataset as released by the authors is available [here](https://www.microsoft.com/en-us/download/details.aspx?id=52312).

| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
|----------|----------------|-------|-------------|------------------|------------|
| 14541 | 237 | 310116 | 272115 | 17535 | 20466 |

[![Download FB15K-237.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/FB15K-237.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/FB15K-237.tgz)
[![Download FB15K-237-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/FB15K-237-ID.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/FB15K-237-ID.tgz)

### KINSHIP
This dataset was introduced in [Learning systems of concepts with an infinite relational model](https://dl.acm.org/doi/10.5555/1597538.1597600).
The original dataset as released by the authors is available [here](http://www.charleskemp.com/code/irm.html).

| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
|----------|----------------|-------|-------------|------------------|------------|
| 104 | 25 | 10686 | 8544 | 1068 | 1074 |

[![Download KINSHIP.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/KINSHIP.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/KINSHIP.tgz)
[![Download KINSHIP-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/KINSHIP-ID.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/KINSHIP-ID.tgz)

### NATIONS
This dataset was introduced in [Learning systems of concepts with an infinite relational model](https://dl.acm.org/doi/10.5555/1597538.1597600).
The original dataset as released by the authors is available [here](http://www.charleskemp.com/code/irm.html).

| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
|----------|----------------|-------|-------------|------------------|------------|
| 14 | 55 | 1992 | 1592 | 199 | 201 |

[![Download NATIONS.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/NATIONS.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/NATIONS.tgz)
[![Download NATIONS-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/NATIONS-ID.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/NATIONS-ID.tgz)

### UMLS
This dataset was introduced in [Learning systems of concepts with an infinite relational model](https://dl.acm.org/doi/10.5555/1597538.1597600).
The original dataset as released by the authors is available [here](http://www.charleskemp.com/code/irm.html).

| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
|----------|----------------|-------|-------------|------------------|------------|
| 135 | 46 | 6529 | 5216 | 652 | 661 |

[![Download UMLS.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/UMLS.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/UMLS.tgz)
[![Download UMLS-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/UMLS-ID.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/UMLS-ID.tgz)

### WN18
This dataset was introduced in [Translating Embeddings for Modeling Multi-relational Data](https://dl.acm.org/doi/10.5555/2999792.2999923).
The original dataset as released by the authors is available [here](https://everest.hds.utc.fr/doku.php?id=en:transe).

| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
|----------|----------------|-------|-------------|------------------|------------|
| 41105 | 18 | 151442 | 141442 | 5000 | 5000 |

[![Download WN18.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/WN18.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/WN18.tgz)
[![Download WN18-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/WN18-ID.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/WN18-ID.tgz)

### WN18RR
This dataset was introduced in [Convolutional 2D Knowledge Graph Embeddings](https://arxiv.org/abs/1707.01476).
The original dataset as released by the authors is available [here](https://github.com/TimDettmers/ConvE).

| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
|----------|----------------|-------|-------------|------------------|------------|
| 41105 | 11 | 93003 | 86835 | 3034 | 3134 |

[![Download WN18RR.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/WN18RR.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/WN18RR.tgz)
[![Download WN18RR-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/WN18RR-ID.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/WN18RR-ID.tgz)

### YAGO3-10
This dataset was introduced in [Convolutional 2D Knowledge Graph Embeddings](https://arxiv.org/abs/1707.01476).
The original dataset as released by the authors is available [here](https://github.com/TimDettmers/ConvE).

| Entities | Relation Types | Edges | Train Edges | Validation Edges | Test Edges |
|----------|----------------|-------|-------------|------------------|------------|
| 123182 | 37 | 1089040 | 1079040 | 5000 | 5000 |

[![Download YAGO3-10.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/YAGO3-10.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/YAGO3-10.tgz)
[![Download YAGO3-10-ID.tgz](https://img.shields.io/github/downloads/simonepri/datasets-knowledge-embedding/latest/YAGO3-10-ID.tgz)](https://github.com/simonepri/datasets-knowledge-embedding/releases/latest/download/YAGO3-10-ID.tgz)


## Add a new dataset

To add a new dataset to this collection, first create three files named `train.tsv`, `valid.tsv`, and `test.tsv`, containing the edges of the train, validation, and test splits respectively.
The files must contain tab-separated triples of the form `(head entity, relation, tail entity)`.
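
Before running the build step, it can help to check that each file really matches this format. The following is a minimal sketch, assuming one triple per line; `parse_triples` is a hypothetical helper and the sample entity and relation names are made up for illustration.

```python
def parse_triples(lines):
    """Parse tab-separated (head, relation, tail) triples, one per line.

    Raises ValueError on any line that does not have exactly three
    non-empty tab-separated fields.
    """
    triples = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 3 or not all(fields):
            raise ValueError(
                f"line {line_no}: expected 3 non-empty tab-separated fields, got {fields!r}"
            )
        triples.append(tuple(fields))
    return triples


# Illustrative input only; these names do not come from any dataset here.
sample = [
    "paris\tcapital_of\tfrance\n",
    "france\tlocated_in\teurope\n",
]
triples = parse_triples(sample)
```

Since an open text file iterates over its lines, the same function can be applied to a file directly, for example `parse_triples(open("train.tsv", encoding="utf-8"))`.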

Once you have done this, process the three files with the following bash script.

```bash
bash build.sh train.tsv valid.tsv test.tsv .
```

The script uses the [datasets-knowledge-embedding][github:simonepri/datasets-knowledge-embedding] tool under the hood.


## Authors

- **Simone Primarosa** - [simonepri][github:simonepri]

See also the list of [contributors][contributors] who participated in this project.


## License

This project is licensed under the MIT License - see the [license][license] file for details.

<!-- Links -->
[license]: https://github.com/simonepri/datasets-knowledge-embedding/tree/master/license
[contributors]: https://github.com/simonepri/datasets-knowledge-embedding/contributors
[release]: https://github.com/simonepri/datasets-knowledge-embedding/releases/latest

[github:simonepri]: https://github.com/simonepri

[github:simonepri/datasets-knowledge-embedding]: https://github.com/simonepri/datasets-knowledge-embedding
