
FeforParCorp

AnetteFrank edited this page Jun 19, 2006 · 31 revisions

Parallel Corpora for Delph-In

Collections/Samples of available parallel corpora

* Europarl Corpus

- URL: http://people.csail.mit.edu/koehn/publications/europarl/

- Samples of Europarl Corpus: http://www.dfki.de/~frank/Europarl_sample

- Languages: da, de, en, el, es, fi, fr, it, nl, pt, sv

- Size per language: 600-700k sentences

- Format: currently distributed over approx. 400 files

- Alignment: implicit, given by the basename of each file and the relative position of each sentence in the raw, sentence-separated ASCII files

- Todo: complete the cross-lingual alignment (currently only pair-wise implicit alignment is available). We may be able to get something along these lines from Andreas Eisele.

* OPUS: Technical Documentation (plus Europarl and European Constitution)

- URL: http://logos.uio.no/opus/
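
The implicit Europarl alignment described above (same file basename, same relative sentence position) could be read roughly as in the following sketch. It is only an illustration under several assumptions that the page does not confirm: one directory per language, identical file names across languages, one sentence per line, and a text encoding passed in by the caller. The function name `aligned_pairs` and the directory layout are hypothetical.

```python
from pathlib import Path
from typing import Iterator, Tuple


def aligned_pairs(src_dir, tgt_dir, encoding="utf-8") -> Iterator[Tuple[str, str, str]]:
    """Yield (file name, source sentence, target sentence) triples.

    Assumed layout (hypothetical): src_dir and tgt_dir each hold one
    language, with matching file names and one sentence per line.
    """
    src_dir, tgt_dir = Path(src_dir), Path(tgt_dir)
    for src_file in sorted(src_dir.iterdir()):
        if not src_file.is_file():
            continue
        # Alignment by basename: the partner file has the same name.
        tgt_file = tgt_dir / src_file.name
        if not tgt_file.is_file():
            continue  # no counterpart in the target language
        src_lines = src_file.read_text(encoding=encoding).splitlines()
        tgt_lines = tgt_file.read_text(encoding=encoding).splitlines()
        # Alignment by position is only meaningful if both files
        # contain the same number of sentences; skip the file otherwise.
        if len(src_lines) != len(tgt_lines):
            continue
        for src_sent, tgt_sent in zip(src_lines, tgt_lines):
            yield src_file.name, src_sent, tgt_sent
```

Files whose sentence counts differ are skipped rather than guessed at, since a purely positional alignment gives no way to tell where the mismatch occurs. Because the pairing is only pair-wise, a full cross-lingual alignment (the Todo item above) would still need an extra step beyond this sketch.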
