FeforParCorp
AnetteFrank edited this page Jun 19, 2006 · 31 revisions
* Europarl Corpus
- URL: http://people.csail.mit.edu/koehn/publications/europarl/
- [http://www.dfki.de/~frank/Europarl_sample Samples of Europarl Corpus]
- Languages: da, de, en, el, es, fi, fr, it, nl, pt, sv
- Size per language: 600k-700k sentences
- Format: currently distributed over approximately 400 files
- Alignment: implicit, given by the file basename and the relative (line) position of each sentence in the raw, sentence-separated ASCII files
- Todo: complete cross-lingual alignment (currently there is only pairwise implicit alignment). We may be able to get something along these lines from Andreas Eisele.
* OPUS: technical documentation corpora (plus Europarl and the European Constitution)
- URL: http://logos.uio.no/opus/
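The implicit Europarl alignment described above (same file basename, same relative line position) can be sketched roughly as follows. This is a minimal illustration, not the corpus's own tooling. The directory layout (a pair directory holding files such as `ep-00-01-17.de` and `ep-00-01-17.en`), the one-sentence-per-line format, the UTF-8 default encoding and all function names are assumptions. The hypothetical `pivot` function shows one possible approach to the cross-lingual Todo: joining two pairwise alignments on identical English sentences.

```python
"""Sketch: recover implicit pairwise sentence alignment from parallel files.

ASSUMPTION (hypothetical layout): a pair directory holds files such as
"ep-00-01-17.de" and "ep-00-01-17.en"; line i of one file is taken to be
the translation of line i of the other.
"""
from pathlib import Path
from typing import Dict, Iterator, List, Tuple


def read_sentences(path: Path, encoding: str = "utf-8") -> List[str]:
    # One sentence per line; only the trailing newline is stripped.
    # ASSUMPTION: UTF-8; pass another encoding if the files differ.
    with path.open(encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


def align_pair(src: Path, tgt: Path) -> List[Tuple[str, str]]:
    """Pair sentences by relative position in two files sharing a basename."""
    s, t = read_sentences(src), read_sentences(tgt)
    if len(s) != len(t):
        raise ValueError(
            f"{src.name} and {tgt.name} differ in length: {len(s)} vs {len(t)}"
        )
    return list(zip(s, t))


def align_directory(pair_dir: Path, l1: str, l2: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (basename, l1 sentence, l2 sentence) for every file pair in pair_dir."""
    for src in sorted(pair_dir.glob(f"*.{l1}")):
        tgt = src.with_suffix(f".{l2}")
        if not tgt.exists():
            continue  # no counterpart with the same basename: skip it
        for a, b in align_pair(src, tgt):
            yield src.stem, a, b


def pivot(
    pairs_a: List[Tuple[str, str]], pairs_b: List[Tuple[str, str]]
) -> List[Tuple[str, str, str]]:
    """Hypothetical three-way alignment: join (x, en) and (y, en) on identical en.

    For a repeated English sentence in pairs_b, only its first partner is kept.
    """
    by_pivot: Dict[str, str] = {}
    for y, en in pairs_b:
        by_pivot.setdefault(en, y)
    return [(x, en, by_pivot[en]) for x, en in pairs_a if en in by_pivot]
```

Pivoting on exact string matches is only a heuristic. It drops any English sentence that differs between the two pair directories (for example, because the sentence splitter or aligner treated it differently), so it yields a partial multi-way alignment rather than a complete one.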