StephanOepen edited this page Jun 26, 2015 · 7 revisions

Background

Internal (for the time being at least) notes on working with the Norwegian Dependency Treebank at LTG.

Initial Set-Up

  svn co http://svn.emmtee.net/ltg/ndt

The source code of the Regular Expression–Based Pre-Processor (REPP) and the toolkit for Robust Evaluation of Syntactic Analysis (RESA) is included as an external SVN dependency. As a one-time preparatory step, both tools need to be compiled. In each of tokenization/src/repp/ and tokenization/src/resa/, run:

  autoreconf -i
  ./configure
  make
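
Since the same three steps are repeated in both directories, they can be wrapped in a small shell function. The following is only a sketch: `build_tool` and the `DRY_RUN` switch are hypothetical names, not part of the NDT repository, and the function assumes that `autoreconf` and a C/C++ toolchain are already installed.

```shell
# Hypothetical helper (not part of the NDT repository): run the three
# build steps in one tool directory, stopping at the first failure.
# Set DRY_RUN=1 to print the command line instead of running it.
build_tool() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "build_tool: no such directory: $dir" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '(cd %s && autoreconf -i && ./configure && make)\n' "$dir"
    return 0
  fi
  # The subshell keeps the caller's working directory unchanged.
  (cd "$dir" && autoreconf -i && ./configure && make)
}
```

From the top of the checkout, `build_tool tokenization/src/repp && build_tool tokenization/src/resa` would then build both tools in turn.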

Segmentation and Tokenization

The commands below are run from the tokenization/ directory. The first splits the raw text into sentences and tokenizes each sentence with REPP; the second uses RESA to compare the result against the gold-standard CoNLL-X file.

  cat ../data/txt/nob/ap001.txt \
  | ./src/sentence-split_no.perl \
  | while IFS= read -r line; do \
      printf '%s\n' "$line" | ./src/repp/src/repp -c repp/nob.set --format line; \
    done \
  > ap001.t

  ./src/resa/src/resa \
    -r ../data/txt/nob/ap001.txt \
    -g ../data/conll/nob/ap001.conll -G CONLLX \
    -t ap001.t -T TAB
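
To apply the same two steps to other files, the commands above can be parameterized by the file's base name. This is a sketch under assumptions: the helper names `tokenize` and `process` and the overridable variables are hypothetical, and it assumes that every text file under ../data/txt/nob/ has a matching .conll file under ../data/conll/nob/, which the notes above only show for ap001.

```shell
# Tool and data locations, relative to tokenization/ as in the commands
# above; each can be overridden from the environment.
SPLITTER=${SPLITTER:-./src/sentence-split_no.perl}
REPP=${REPP:-./src/repp/src/repp}
RESA=${RESA:-./src/resa/src/resa}
DATA=${DATA:-../data}

# Hypothetical helper: sentence-split the text file $1 and tokenize it
# one sentence at a time, writing the result to standard output.
tokenize() {
  "$SPLITTER" < "$1" \
  | while IFS= read -r line; do
      printf '%s\n' "$line" | "$REPP" -c repp/nob.set --format line
    done
}

# Hypothetical helper: tokenize the file with base name $1 (e.g. ap001)
# into $1.t, then evaluate that file against the gold CoNLL-X file.
process() {
  tokenize "$DATA/txt/nob/$1.txt" > "$1.t" || return 1
  "$RESA" \
    -r "$DATA/txt/nob/$1.txt" \
    -g "$DATA/conll/nob/$1.conll" -G CONLLX \
    -t "$1.t" -T TAB
}
```

With these definitions, `process ap001` reproduces the two commands above, and a loop such as `for f in ../data/txt/nob/*.txt; do process "$(basename "$f" .txt)"; done` would cover every file in the directory.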