
ItsdbProfiling

FrancisBond edited this page Dec 14, 2005 · 13 revisions

Preprocessing

You can pass the items to be parsed through a preprocessor by naming it in the :preprocessor slot of the cpu definition. For example:

  (make-cpu
    :host (short-site-name)
    :spawn "cheap"
    :options (list "-tsdb"  "-tok=yy" "-packing=7" "-default-les"
                   (format nil "~a/grammars/japanese/japanese.grm" %delphin%))
    :preprocessor "lkb::chasen-preprocess-for-pet"
    :class :chasen :grammar "jacy-chasen" :name "jacy-chasen" :threshold 2)
  (make-cpu
    :host (short-site-name)
    :spawn "cheap"
    :options (list "-tsdb"  "-tok=yy" "-packing=7" "-default-les"
                   (format nil "~a/grammars/japanese/japanese.grm" %delphin%))
    :preprocessor "lkb::rasp-preprocess-for-pet"
    :class :rasp :grammar "jacy-rasp" :name "jacy-rasp" :threshold 2)

chasen-preprocess-for-pet and rasp-preprocess-for-pet are Lisp functions that take two arguments, the item itself and an optional tagger, and return a tokenized string suitable for PET. In this case the string uses the yy-tokenization, matching the "-tok=yy" option passed to cheap.

chasen-preprocess-for-pet calls an external morphological analyzer (ChaSen) and reformats its output.

rasp-preprocess-for-pet assumes the input consists of space-separated tokens of the form word_pos (that is, word_pos word_pos ...) and associates each word with its POS in the input chart.
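
To make the word_pos input convention concrete, here is a minimal sketch of how such an item could be split into word and POS pairs. The function name split-word-pos-item is hypothetical and is not part of the LKB or [incr tsdb()] sources. The sketch assumes tokens are separated by spaces and that the last underscore in a token separates the word from its tag. It does not produce the yy-tokenization itself, only the pairs a preprocessor would need before formatting them.

```lisp
;; Hypothetical helper for illustration only; not the actual
;; rasp-preprocess-for-pet implementation.
(defun split-word-pos-item (item)
  "Split ITEM, a string of space-separated word_pos tokens,
into a list of (word . pos) pairs. A token without an underscore
gets NIL as its POS."
  (let ((pairs '())
        (start 0)
        (len (length item)))
    (loop
      ;; Find the first non-space character at or after START.
      (let ((begin (position-if-not (lambda (c) (char= c #\Space))
                                    item :start start)))
        (unless begin (return))
        (let* ((end (or (position #\Space item :start begin) len))
               (token (subseq item begin end))
               ;; Assumption: the LAST underscore separates word and POS,
               ;; so words that contain underscores stay intact.
               (split (position #\_ token :from-end t)))
          (push (if split
                    (cons (subseq token 0 split)
                          (subseq token (1+ split)))
                    (cons token nil))
                pairs)
          (setf start end))))
    (nreverse pairs)))

;; (split-word-pos-item "the_DT dog_NN barks_VBZ")
;; => (("the" . "DT") ("dog" . "NN") ("barks" . "VBZ"))
```

A real preprocessor would go on to format these pairs as a yy-tokenized string, which is omitted here.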
