
ItsdbProfiling

FrancisBond edited this page Jun 26, 2006 · 13 revisions


Generation

You can generate from a profile that has stored MRSes. Such a profile can be created, e.g., by a thinning normalize run with (setf tsdb::*tsdb-semantix-hook* "mrs::get-mrs-string") set.

First select the profile with MRSes as the gold profile (middle click). Then create a new profile with the same skeleton and select it (left click). Set Process > Switches > generate, then run Process > All Items.

Preprocessing

You can pass the items to be parsed through a preprocessor by naming it with the :preprocessor keyword in the cpu definition (make-cpu). For example:

  ;; Items are run through ChaSen before being handed to PET
  (make-cpu
    :host (short-site-name)
    :spawn "/path/to/cheap"
    :options (list "-tsdb" "-tok=yy" "-packing=7" "-default-les"
                   (format nil "~a/grammars/japanese/japanese.grm" %delphin%))
    :preprocessor "lkb::chasen-preprocess-for-pet"
    :class :chasen :grammar "jacy-chasen" :name "jacy-chasen" :threshold 2)
  ;; Items are already POS-tagged, in the form word_pos word_pos ...
  (make-cpu
    :host (short-site-name)
    :spawn "/path/to/cheap"
    :options (list "-tsdb" "-tok=yy" "-packing=7" "-default-les"
                   (format nil "~a/grammars/japanese/japanese.grm" %delphin%))
    :preprocessor "tsdb::rasp-preprocess-for-pet"
    :class :rasp :grammar "jacy-rasp" :name "jacy-rasp" :threshold 2)

chasen-preprocess-for-pet and rasp-preprocess-for-pet are Lisp functions that take the item itself and an optional tagger as arguments, and return a tokenized string suitable for PET: in this case, YY-format input (matching the -tok=yy option above).

chasen-preprocess-for-pet calls an external morphological analyzer (ChaSen) and reformats its output.

rasp-preprocess-for-pet assumes the input is of the form word_pos word_pos ..., and associates each word with its POS tag in the input chart.
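The contract described above (item in, YY-tokenized string out) can be illustrated with a minimal sketch. This is not the source of rasp-preprocess-for-pet: the names wordpos-preprocess-for-pet, split-on-spaces and split-word-pos are hypothetical, and the token layout (id, start, end, path, "form", ipos, "lrule", then an optional "POS" probability) is an assumption based on PET's YY input format, so check it against the PET version you use. In this sketch every tag gets probability 1.0, the tagger argument is ignored, and the word is split from its tag at the last underscore.

```lisp
;;; Hypothetical preprocessor sketch: "word_pos word_pos ..." -> YY tokens.
;;; ASSUMPTION: PET YY token layout (id, start, end, path, "form", ipos,
;;; "lrule"[, "POS" prob]); verify against your PET documentation.

(defun split-on-spaces (string)
  "Split STRING on runs of spaces and tabs, dropping empty pieces."
  (let ((tokens '()) (start nil))
    ;; Walk one position past the end so the last token is flushed.
    (loop for i from 0 to (length string)
          for c = (if (< i (length string)) (char string i) #\Space)
          do (if (member c '(#\Space #\Tab))
                 (when start
                   (push (subseq string start i) tokens)
                   (setf start nil))
                 (unless start (setf start i))))
    (nreverse tokens)))

(defun split-word-pos (token)
  "Split TOKEN at its last underscore into (values word pos); POS is NIL if absent."
  (let ((i (position #\_ token :from-end t)))
    (if (and i (> i 0))
        (values (subseq token 0 i) (subseq token (1+ i)))
        (values token nil))))

(defun wordpos-preprocess-for-pet (item &optional tagger)
  "Hypothetical: turn \"word_pos word_pos ...\" into a string of YY tokens."
  (declare (ignore tagger))            ; this sketch does no tagging of its own
  (let ((yy '()))
    (loop for token in (split-on-spaces item)
          for id from 1                ; token identifier
          for start from 0             ; chart vertex: word i spans i..i+1
          do (multiple-value-bind (word pos) (split-word-pos token)
               ;; ~@[...~] prints the POS part only when POS is non-NIL
               (push (format nil "(~d, ~d, ~d, 1, ~s, 0, \"null\"~@[, ~s 1.0~])"
                             id start (1+ start) word pos)
                     yy)))
    ;; Join the tokens with single spaces.
    (format nil "~{~a~^ ~}" (nreverse yy))))
```

For example, (wordpos-preprocess-for-pet "the_DT dog_NN") returns (1, 0, 1, 1, "the", 0, "null", "DT" 1.0) (2, 1, 2, 1, "dog", 0, "null", "NN" 1.0) as a single string.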
