ItsdbProfiling

FrancisBond edited this page Jun 26, 2006 · 13 revisions

Generation

You can generate from a profile with stored MRSes (produced by, e.g., thinning normalize with (setf tsdb::*tsdb-semantix-hook* "mrs::get-mrs-string")).

First select the profile with MRSes as the Gold Profile (middle click). Then create a new profile with the same skeleton, and select it (left click). Set Process --- Switches --- generate, then run Process --- All Items.

You can check whether the generated output includes the input parse as follows. Select the profile with MRSes as the Gold Profile (middle click), and then select the generated profile. Set Compare --- Switches to Subset Comparison and Best Parse Only. Set Compare --- Intersection to derivation, then select Compare --- Detail.

Preprocessing

You can pass the items to be parsed through a preprocessor by naming it in the :preprocessor slot of the cpu definition. For example:

  (make-cpu
    :host (short-site-name)
    :spawn "/path/to/cheap"
    :options (list "-tsdb"  "-tok=yy" "-packing=7" "-default-les"
                   (format nil "~a/grammars/japanese/japanese.grm" %delphin%))
    :preprocessor "lkb::chasen-preprocess-for-pet"
    :class :chasen :grammar "jacy-chasen" :name "jacy-chasen" :threshold 2)
  (make-cpu
    :host (short-site-name)
    :spawn "/path/to/cheap"
    :options (list "-tsdb"  "-tok=yy" "-packing=7" "-default-les"
                   (format nil "~a/grammars/japanese/japanese.grm" %delphin%))
    :preprocessor "tsdb::rasp-preprocess-for-pet"
    :class :rasp :grammar "jacy-rasp" :name "jacy-rasp" :threshold 2)

chasen-preprocess-for-pet and rasp-preprocess-for-pet are Lisp functions that take two arguments, the item itself and an optional tagger, and return a tokenized string suitable for PET: in this case, the yy tokenization.

chasen-preprocess-for-pet calls an external morphological analyzer (ChaSen) and reformats the output.

rasp-preprocess-for-pet assumes the input is of the form word_pos word_pos and associates each word with its POS in the input chart.
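
To make the word_pos input convention concrete, here is a minimal sketch in plain Common Lisp of how such an item might be split into word/POS pairs. The function names (split-on-spaces, split-word-pos, parse-tagged-item) are hypothetical and are not part of [incr tsdb()] or the LKB. The sketch also assumes that the POS follows the last underscore of each token, and it stops short of producing the yy tokenization itself.

```lisp
;; Hypothetical helpers for illustration only; not the actual
;; rasp-preprocess-for-pet implementation.

(defun split-on-spaces (string)
  "Return the non-empty space-separated substrings of STRING."
  (let ((tokens '())
        (start nil))
    (dotimes (i (length string))
      (if (char= (char string i) #\Space)
          ;; A space closes the token currently being read, if any.
          (when start
            (push (subseq string start i) tokens)
            (setf start nil))
          ;; A non-space character opens a token if none is open.
          (unless start
            (setf start i))))
    ;; Close a token that runs to the end of the string.
    (when start
      (push (subseq string start) tokens))
    (nreverse tokens)))

(defun split-word-pos (token)
  "Split TOKEN at its last underscore into a (word . pos) pair.
Assumption: the POS is whatever follows the last underscore.
A token without an underscore gets NIL as its POS."
  (let ((split (position #\_ token :from-end t)))
    (if split
        (cons (subseq token 0 split) (subseq token (1+ split)))
        (cons token nil))))

(defun parse-tagged-item (item)
  "Turn \"word_pos word_pos ...\" into a list of (word . pos) pairs."
  (mapcar #'split-word-pos (split-on-spaces item)))
```

For example, (parse-tagged-item "the_DT dog_NN barks_VBZ") returns (("the" . "DT") ("dog" . "NN") ("barks" . "VBZ")). Splitting at the last underscore keeps words that themselves contain underscores intact.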
