JacyInstallation
Here are some instructions on how to install and run the grammar. First you must install the [wiki:LkbTop LKB] or [wiki:PetTop PET]. Note that currently the automated installation (LkbInstallation) with the option --jacy gives an old version of the grammar.
If you want to use [http://chasen.naist.jp ChaSen] for segmentation and unknown word handling, then you must also install it.
Although the LKB will run standalone, there are problems with Japanese input. The recommended way to run it is from inside Emacs, using the eli interface. Install the LKB and eli, following the instructions at http://www-csli.stanford.edu/~aac/emacslkb.html.
You should run Lisp with a UTF locale (ja_JP.UTF-8).
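One way to arrange this is to set the locale in the shell that launches Emacs, so that the Lisp subprocess inherits it. The following is only a sketch: it assumes a POSIX shell and that the ja_JP.UTF-8 locale is installed (the name reported by `locale -a` varies between systems, e.g. ja_JP.utf8).

```shell
# Assumption: ja_JP.UTF-8 is installed; warn if `locale -a` does not list it.
want=ja_JP.UTF-8
if locale -a 2>/dev/null | grep -qi '^ja_JP\.utf-\{0,1\}8$'; then
    echo "locale $want is available"
else
    echo "warning: $want not listed by locale -a" >&2
fi

# Export the locale so that programs started from this shell inherit it.
export LANG=$want

# Then start Emacs from this same shell, for example:
# emacs &
```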
Now load everything, LKB, MRS, plus [incr tsdb()]:
Open emacs
Start Lisp with M-x japanese
:ld ~/src/lkb/src/general/loadup
(pushnew :lkb *features*)
(pushnew :mrs *features*)
(compile-system "tsdb" :force t)
Load the grammar with:
(read-script-file-aux "~/path/to/grammar/jacy/lkb/script")
or, if you want to use ChaSen:
(read-script-file-aux "~/path/to/grammar/jacy/lkb/script.chasen")
You can parse a sentence by typing (do-parse-tty "犬 が 吠える") in the emacs window.
If you have any questions, please write an email to: bond@ieee.org.
Note: some of these instructions may be out of date.
Install itsdb, following the instructions in the manual.
The latest version of JACY and versions of itsdb later than 2003-05-20 should work as is with Japanese.
M-x itsdb
Note: Japanese test sentences should be in utf-8.
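A quick way to check the encoding of a test file before import is to round-trip it through iconv, which fails on byte sequences that are not valid UTF-8. This is a sketch only; the function name `is_utf8` and the file names are hypothetical, and EUC-JP as the legacy encoding is an assumption.

```shell
# Succeed if the file is valid UTF-8; iconv exits non-zero on invalid input.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Usage (hypothetical file name):
# is_utf8 items.txt && echo "items.txt is valid UTF-8"

# Converting a legacy file (assumed here to be EUC-JP) to UTF-8:
# iconv -f EUC-JP -t UTF-8 items.euc.txt > items.txt
```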
To get itsdb to count Japanese words, you need to segment the test sentences at some stage. This can be done during import.
If there is a global preprocessing hook, [incr tsdb()] import will pipe everything through it and use the second value that it returns as the 'i-length' field. For example,
(setf *tsdb-preprocessing-hook* "lkb::chasen-preprocess-for-pet")
will enable that hook globally. Once you use a definition of this function that counts correctly (it is no good calling length() on a variable after using the destructive nreverse() on it), you will notice that (i) imports from text files are much slower and (ii) 'Browse -- Test Items' will show [http://chasen.naist.jp ChaSen] word counts for the 'i-length' field.
Note that because the import can now take actual time (half a second per item or so), the [incr tsdb()] progress meter should advance correctly during import from a text file (this does not work on versions older than 2003-06).
There is an example of 'user-fns.lsp' for JaCY that enables the *tsdb-preprocessing-hook* when [incr tsdb()] is loaded before the grammar. (You could also set this in '~/.tsdbrc', but then it would affect everything you do, no matter which grammar is used.)
from user-fns.lsp:
;;;
;;; hook for [incr tsdb()] to call when preprocessing input (going to the PET
;;; parser or when counting `words' while import test items from a text file).
;;;
(defun chasen-preprocess-for-pet (input)
  (preprocess-sentence-string input :verbose nil :posp t))

#+(or :pvm :itsdb)
(setf tsdb::*tsdb-preprocessing-hook* "lkb::chasen-preprocess-for-pet")
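To see what a word count over segmented input amounts to, the sketch below counts whitespace-separated tokens in a sentence that has already been segmented. It only illustrates the idea behind the 'i-length' value; it is not how [incr tsdb()] or the hook computes it, and the function name `count_tokens` is hypothetical.

```shell
# Count whitespace-separated tokens in an already segmented sentence.
# The body runs in a subshell so that `set -f` does not leak to the caller.
count_tokens() (
    set -f        # disable globbing so tokens such as * are not expanded
    set -- $1     # split on the default IFS (space, tab, newline)
    echo "$#"
)

count_tokens "犬 が 吠える"   # prints 3
```

An unsegmented sentence with no spaces would count as a single token, which is why segmentation has to happen before the words can be counted.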
You need to segment the Japanese, for example by preprocessing with [http://chasen.naist.jp ChaSen]:
> chasen -F"%m " | cheap ~/japanese/japanese.grm
reading `pet/japanese.set'...
loading `japanese.grm' (Japanese (jan-03))
16674 types in 1.7 s
Install itsdb and PET.
You can run Japanese with a cpu defined in your .tsdbrc (substituting your pathnames).
grammardir=${LOGONROOT}/dfki/jacy
grammarscript=${grammardir}/lkb/script
skeletondir=${grammardir}/tsdb/skeletons
grm=${grammardir}/japanese.grm
petpreprocessor=lkb::chasen-preprocess-for-pet
grmname=jacy
(tsdb::make-cpu
:host (short-site-name)
:spawn "${LOGONROOT}/bin/cheap"
:options (list "-tsdb" "-packing" "${grm}")
:class :pet :name "${grmname}-pet" :grammar "${grmname}"
:task '(:parse) :wait 300 :quantum 180)
(tsdb::make-cpu
:host (short-site-name)
:spawn "${LOGONROOT}/bin/cheap"
:options (list "-tsdb" "-packing" "-tok=yy" "-default-les" "${grm}")
:class :lex :name "${grmname}-lex" :grammar "${grmname}"
:preprocessor "${petpreprocessor}"
:task '(:parse) :wait 300 :quantum 180)
After starting lkb-ja and itsdb in emacs:
Choose the cpu in the normal way by evaluating
(tsdb::tsdb :cpu :nihongo :file t) in the *common-lisp* buffer:
LKB(2): (tsdb::tsdb :cpu :nihongo :file t)
The preprocessor calls a function defined in user-fns.lsp that runs ChaSen on the input. The combination of "-tok=yy" and "-default-les" takes that output and produces default lexical types for unknown words.