
JacyTop

UlrichSchaefer edited this page Sep 12, 2007 · 40 revisions

Overview

The Jacy grammar is a broad-coverage, linguistically precise grammar of Japanese. It is based on the HPSG formalism with MRS semantics. The LKB is the primary grammar development environment, but the grammar can be processed efficiently with PET.

The first application of the Japanese HPSG was the Verbmobil system, a spoken-language machine translation project, in which the grammar was used for deep processing of appointment scheduling and travel reservation dialogues. The grammar was also used in an industrial application for automatic email response. More recently, the grammar has contributed to the EU project DeepThought, which focuses on building applications that combine shallow and deep natural language processing. The project is multilingual in orientation, so much effort goes into multilingual approaches to grammatical phenomena and into building a matrix grammar that can serve as the basis for developing further grammars.

Current development is mainly being done by Francis Bond at NiCT (http://www2.nict.go.jp/x/x161/en/index.html), with help from Takayuki Kuribayashi and Chikara Hashimoto. We plan to commit a new major release sometime in April 2007.

Melanie Siegel is the original principal JACY developer. Major contributions came from Emily Bender (University of Washington), especially concerning the MRS construction and numeral expressions. Stephan Oepen (Universitetet i Oslo & CSLI Stanford) contributed support for the grammar development environment, Japanese font encodings and the inclusion of ChaSen (http://chasen.naist.jp). Ulrich Callmeier (acrolinx GmbH) worked out what was required to run the grammar on his fast and efficient PET system. Akira Ohtani, Chikara Hashimoto (Kyoto University), Francis Bond, Sanae Fujita, Shigeko Nariyama and Takaaki Tanaka (NTT Communication Science Laboratories - Machine Translation Research Group) contributed grammar extensions, especially for verbal compounds and relative clause constructions, as well as many lexicon entries. Ulrich Schaefer integrated ChaSen, Japanese Named Entity Recognition via SProUT (http://sprout.dfki.de) and PET (http://wiki.delph-in.net/moin/PetTop) with the Jacy grammar into the Heart of Gold middleware (http://heartofgold.dfki.de) for robust parsing of Japanese text, adding automatic conversion of ChaSen's EUC-JP byte offsets into Unicode character offsets.
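The offset conversion mentioned above can be illustrated with a short Python sketch. It assumes the text is available as EUC-JP bytes and that every byte offset falls on a character boundary; the function name `euc_byte_offset_to_char_offset` is hypothetical, and this is not the Heart of Gold code. Because EUC-JP uses one byte for ASCII and two or three bytes for Japanese characters, decoding the prefix up to the offset and counting code points yields the character offset.

```python
def euc_byte_offset_to_char_offset(euc_bytes: bytes, byte_offset: int) -> int:
    """Convert a byte offset into EUC-JP text to a character offset.

    Hypothetical helper for illustration only. Assumes byte_offset lies on a
    character boundary; otherwise the strict decode raises UnicodeDecodeError.
    """
    if not 0 <= byte_offset <= len(euc_bytes):
        raise ValueError("byte offset out of range")
    # Decode only the prefix and count the resulting characters.
    return len(euc_bytes[:byte_offset].decode("euc_jp"))


text = "東京で会議"
data = text.encode("euc_jp")
# Each of these characters takes 2 bytes in EUC-JP,
# so byte offset 4 corresponds to character offset 2.
print(euc_byte_offset_to_char_offset(data, 4))
```

Decoding the prefix keeps the sketch short; a long document with many offsets would be better served by building a byte-to-character table in a single pass.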

A presentation explaining the fundamentals of the grammar can be downloaded from http://www.delph-in.net/jacy/jacy.pdf. Some online documentation is available at JacyDoc.

Download and Licensing

The grammar sources are available at http://jacy.opendfki.de.

Use

svn checkout https://jacy.opendfki.de/repos/trunk jacy

This checks out the current stable version (trunk) to the local directory jacy.

The following license applies to the JACY grammar:

  • Copyright (c) 1997-2004 Melanie Siegel, Emily Bender, Stephan Oepen

    Permission is hereby granted, free of charge, to any person obtaining a copy of this software, to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
  • You may modify your copy or copies of the grammar or any portion of it, thus forming a work based on the grammar, and copy and distribute such modifications or work under the terms described above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the grammar or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

I would appreciate brief feedback on how you use the grammar. I can also let you know by email when a new version is available.

Using the Grammar

The following instructions for installing and running the grammar are slightly out of date. Currently the easiest way to install it is through the automated installation (LkbInstallation) with the option --jacy.

Thanks to Francis Bond, Stephan Oepen, Atsuko Shimada, Ulrich Callmeier and Yoshihiro Morimoto for helping to set these up.

You need the following to run the JACY grammar:

  • Basic requirements: Allegro Common Lisp (ACL) 6.0 with CLIM and all patches installed, a Linux installation with Japanese support, and Open Motif (http://www.openmotif.org)

  • The LKB grammar development system (you will find detailed installation instructions there)

  • The ChaSen morphological analyzer (http://chasen.naist.jp; you will find detailed installation instructions there)

Although the LKB can run standalone, Japanese input is problematic that way. The recommended way is to run it from inside Emacs, using the ELI (Franz's Emacs-Lisp interface). Install the LKB and ELI following the instructions at http://www-csli.stanford.edu/~aac/emacslkb.html.

Problems or questions concerning LKB in general can be directed to lkb-bugs@csli.stanford.edu

You need to run Lisp with the EUC locale (ja_JP.EUC-JP) and make sure that Emacs uses EUC-JP as the process encoding in the *common-lisp* buffer. Download the .emacs.jp file (http://www.delph-in.net/jacy/.emacs.jp) and adapt the paths. Then add the following to your .emacs so that it loads .emacs.jp:

  (let ((jp-init (expand-file-name "~/.emacs.jp")))
    (when (file-exists-p jp-init) (load jp-init nil t t)))

You will also need the file .clinit.cl (http://www.delph-in.net/jacy/.clinit.cl). Finally, to run [incr tsdb()] and PET with the Japanese grammar, you will need .tsdbrc (http://www.delph-in.net/jacy/.tsdbrc).

Now load everything (LKB, MRS and [incr tsdb()]):

Open emacs

Start Lisp with M-x japanese

  :ld ~/src/lkb/src/general/loadup
  (pushnew :lkb *features*)
  (pushnew :mrs *features*)
  (compile-system "tsdb" :force t)

Load the grammar with the following (substituting your own path to the grammar):

  (read-script-file-aux "~/japanese/lkb/ascript")

You can parse a sentence by typing (do-parse-tty "SENTENCE") in the emacs window.

If you have any questions, please write an email to: siegel.melanie@gmail.com

Using JACY with [incr tsdb()]

Install itsdb, following the instructions in the manual.

The latest version of JACY and versions of itsdb later than 2003-05-20 should work as is with Japanese.

  M-x itsdb

Note: Japanese test sentences should be in euc-jp.
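If your test sentences are stored in UTF-8, they need converting before import. A minimal Python sketch, assuming a plain-text item file with one sentence per line (the helper names `to_euc_jp` and `convert_file` are hypothetical): it encodes each line as EUC-JP and reports, and skips, lines containing characters that EUC-JP cannot represent instead of silently mangling them.

```python
from pathlib import Path


def to_euc_jp(lines):
    """Encode sentences as EUC-JP.

    Returns (encoded_lines, bad_line_numbers). Lines with characters that
    EUC-JP cannot represent are skipped and reported by 1-based line number.
    """
    encoded, bad = [], []
    for number, line in enumerate(lines, start=1):
        try:
            encoded.append(line.encode("euc_jp"))
        except UnicodeEncodeError:
            bad.append(number)
    return encoded, bad


def convert_file(src: Path, dst: Path) -> list:
    """Convert a UTF-8 item file to EUC-JP; return the skipped line numbers."""
    lines = src.read_text(encoding="utf-8").splitlines()
    encoded, bad = to_euc_jp(lines)
    dst.write_bytes(b"".join(line + b"\n" for line in encoded))
    return bad
```

Reporting the skipped lines lets you fix them by hand rather than discovering broken items later in [incr tsdb()].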

To get itsdb to count Japanese words, you need to segment the test sentences at some stage. This can be done during import.

  • If there is a _global_ `preprocessing hook', [incr tsdb()] import will pipe every item through it and use the _second_ value it returns as the `i-length' field. For example,

      (setf *tsdb-preprocessing-hook* "lkb::chasen-preprocess-for-pet")

    enables that hook globally. Once you use a definition of this function that counts correctly (no good doing length() on a variable _after_ using the destructive nreverse() on it :-{), you will notice that (i) imports from text files are much slower and (ii) `Browse -- Test Items' shows ChaSen word counts in the `i-length' field.

Note that because the import now takes noticeable time (about half a second per item), the [incr tsdb()] progress meter should advance correctly while importing from a text file (this does not work in versions older than 2003-06).
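The hook contract described above (the preprocessed input as the first value, a word count as the second, used for `i-length') can be mimicked in Python. This is an illustration only: the real hook is the Lisp function chasen-preprocess-for-pet, which calls the LKB's preprocess-sentence-string. The Python function name is hypothetical, and the input is assumed to be ChaSen output produced with -F"%m " (surface forms separated by spaces).

```python
def preprocess_for_counting(segmented: str):
    """Return (input_for_parser, word_count), mirroring the hook's two values.

    Hypothetical helper: `segmented` is assumed to be ChaSen -F"%m " output,
    i.e. morphemes separated by whitespace.
    """
    tokens = segmented.split()
    # The count comes from the token list itself, so it reflects the
    # segmentation; unsegmented Japanese has no spaces and counts as one word.
    return " ".join(tokens), len(tokens)


print(preprocess_for_counting("太郎 が 本 を 読ん だ"))
```

Returning both values from one function matches the way [incr tsdb()] reads only the second value for `i-length' while the first is what goes on to the parser.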

Here is an example of `user-fns.lsp' for JACY that enables *tsdb-preprocessing-hook* when [incr tsdb()] is loaded _before_ the grammar. (You could also set this in `~/.tsdbrc', but then it would affect everything you do, no matter which grammar is used.)

From user-fns.lsp:

  ;;;
  ;;; hook for [incr tsdb()] to call when preprocessing input (going to the PET
  ;;; parser or when counting `words' while importing test items from a text file).
  ;;;
  (defun chasen-preprocess-for-pet (input)
    (preprocess-sentence-string input :verbose nil :posp t))

  #+(or :pvm :itsdb)
  (setf tsdb::*tsdb-preprocessing-hook* "lkb::chasen-preprocess-for-pet")

Using JACY with PET

Install PET following the instructions at http://www.coli.uni-sb.de/pet/documentation.php3.

You need to segment the Japanese input, for example by preprocessing it with ChaSen (http://chasen.naist.jp):

  > chasen -F"%m " | cheap ~/japanese/japanese.grm
  reading `pet/japanese.set'...
  loading `japanese.grm' (Japanese (jan-03))
  16674 types in 1.7 s

Using JACY with itsdb and PET

Install itsdb and PET.

You can run Japanese with a cpu defined in your .tsdbrc (substitute your own pathnames).

After starting lkb-ja and itsdb in Emacs, choose the cpu in the normal way by evaluating the following in the *common-lisp* buffer:

  LKB(2): (tsdb::tsdb :cpu :nihongo :file t)

The preprocessor calls a function defined in user-fns.lsp that runs ChaSen on the input. With the combination of the options -yy and -default-les, cheap takes that output and assigns default lexical types to unknown words.

References

Siegel, Melanie and Emily M. Bender (2002): Efficient Deep Processing of Japanese. In Proceedings of the 3rd Workshop on Asian Language Resources and International Standardization. Coling 2002 Post-Conference Workshop. Taipei, Taiwan.

Oepen, Stephan, Emily M. Bender, Uli Callmeier, Dan Flickinger and Melanie Siegel (2002): Parallel Distributed Grammar Engineering for Practical Applications. In Proceedings of the Workshop on Grammar Engineering and Evaluation. Coling 2002 Post-Conference Workshop. Taipei, Taiwan.

Bender, Emily M. (2002): Number Names in Japanese: A Head-Medial Construction in a Head-Final Language. Linguistic Society of America.

Kiefer, B., H.-U. Krieger and M. Siegel (2000): An HPSG-to-CFG Approximation of Japanese. In Proceedings of Coling 2000, Saarbrücken.

Siegel, Melanie (2000): HPSG Analysis of Japanese. In: W. Wahlster (ed.): Verbmobil: Foundations of Speech-to-Speech Translation. Springer Verlag.

Siegel, Melanie (2000): Japanese Honorification in an HPSG Framework. In Proceedings of the 14th Pacific Asia Conference on Language, Information and Computation, ed. A. Ikeya and M. Kawamori, 289-300. Waseda University International Conference Center, Tokyo. Logico-Linguistic Society of Japan.

Siegel, Melanie (1999): The Syntactic Processing of Particles in Japanese Spoken Language. In: Wang, Jhing-Fa and Wu, Chung-Hsien (eds.): Proceedings of the 13th Pacific Asia Conference on Language, Information and Computation, Taipei 1999.

Siegel, Melanie (1998): Japanese Particles in an HPSG Grammar. Verbmobil-Report 220. Universität des Saarlandes.
