LkbLtdb

FrancisBond edited this page Nov 15, 2018 · 29 revisions

The Linguistic Type Database processes grammars and treebanks to generate online documentation for grammars built with the LKB.

The code can be found on GitHub: https://github.com/fcbond/ltdb

The Linguistic Type Database (LTDB, née Lextype DB) describes the types and rules of a grammar, with frequency information from the treebank. So far, we have applied the LTDB to grammars and treebanks of Chinese, English, Japanese and Spanish.

The minimal Linguistic Type Database offers the following:

  • a web interface to types and rules in a DELPH-IN grammar, including examples from the lexicon.

  • in-line documentation from the tdl file (if any):

    • human readable name
    • description (in reStructuredText)
    • example sentences
  • links to treebanks

    • words, rules and lexical types in context

    • grammar rules (including number of children (arity) and position of head (head))

    • information from the Grammar Catalogue Metadata

  • faster, cleaner Python code

  • dependency on PyDelphin

  • approximate match for type lookup

The LTDB has been updated by Francis Bond and Luis Morgado da Costa, using PyDelphin and visualization from delphin-viz. The software was originally written by ChikaraHashimoto and FrancisBond in Perl, and used the HTML output provided by StephanOepen.

Earlier versions of the lexical type database also included links to external references and other lexicons. We hope to revive them at some stage.

Sample In-Line Documentation

n_-_c_le := n_intr_lex_entry
"""
Intransitive count noun (icn)
<ex>The dog barked.
""".

case-p-lex-np-kara := case-p-lex-np &
"""
<name lang='ja'>承名詞受身主格助詞</name>
名詞の直後について、受身文の主格(実際にその行為を行うもの)を表す助詞「から」。
<ex>子供 が 親 から たしなめ られる
<ex>親戚 から 怒ら れる
<nex>友人 から 自転車 を 買う
(07-03-30)間接受身でも使えるようにすべき。(lkb::do-parse-tty "親戚 から 怒ら れる")
(07-03-30)「〜」はこのtypeでよいのか?(格として取ることがないため)
(07-03-30) postp-lexの後につくtypeも必要。(lkb::do-parse-tty "子供 が 親 とか から たしなめ られる")
"""
                        [SYNSEM.LOCAL.CAT.HEAD.CASE kara-case].

Comments should appear inside the TDL docstrings and should be written in reStructuredText. Two special kinds of markup are recognized: examples and names.

Examples

<ex>A good example of this type
<nex>A bad example of this type
<mex>An example of this type that is ungrammatical, but which we parse through robust rules, mal-rules or constructions.

Ideally, parses of positive examples should contain the type in question, while parses of negative examples should not (although they may be grammatical under other circumstances). It is assumed that the example is finished by a newline. In the human-readable documentation, <nex> examples are shown with an asterisk (∗) and <mex> examples with a circled asterisk (⊛). Neither is accompanied by an obelisk.
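As a rough illustration of the conventions above, the following Python sketch pulls example lines out of a docstring and formats them with the prefixes just described. It is not the LTDB's own code: the names `extract_examples`, `display`, `PREFIX` and `EXAMPLE_LINE` are made up for this sketch, the negative example sentence is invented, and it assumes that each tag starts its line.

```python
import re

# Display prefixes as described above: nothing for <ex>,
# an asterisk for <nex>, a circled asterisk for <mex>.
PREFIX = {"ex": "", "nex": "∗", "mex": "⊛"}

# Assumption: the tag starts the line (after optional whitespace)
# and the example runs to the end of that line.
EXAMPLE_LINE = re.compile(r"^\s*<(ex|nex|mex)>(.*)$")


def extract_examples(docstring):
    """Return (tag, text) pairs for each example line, in order."""
    examples = []
    for line in docstring.splitlines():
        match = EXAMPLE_LINE.match(line)
        if match:
            examples.append((match.group(1), match.group(2).strip()))
    return examples


def display(tag, text):
    """Format one example with the prefix for its tag."""
    return PREFIX[tag] + text


# The <nex> sentence below is a hypothetical negative example.
doc = """Intransitive count noun (icn)
<ex>The dog barked.
<nex>Dog the barked.
"""
for tag, text in extract_examples(doc):
    print(display(tag, text))
```

Lines without a recognized tag, such as the description on the first line, are simply skipped, so the description can be handled separately as reStructuredText.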

Names

case-p-lex-np-ga := case-p-lex-np &
"""<name lang='ja'>承名詞主格助詞
名詞の直後について、主格を表す助詞「が」。このtypeによってその名詞は各種用言・助動詞
の主語(ARG1)となることができる。受動態・可能態では、見た目は主格だが、
実際の行為の目的格(ARG2)となる。
<ex>犬 が 走る
<ex>バナナ が 猿 に 食べ られる
<ex>犬 に 芸 が できる か
<nex>彼 は 帰っ た が
"""
                        [SYNSEM.LOCAL.CAT.HEAD.CASE ga].

It is assumed that the name is finished by a newline.
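A minimal sketch of how such a name could be read, again in Python and again not taken from the LTDB source. The function name `extract_name` and the pattern `NAME_LINE` are hypothetical. The sketch assumes the name ends at the newline, and it tolerates an optional closing `</name>` tag because the first sample above uses one.

```python
import re

# Assumption: <name> may carry a lang attribute in single or double quotes,
# and the name itself runs to the end of the line.
NAME_LINE = re.compile(
    r"""<name(?:\s+lang=(?P<q>['"])(?P<lang>[^'"]*)(?P=q))?\s*>(?P<name>[^\n]*)"""
)


def extract_name(docstring):
    """Return (lang, name) for the first <name> tag, or None if there is none."""
    match = NAME_LINE.search(docstring)
    if match is None:
        return None
    # Drop an optional closing tag and surrounding whitespace.
    name = re.sub(r"</name>\s*$", "", match.group("name")).strip()
    return match.group("lang"), name


doc = """<name lang='ja'>承名詞主格助詞
名詞の直後について、主格を表す助詞「が」。
<ex>犬 が 走る
"""
print(extract_name(doc))
```

If the tag has no `lang` attribute, the first element of the returned pair is `None`.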

Installation

The code can be found on GitHub: https://github.com/fcbond/ltdb

There is a README file that describes how to build the database. In summary:

./make-ltdb.bash --grmdir /path/to/grammar

E.g.

./make-ltdb.bash --grmdir ~/git/jacy

The code makes certain assumptions:

  • there is an lkb/script file

  • gold trees are under tsdb/gold

    • if multiple items have the same item i-id the first one found will be used
  • currently all tdl files are read
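Before running the build, it can help to check a grammar directory against these assumptions. The Python sketch below is only an illustration and is not part of the LTDB: the helper names `check_grammar_dir` and `first_item_per_id` are hypothetical, and the second function merely mimics the "first one found" rule on (i-id, item) pairs.

```python
from pathlib import Path


def check_grammar_dir(grmdir):
    """Return a list of problems; an empty list means the layout looks as expected."""
    root = Path(grmdir).expanduser()
    problems = []
    # The build expects an LKB script file at lkb/script.
    if not (root / "lkb" / "script").is_file():
        problems.append("missing lkb/script")
    # Gold trees are expected under tsdb/gold.
    if not (root / "tsdb" / "gold").is_dir():
        problems.append("missing tsdb/gold")
    return problems


def first_item_per_id(items):
    """Keep the first item seen for each i-id; later duplicates are ignored."""
    kept = {}
    for i_id, item in items:
        kept.setdefault(i_id, item)
    return kept


if __name__ == "__main__":
    # ~/git/jacy is the example grammar path used above.
    for problem in check_grammar_dir("~/git/jacy"):
        print(problem)
```

Using `dict.setdefault` means the order in which profiles are read decides which duplicate survives, which matches the rule stated in the list.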

For the dependencies, please see the GitHub page.

To enable CGI in user directories, add the following lines to the appropriate Apache configuration file. That could be /etc/apache2/httpd.conf, or more correctly, the appropriate file in /etc/apache2/sites-enabled/.

<Directory /home/*/public_html/cgi-bin/>
   Options ExecCGI
   SetHandler cgi-script
</Directory>
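One way to confirm that the configuration has taken effect is to drop a tiny script into the user's cgi-bin directory, make it executable, and request it through the web server. The sketch below is a generic test script, not something shipped with the LTDB; the function name `build_response` and the message text are hypothetical.

```python
#!/usr/bin/env python3
import sys


def build_response(body="CGI is enabled"):
    """Build a minimal CGI response: a header, a blank line, then the body."""
    return "Content-Type: text/plain; charset=utf-8\n\n" + body + "\n"


if __name__ == "__main__":
    sys.stdout.write(build_response())
```

If the browser shows the source code instead of the message, the script is most likely not being executed as CGI.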

Tool Support

As of 2018-11-04, docstrings are supported by the latest LKB-FOS and PyDelphin, PET and ACE, with support in the LOGON LKB promised soon.

Currently, the LKB does NOT support docstrings in instances (such as rules, roots and lexical entries), only in types. LTDB and ACE support these, but we recommend that you wait for the LKB to support them.

References

To Do

  • finish the documentation
    • add screenshots

    • link to some running Lexical Type Databases (like this)

    • include the FieldMappings page?
