
LkbLtdb

MichaelGoodman edited this page Nov 4, 2018 · 29 revisions

The Linguistic Type Database processes grammars and treebanks to make online documentation for grammars made with the LKB.

The code can be found on GitHub: https://github.com/fcbond/ltdb

This page is a very rough initial introduction and is somewhat out of date; the GitHub repository may be more up to date.

The Linguistic Type Database (LTDB, née Lextype DB) describes the types and rules of a grammar, with frequency information from the treebank. So far, the LTDB has been applied to grammars and treebanks of Chinese, English, Japanese and Spanish.

The minimal Linguistic Type Database offers the following:

  • a web interface to types and rules in a DELPH-IN grammar, including examples from the lexicon.

  • in-line documentation from the TDL file (if given):

    • human readable name
    • description (in reStructuredText)
    • example sentences
  • links to treebanks

    • words, rules and lexical types in context

    • grammar rules (including number of children (arity) and position of head (head))

    • information from the Grammar Catalogue Metadata

  • faster, cleaner Python

  • dependency on PyDelphin

  • approximate match for type lookup
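As an illustration of what approximate type lookup can look like, here is a minimal sketch using `difflib` from the Python standard library. This page does not say how the LTDB itself matches names, so the function `suggest_types`, the cutoff value and the small type inventory are all assumptions made for the example.

```python
import difflib

# Illustrative inventory only; a real one would be read from the grammar.
KNOWN_TYPES = ["n_-_c_le", "n_-_m_le", "v_np_le", "case-p-lex-np-kara"]


def suggest_types(query, known_types, limit=3, cutoff=0.6):
    """Return up to `limit` known type names that closely resemble `query`.

    Hypothetical helper: not part of the LTDB code base.
    """
    return difflib.get_close_matches(query, known_types, n=limit, cutoff=cutoff)


if __name__ == "__main__":
    # A mistyped name still finds nearby candidates.
    print(suggest_types("n_-_c_l", KNOWN_TYPES))
```

A higher `cutoff` returns fewer, closer candidates; a lower one is more forgiving of typos but noisier.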

The LTDB has been updated by Francis Bond and Luis Morgado da Costa, using PyDelphin and visualization from delphin-viz. The software was originally written by ChikaraHashimoto and FrancisBond in Perl, and used the HTML output provided by StephanOepen.

Earlier versions of the lexical type database also included links to external references and other lexicons. We hope to revive them at some stage.

Sample In Line Documentation

; <type val="n_-_c_le">
; <description>Intransitive count noun (icn)    
; <ex>The dog barked.
; <nex>
; <todo>
; </type>
n_-_c_le := n_intr_lex_entry.

; <type val="case-p-lex-np-kara">
; <name-ja>承名詞受身主格助詞
; <description>名詞の直後について、受身文の主格(実際にその行為を行うもの)を表す助詞「から」。
; <ex>子供 が 親 から たしなめ られる
; <nex>友人 から 自転車 を 買う
; <todo>(07-03-30)間接受身でも使えるようにすべき。(lkb::do-parse-tty "親戚 から 怒ら れる")
; (07-03-30)「〜」はこのtypeでよいのか?(格として取ることがないため)
; (07-03-30)postp-lexの後につくtypeも必要。(lkb::do-parse-tty "子供 が 親 とか から たしなめ られる")
; </type>
case-p-lex-np-kara := case-p-lex-np &
                        [SYNSEM.LOCAL.CAT.HEAD.CASE kara-case].
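The comment blocks above can be read with a few lines of Python. The following is a rough sketch, not the LTDB's own parser: it assumes that every documentation line starts with `;`, that a block runs from `<type val="...">` to `</type>`, and that an untagged comment line continues the previous field (as in the `<todo>` lines of the second sample). The function name `parse_type_docs` is made up for this example.

```python
import re

TYPE_OPEN = re.compile(r'<type\s+val="([^"]+)">')
FIELD = re.compile(r'<([A-Za-z][\w-]*)>(.*)')

SAMPLE = """\
; <type val="n_-_c_le">
; <description>Intransitive count noun (icn)
; <ex>The dog barked.
; <nex>
; <todo>
; </type>
n_-_c_le := n_intr_lex_entry.
"""


def parse_type_docs(tdl_text):
    """Collect ';'-commented <type> blocks from TDL text.

    Returns a dict mapping type name -> {field name: list of values}.
    Hypothetical helper: the real LTDB code may work differently.
    """
    docs = {}
    current = None  # fields of the block being read, or None outside a block
    field = None    # most recent field name, used for continuation lines
    for raw in tdl_text.splitlines():
        line = raw.strip()
        if not line.startswith(";"):
            continue  # documentation lives only in comment lines
        body = line.lstrip(";").strip()
        opened = TYPE_OPEN.match(body)
        if opened:
            current = docs.setdefault(opened.group(1), {})
            field = None
        elif body == "</type>":
            current = field = None
        elif current is not None:
            tagged = FIELD.match(body)
            if tagged:
                field, value = tagged.group(1), tagged.group(2).strip()
                values = current.setdefault(field, [])
                if value:
                    values.append(value)
            elif field and body:
                # An untagged comment line continues the previous field.
                current[field].append(body)
    return docs


if __name__ == "__main__":
    print(parse_type_docs(SAMPLE))
```

Empty tags such as `<nex>` are kept as fields with an empty list, so a caller can tell "declared but not filled in" apart from "absent".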

Installation

Currently the Lexical Type Database is distributed with the LKB, in lkb/src/ltdb. There is a README file that describes how to build the database. In summary:

./make-ltdb.bash --grm GRAMMAR

For example:

./make-ltdb.bash --grm jacy

If you have any gold treebanks, also run:

./make-trees.bash --grm GRAMMAR

(This is slow if you have a lot of trees, and it needs a fair amount of memory.)

Note: if the current grammar version is very different to that used to make the treebanks, many trees will not be exported.

Everything is installed to ~/public_html/GRAMMAR_VERSION
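The two build steps can also be driven from a small Python wrapper. This is only a sketch: it assumes the scripts are run from the directory that contains them (given above as lkb/src/ltdb), and the helper names `build_commands` and `run_build` are invented for the example. Only the `--grm` option shown above is used.

```python
import subprocess


def build_commands(grammar, with_trees=False):
    """Return the command lines for building the database for `grammar`.

    Hypothetical helper; it only assembles the invocations shown above.
    """
    commands = [["./make-ltdb.bash", "--grm", grammar]]
    if with_trees:
        # Only worthwhile when gold treebanks exist for the grammar.
        commands.append(["./make-trees.bash", "--grm", grammar])
    return commands


def run_build(grammar, with_trees=False, ltdb_dir="."):
    """Run each step in order, stopping at the first failure."""
    for command in build_commands(grammar, with_trees):
        subprocess.run(command, cwd=ltdb_dir, check=True)


if __name__ == "__main__":
    for command in build_commands("jacy", with_trees=True):
        print(" ".join(command))
```

Using `check=True` means a failed database build raises an exception before the slower tree export is attempted.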

Dependencies

  • LKB (to dump the lexicon and type files)
  • Perl
    • DBD::SQLite
    • XML::DOM
  • SQLite3
  • Apache (for the web server)
  • nsgmls for validation (package sp)

On Ubuntu, you can satisfy the dependencies by installing LOGON and the following packages:

sudo apt-get install libdbd-sqlite3-perl sp libxml-dom-perl apache2
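Before building, it can help to confirm that the command-line tools are on the PATH. The sketch below uses `shutil.which` from the standard library. The executable names `perl`, `sqlite3` and `nsgmls` are assumptions based on the dependency list above, and the check covers neither the Perl modules, nor the LKB, nor Apache.

```python
import shutil

# Assumed executable names for the dependencies listed above.
REQUIRED_TOOLS = ["perl", "sqlite3", "nsgmls"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on the PATH.

    `which` is injectable so the check can be tested without touching
    the real environment. Hypothetical helper, not part of the LTDB.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All listed tools were found.")
```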

To enable CGI in user directories, add the following lines to the appropriate Apache configuration file. That could be /etc/apache2/httpd.conf or, more correctly, the appropriate file in /etc/apache2/sites-enabled/.

<Directory /home/*/public_html/cgi-bin/>
   Options ExecCGI
   SetHandler cgi-script
</Directory>

To Do

  • finish the documentation
    • add screenshots

    • link to some running Lexical Type Databases (like this)

    • include the FieldMappings page?

  • index all rules not just lexical rules, and allow them to be looked up.
  • warn if grammar version and treebank version differ