
ErgReleases

StephanOepen edited this page Mar 30, 2013 · 29 revisions

Background

This page (a work in progress, like so many others on this wiki) collects some practical and historical information about official snapshots of the ERG, i.e. officially released versions of the grammar.

(Pre-)Release MRS Quality Control

Once new treebanked and thinned profiles are available, run a set of automated well-formedness tests on the MRSs, for example:

  $LOGONROOT/redwoods --terg --default \
    --filter syntax,lnk,fragmentation erg/trunk/mrs/12-02-06/pet.1

Re-Generate the Core SEM-I

The bulk of the semantic interface (SEM-I) is auto-generated from the lexicon and recorded as ‘core.smi’:

  (in-package :mt)
  (with-open-file (stream "~/src/logon/lingo/terg/etc/core.smi"
                   :direction :output :if-exists :supersede)
    (print-semi (construct-semi) :format :compact :stream stream))

The master file ‘erg.smi’ is manually maintained and includes the auto-generated entries.

Validate and Update the Head Table

To identify new rules (or ones deleted from the grammar), the following will compare the head table on file (by default ‘etc/rules.hds’) to the grammar inventory of constructions:

  (in-package :tsdb)
  (read-heads "~/src/logon/lingo/terg/etc/rules.hds" :test t)

Generate Maximum Entropy and PCFG Models

Parse ranking models included with the grammar are trained on the standard training plus development splits, for example Sections 01 to 12 for WeScience. By default, the Maximum Entropy training scripts (re-)generate a fresh feature cache; hence, the following two jobs must not run in parallel:

  sbatch ${LOGONROOT}/uio/titan/redwoods \
    --redwoods --run train.wescience.lisp

  sbatch ${LOGONROOT}/uio/titan/redwoods \
    --redwoods --run train.redwoods.lisp
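
One way to keep the two jobs from overlapping is to chain the second submission on the first. This is only a sketch, not part of the LOGON scripts: it assumes the cluster runs a SLURM version whose sbatch supports ‘--parsable’ and ‘--dependency’, and the helper name ‘submit_chained’ is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of LOGON): submit both training jobs so that
# the second one starts only after the first one has terminated.
submit_chained() {
  script="${LOGONROOT}/uio/titan/redwoods"

  # --parsable makes sbatch print only the job id (possibly as "id;cluster").
  first=$(sbatch --parsable "$script" \
    --redwoods --run train.wescience.lisp) || return 1
  first=${first%%;*}   # drop an optional ";cluster" suffix

  # afterany: eligible to run once the first job has ended, whatever its
  # exit status; use afterok instead to require successful completion.
  sbatch --parsable --dependency="afterany:${first}" "$script" \
    --redwoods --run train.redwoods.lisp
}
```

After sourcing the definition, calling ‘submit_chained’ submits both jobs and prints the job id of the second one; the scheduler then holds that job until the first has left the queue.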

Unlike in the 1111 release, no PCFG model (for chart pruning in PET) is included with the 1212 release.

Update Summary Statistics of Redwoods Treebanks

Since its October 2010 release, the ERG includes a spreadsheet that summarizes key statistics of the gold-standard profiles that comprise the Redwoods Treebank. The raw data for addition to the file ‘etc/redwoods.xls’ can be generated automagically:

  (in-package :tsdb)
  (loop
      with *phenomena* = nil
      with *statistics-aggregate-dimension* = :phenomena
      with *statistics-all-rejections-p* = t
      with *tsdb-home* = (logon-directory "lingo/terg/tsdb/gold" :string)
      initially (purge-profile-cache :all)
      for db in (find-tsdb-directories)
      for name = (get-field :database db)
      do (analyze-trees name :append "/tmp/redwoods.csv" :format :csv))

Populate the Lexical Type Database
