RmrsSemi

FrancisBond edited this page Jun 28, 2011 · 8 revisions

This page discusses the SEM-I (semantic interface).

The SEM-I is a theoretically grounded component of each grammar, capturing several classes of lexical regularities while also serving the crucial engineering function of supplying a reliable and complete specification of the elementary predications the grammar can realize.

For more information see:

[http://web.mysites.ntu.edu.sg/fcbond/open/pubs/2005-summit-semi.pdf SEM-I rational MT: Enriching deep grammars with a semantic interface for scalable machine translation]

This page was constructed by FrancisBond, based on information from DanFlickinger and StephanOepen.

The SEM-I in MT

In the transfer process, MRSs are typically checked against the target language SEM-I, which lets you filter out ill-formed transfer outputs. In batch processing, suppressing incompatible transfer outputs can save a lot of time, since they are assumed to fail in generation anyway. In theory, the SEM-I test can even increase end-to-end coverage: filtering out incompatible transfer results may allow better MRSs to squeeze into the top-n range that is passed on downstream.

MRSs that are not considered compliant are colored pink in the multi-MRS browser, and should produce `invalid output predicate' messages somewhere in the fan-out log (or possibly in the :error field of the profile).
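The top-n effect described above can be illustrated with a toy model. This is a minimal sketch in Python, not the LKB implementation: the SEM-I is reduced to a set of predicate names, each transfer output to a (score, predicates) pair, and every name here (`is_compliant`, `filter_top_n`, the example predicates and scores) is hypothetical.

```python
def is_compliant(predicates, semi_predicates):
    """True if every predicate of the MRS is known to the target SEM-I."""
    return all(p in semi_predicates for p in predicates)


def filter_top_n(candidates, semi_predicates, n):
    """Drop non-compliant candidates, then keep the n best-scoring ones.

    candidates: list of (score, predicates) pairs; a higher score is better.
    """
    compliant = [c for c in candidates if is_compliant(c[1], semi_predicates)]
    return sorted(compliant, key=lambda c: c[0], reverse=True)[:n]


# Hypothetical target-language SEM-I and ranked transfer outputs.
semi_predicates = {"_dog_n_1", "_bark_v_1", "_the_q"}
candidates = [
    (0.9, ["_the_q", "_inu_n_1", "_bark_v_1"]),   # contains an unknown predicate
    (0.8, ["_the_q", "_dog_n_1", "_hoeru_v_1"]),  # contains an unknown predicate
    (0.7, ["_the_q", "_dog_n_1", "_bark_v_1"]),   # fully compliant
]

# Without the SEM-I test, the top 2 are both doomed to fail in generation.
unfiltered = sorted(candidates, key=lambda c: c[0], reverse=True)[:2]
# With the SEM-I test, the compliant candidate makes it into the top 2.
filtered = filter_top_n(candidates, semi_predicates, 2)
```

With n = 2 and no filter, the only candidate the generator could handle never reaches it; with the filter, it is the one that is passed downstream.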

The SEM-I test will, by default, reject EPs that lack roles which are not marked as optional in the SEM-I (e.g. the ARG0 of a `compound' relation, which rarely plays a role semantically). It is possible to limit SEM-I comparison to checking the validity of predicates and variable properties by setting the following variable (in the MT package):

 (setf *semi-test* '(:predicates :properties))

This increases coverage considerably for JaEn, and is also the default setting for NoEn.
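To make the difference between the two settings concrete, here is a small Python sketch of a role check that can be switched off. It is a toy model under stated assumptions, not the MT package's code: the `SEMI` table, the `check_ep` function, and the check name "roles" are invented for illustration (only :predicates and :properties are keywords taken from this page), and property checking is left out for brevity.

```python
# Toy SEM-I: predicate -> {role: is_optional}. Entries are illustrative only.
SEMI = {
    "compound": {"ARG0": False, "ARG1": False, "ARG2": False},
    "_bark_v_1": {"ARG0": False, "ARG1": False},
}


def check_ep(predicate, roles, checks):
    """Return a list of problems found for one EP under the enabled checks."""
    entry = SEMI.get(predicate)
    if entry is None:
        # Unknown predicate: only an error if predicate checking is enabled.
        return [f"invalid predicate: {predicate}"] if "predicates" in checks else []
    problems = []
    if "roles" in checks:
        # Every role not marked optional must be present on the EP.
        for role, optional in entry.items():
            if not optional and role not in roles:
                problems.append(f"{predicate}: missing role {role}")
    return problems


FULL = {"predicates", "roles", "properties"}   # hypothetical strict check set
RELAXED = {"predicates", "properties"}         # mirrors '(:predicates :properties)

# A `compound' EP produced by transfer without an ARG0.
ep = ("compound", {"ARG1": "x5", "ARG2": "x9"})
strict = check_ep(*ep, FULL)
relaxed = check_ep(*ep, RELAXED)
```

Under the strict set the EP is rejected for its missing ARG0; under the relaxed set it passes, which is the kind of case where the relaxed setting gains coverage.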

How to dump a SEM-I

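;; Work in the MT package.
(in-package "MT")
;; Build the SEM-I (assumes a grammar is already loaded).
(setf semi (mt:construct-semi))
;; Write it out in compact format, overwriting any existing file.
(with-open-file (stream "~/tmp/core.smi"
                :direction :output :if-exists :supersede)
 (mt::print-semi semi :format :compact :stream stream))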

Note that this will create the bulk of the SEM-I, but you'll still need a manually created file for your grammar analogous to the file 'erg.smi' in the ERG, which at its end includes the dumped file 'core.smi'.

You can control what information about variables gets dumped with the file abstract.vpm.

Here is an example from the ERG; it deletes unmarked values from the output.

;;;
;;; when creating the SEM-I, ditch a lot of the variable property information,
;;; essentially only keeping what is relevant in terms of the interface.
;;;

GEND : GEND
  m      >> m
  f      >> f
  n      >> n
  m-or-f >> m-or-f
  *      >> !


NUM : NUM
  sg >> sg
  pl >> pl
  *  >> !


IND : IND
  + >> +
  - >> -
  * >> !


TENSE : TENSE
  past   >> past
  pres   >> pres
  fut    >> fut
  *      >> !


MOOD : MOOD 
  subjunctive >> subjunctive
  *           >> !


PROG : PROG
  + >> +
  * >> !


PERF : PERF
  + >> +
  * >> !
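
The effect of rules like these can be sketched in a few lines of Python. This is a simplified model, not the actual VPM implementation, and it rests on three assumptions: rules for a property are tried in order and the first match wins, `*` matches any value, and `!` on the right-hand side drops the property. The names `RULES` and `map_properties` are hypothetical, and only three of the properties above are modeled.

```python
# Ordered (pattern, target) rules per property, mirroring the listing above.
RULES = {
    "NUM": [("sg", "sg"), ("pl", "pl"), ("*", "!")],
    "TENSE": [("past", "past"), ("pres", "pres"), ("fut", "fut"), ("*", "!")],
    "PERF": [("+", "+"), ("*", "!")],
}


def map_properties(props):
    """Apply the first matching rule to each property; '!' drops it.

    Properties without any rule are dropped too (an assumption of this sketch).
    """
    out = {}
    for name, value in props.items():
        for pattern, target in RULES.get(name, []):
            if pattern == "*" or pattern == value:
                if target != "!":
                    out[name] = target
                break  # first match wins
    return out


# PERF '-' and the underspecified TENSE value fall through to '* >> !'.
dumped = map_properties({"NUM": "sg", "PERF": "-", "TENSE": "tensed"})
```

Here only NUM survives, so the dumped SEM-I would record `sg` but say nothing about PERF or TENSE for that variable.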