SemiRfc
A SEM-I, or SEMantic-Interface, is a description of the semantic structures output by the grammar, and may include entries for variables, properties, predicates, and roles. SEM-Is can be useful for validating the semantic output of grammars without having to load the entire grammar.
A related, but separate, component is the Variable Property Mapping (VPM), which maps grammar-internal variable types, properties, and property values into grammar-external ones. A SEM-I describes the valid grammar-external values, and hence the primary VPM for a grammar is conventionally called semi.vpm.
Note for Developers
As of March 2016, the 1.0 version of the SEM-I is available, which introduces support for predicate hierarchies among other changes. Previous iterations of SEM-Is were underexploited and are not described in the primary text of this wiki.
There are four sections in a SEM-I:
Define variable types, their hierarchical relations, and allowed properties. E.g.:
u.
i < u.
e < i : PERF bool, PROGR bool, MOOD bool, TENSE tense, SF sf.
Define allowed property values and value hierarchies. E.g.:
bool.
+ < bool.
- < bool.
Define allowed predicate roles and constraints on values. E.g.:
ARG0 : i.
RSTR : h.
CARG : string.
Define the predicate hierarchy and predicate synopses (required and optional roles and constraints on role values). E.g.:
_a+little_q < abstract_q : ARG0 i { NUM sg }, RSTR h, BODY h.
_accuse_v_of : ARG0 e, ARG1 i, ARG2 p, [ ARG3 i ].
Predicate entries may be divided among several files. One file may contain just the hierarchical relations (e.g. hierarchy.smi in the ERG 1214), another the abstract predicates (e.g. abstract.smi), and another the surface predicates (e.g. surface.smi). Some very top-level, perhaps extragrammatical, entries may appear in the main .smi file as well (e.g. erg.smi).
The .smi files (e.g. erg.smi, hierarchy.smi, etc.) use a simplified (non-TDL) syntax to characterize notions of inheritance (e.g. specializations of predicates) and appropriateness (e.g. the frame of arguments and the value constraints associated with each predicate). Here is a descriptive example:
; comments begin with semicolons

; sections begin at column 0 and are followed by a colon
variables:
  ; definitions (by convention) are indented
  ; entries end in .
  u.
  ; inheritance is specified by < with supertypes delimited by &
  i < u.
  ; features/properties follow : and are delimited by ,
  x < i & p : DIV bool, IND bool, GEND gender, PERS person, NUM number, PT pt.

predicates:
  ; variable property constraints are bound by { and }, and are delimited by ,
  _acclimitization_n_1 : ARG0 x { NUM sg, IND - }.
  ; optional roles are bound by [ and ] (note that commas appear outside of [ and ])
  _advance_v_1 : ARG0 e, ARG1 i, [ ARG2 i ], [ ARG3 i ].

; external files can be included
; sections in included files are merged with sections in the main file
include: surface.smi
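The layout rules in the example (comments, section headers, entries, includes) can be sketched as a small line classifier. This is only an illustrative sketch, not part of any SEM-I tooling: the function name `split_sections` is hypothetical, and it assumes one entry per line and that a section header is a bare section name followed by a colon.

```python
import re

# Section names taken from the SEM-I description; order is only for display.
SECTIONS = ("variables", "properties", "roles", "predicates")

def split_sections(text):
    """Group entry lines by section and collect include targets."""
    sections = {name: [] for name in SECTIONS}
    includes = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        m = re.fullmatch(r"(\w+)\s*:\s*(.*)", line)
        if m and m.group(1) == "include":
            includes.append(m.group(2))
        elif m and m.group(1) in SECTIONS and m.group(2) == "":
            current = m.group(1)  # a new section starts here
        elif current is not None:
            sections[current].append(line)
        else:
            raise ValueError(f"entry outside of a section: {line!r}")
    return sections, includes
```

Because entries are appended to per-section lists, running the same function over an included file and concatenating the lists gives the merging behavior described in the example's comments.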
This BNF describes the general syntax (whitespace is allowed around tokens):
SEMI := (Comment | Section | Include)*

Comment := /;.*/ EOL
Section := ("variables" | "properties" | "roles" | "predicates") ":" EOL Contents
Include := "include" ":" Filename EOL

Contents := (Comment | Entry)*
Entry := Identifier Parents? Features? "." EOL

Parents := "<" Identifier ("&" Identifier)*

Features := ":" (ReqFeatures ("," OptFeatures)? | OptFeatures)
ReqFeatures := Feature ("," Feature)*
OptFeatures := "[" Feature "]" ("," "[" Feature "]")*
Feature := Identifier Value
Value := Identifier Constraints?
Constraints := "{" Identifier Identifier ("," Identifier Identifier)* "}"

Identifier := /[^ ]+/
EOL := "\n"
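As a rough illustration of the Entry rule, the following sketch parses a single entry from the "predicates" section into a dictionary. It is not a reference implementation: the names `parse_entry` and `split_top_level` are hypothetical, and it assumes that the `:` and `<` tokens are surrounded by spaces, as in the examples on this page.

```python
import re

# Feature := Identifier Value, where Value may carry { PROP val, ... } constraints
FEATURE = re.compile(
    r"(?P<role>\S+)\s+(?P<value>[^\s{]+)(?:\s*\{(?P<props>[^}]*)\})?"
)

def split_top_level(text):
    """Split on commas, ignoring commas inside { } constraint blocks."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i].strip())
            start = i + 1
    parts.append(text[start:].strip())
    return parts

def parse_entry(line):
    """Parse one entry line into its name, parents, and features."""
    body = line.strip()
    if not body.endswith("."):
        raise ValueError(f"entry must end in '.': {line!r}")
    body = body[:-1].strip()
    head, _, feats = body.partition(" : ")   # assumes spaces around ':'
    name, _, parents = head.partition(" < ")  # assumes spaces around '<'
    entry = {
        "name": name.strip(),
        "parents": [p.strip() for p in parents.split("&") if p.strip()],
        "features": [],
    }
    for chunk in split_top_level(feats) if feats.strip() else []:
        optional = chunk.startswith("[") and chunk.endswith("]")
        if optional:
            chunk = chunk[1:-1].strip()
        m = FEATURE.fullmatch(chunk)
        if m is None:
            raise ValueError(f"malformed feature: {chunk!r}")
        props = {}
        if m.group("props"):
            for pair in m.group("props").split(","):
                key, val = pair.split()
                props[key] = val
        entry["features"].append({
            "role": m.group("role"),
            "value": m.group("value"),
            "optional": optional,
            "props": props,
        })
    return entry
```

For example, `parse_entry("_accuse_v_of : ARG0 e, ARG1 i, ARG2 p, [ ARG3 i ].")` yields four features, of which only ARG3 is marked optional.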
To keep the BNF simple, I didn't specialize the sections, but some paths, such as OptFeatures and Constraints, are only valid on entries in the "predicates" section. Likewise, the values of features on variables must be properties, whereas on predicates they are variables, and so on.
Also, we can assume that entries that don't specify a list of parents inherit from some top symbol (like *top*). And string is another available symbol that can be used without being defined.
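The implicit top symbol and the predefined string symbol could be handled along these lines. This is a sketch under stated assumptions: the root is named `*top*` as suggested above, `string` is placed directly under it (the text does not say where it sits), and `build_hierarchy` and `subsumes` are hypothetical helper names.

```python
TOP = "*top*"  # assumed name of the implicit root symbol

def build_hierarchy(entries):
    """Map each name to its parents; entries without parents inherit from TOP.

    `entries` is an iterable of (name, [parent, ...]) pairs.
    """
    # "string" is available without being defined; its parent is an assumption
    parents = {TOP: [], "string": [TOP]}
    for name, declared in entries:
        parents[name] = list(declared) or [TOP]
    # every referenced parent must be defined somewhere or be built in
    for name, declared in parents.items():
        for parent in declared:
            if parent not in parents:
                raise KeyError(f"{name}: undefined parent {parent!r}")
    return parents

def subsumes(parents, ancestor, name):
    """True if `ancestor` is `name` itself or one of its transitive supertypes."""
    stack, seen = [name], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False
```

With the variable entries `u.`, `i < u.`, `p < u.`, and `x < i & p.`, the symbol `u` subsumes `x`, and `*top*` subsumes everything.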
The directory of an including file is used as the parent directory of the included file (i.e. the filename is relative). Thus, given the following directory structure:
start.smi
next.smi
subdir/
    a1.smi
    a2.smi
The start.smi file can include next.smi and a1.smi like this:
include: next.smi
include: subdir/a1.smi
And then a1.smi can subsequently include a2.smi like this:
include: a2.smi
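The relative-path rule can be sketched with `pathlib`: each included name is joined to the directory of the file that contains the include line. The function name `load_order` is hypothetical, and skipping files that were already visited is an assumption added here to guard against include cycles; the text above does not specify that behavior.

```python
import re
from pathlib import Path

INCLUDE = re.compile(r"^\s*include\s*:\s*(\S+)\s*$")

def load_order(path, seen=None):
    """Return the files reachable from `path`, each once, in include order."""
    path = Path(path)
    seen = [] if seen is None else seen
    if path in seen:
        return seen  # already visited (assumed cycle guard)
    seen.append(path)
    for line in path.read_text(encoding="utf-8").splitlines():
        m = INCLUDE.match(line)
        if m:
            # the included name is relative to the including file's directory
            load_order(path.parent / m.group(1), seen)
    return seen
```

Given the directory structure above, calling `load_order("start.smi")` visits start.smi, next.smi, subdir/a1.smi, and subdir/a2.smi in that order.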
* Predicate hierarchies done
* Linking preds that differ by sense (e.g. number of arguments, like "he ate" vs "he ate a banana"), or mass/count distinctions ("every paper" vs "all the paper"). This is not trying to recreate something like WordNet.