ErgSemantics
The ERG Semantic Documentation (ESD) initiative is an ongoing effort to provide ‘end-user’ documentation on the meaning representations that provide the interface to parsing and generation using the English Resource Grammar (ERG). The ERG has been under continuous development for more than two decades and provides fine-grained syntactico-semantic analyses at the utterance (i.e. ‘sentence’) level. While ERG meaning representations abstract to a large degree from semantically irrelevant surface variation, it can at times be challenging to interpret (and appreciate) the nuances of particular semantic analyses. The ESD pages seek to provide an ever-growing ‘encyclopedia’ of semantic analyses available from the ERG.
Additional background information is provided by Flickinger, et al. (2014). These pages are jointly maintained by Emily M. Bender, Dan Flickinger, and Stephan Oepen, with inputs and feedback from, among others, Francis Bond, Ann Copestake, and Alex Lascarides.
The ESD pages are organized as a hyper-linked collection of smaller documents, each typically discussing a specific semantic phenomenon or particular set of higher-level considerations. ERG meaning representations take the form of underspecified logical forms, adopting the framework of Minimal Recursion Semantics (MRS; Copestake, et al. (2005)). An informal introduction to MRS, its use by the ERG, and basic terminology is provided by the [ErgSemantics/Basics] page. Beyond these more technical foundations, the [ErgSemantics/Design] pages discuss broader linguistic design decisions, for example assumptions regarding quantification, or the notion of eventualities assumed. For first-time consumers of ERG meaning representations, these pages aim to establish the ‘scaffolding’ for the core of the ESD pages, viz. a collection of pages that present individual semantic phenomena. The ‘table of contents’ for this collection is available through the [ErgSemantics/Inventory] page.
One of our goals in documenting the ERG semantics is to make explicit the remaining differences in our degrees of commitment to (or confidence in) individual analyses. In many cases, current semantic analyses reflect a careful design process (possibly building on supporting background literature or revisions of earlier attempts); in other cases, there may be known minor deficiencies; and for yet another, hopefully small, group of phenomena, current analyses may be mere placeholders (‘tying things together’ somehow, without a deep commitment to the specifics of the analysis).
To enable a data-driven exploration of the semantic phenomena that have received treatments in the ERG to date, we developed a discovery procedure that starts from grammar entities (phrase structure rules, lexical rules, and lexical types) in the current version of the ERG. Its first step is to identify grammar entities that are likely to contribute to the composition of semantic representations that go beyond the basics. The details of what was considered ‘beyond the basics’ in the discovery of semantic phenomena are summarized on the ErgSemantics/Discovery page, together with some reflections on the effectiveness of the current procedure.
We organize this documentation in terms of what we consider semantic phenomena; the emerging inventory of phenomena is available as the ErgSemantics/Inventory, ordered lexicographically.
One aspect of the documentation produced in this work is a test suite illustrating each identified phenomenon with one or more short, simple sentences, attempting to balance restricted vocabulary size with the clarity of the intended reading of each example. This test suite can be viewed as an extension of the MRS Test Suite.
In capturing semantic phenomena (and hopefully also in future work on automated regression testing) we invoke a notion of semantic fingerprints, i.e. characteristics of the MRS configuration that identify the phenomenon. We utilize a compact template language for MRS fingerprints (similar in form to the MRS LaTeX style; called ERS fingerprints when specialized for the semantic analyses of the English Resource Grammar) that makes the specification of labels and (characterization) links optional, and further allows wild-carding of predicate symbols and role labels (using ‘_’, i.e. just an underscore). For plain N–N compounding, as in garden dog, for example, we take the semantic fingerprint to look something like the following:
h0:compound[ARG1 x1, ARG2 x2]
h0:[ARG0 x1]
[ARG0 x2]
In other words, the phenomenon is characterized by the appearance of the two-place compound relation, linking together another two EPs in the configuration indicated by the shared label h0 (of the compound head and the two-place modifier relation) and the shared referential indices x1 and x2. We do not include the covert quantifier required when the modifier is a non-quantified nominal, or the =q handle constraint holding between the udef_q and the EP introducing x2 (corresponding to garden in our example), because this part of the semantic analysis of the compound construction follows from the analyses of separate phenomena (though ones that are typically co-present with this type of compounding), i.e. general ERG assumptions about the representations of common nouns and quantifiers.
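The matching idea behind such a fingerprint can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation behind the ESD tooling: the `EP` and `Pattern` types, the `find_matches` function, and the hand-written, simplified analysis of the garden dog barked (including its handle and variable names) are all hypothetical, and `None` stands in for the ‘_’ wildcard and for omitted labels.

```python
from typing import NamedTuple, Optional


class EP(NamedTuple):
    """A simplified elementary predication (hypothetical structure)."""
    label: str
    pred: str
    args: dict


class Pattern(NamedTuple):
    """One line of a fingerprint; None means 'left unspecified'."""
    label: Optional[str]
    pred: Optional[str]
    args: dict  # role name -> fingerprint variable


def match_ep(pat, ep, binding):
    """Return an extended binding if ep fits pat, else None."""
    if pat.pred is not None and pat.pred != ep.pred:
        return None
    new = dict(binding)
    pairs = [(var, ep.args.get(role)) for role, var in pat.args.items()]
    if pat.label is not None:
        pairs.append((pat.label, ep.label))
    for var, value in pairs:
        # Fail if the role is absent or the variable is already bound elsewhere.
        if value is None or new.setdefault(var, value) != value:
            return None
    return new


def find_matches(patterns, eps, binding=None, used=frozenset()):
    """Backtracking search assigning each pattern line to a distinct EP."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for i, ep in enumerate(eps):
        if i in used:
            continue
        new = match_ep(first, ep, binding)
        if new is not None:
            yield from find_matches(rest, eps, new, used | {i})


# The N-N compounding fingerprint from the text.
FINGERPRINT = [
    Pattern("h0", "compound", {"ARG1": "x1", "ARG2": "x2"}),
    Pattern("h0", None, {"ARG0": "x1"}),
    Pattern(None, None, {"ARG0": "x2"}),
]

# Hand-written, simplified EPs for "the garden dog barked" (an assumption).
GARDEN_DOG = [
    EP("h4", "_the_q", {"ARG0": "x3", "RSTR": "h5", "BODY": "h6"}),
    EP("h7", "compound", {"ARG0": "e8", "ARG1": "x3", "ARG2": "x9"}),
    EP("h10", "udef_q", {"ARG0": "x9", "RSTR": "h11", "BODY": "h12"}),
    EP("h13", "_garden_n_1", {"ARG0": "x9"}),
    EP("h7", "_dog_n_1", {"ARG0": "x3"}),
    EP("h1", "_bark_v_1", {"ARG0": "e2", "ARG1": "x3"}),
]

# Collapse matches that differ only in which EP was picked for a pattern line.
distinct = {tuple(sorted(b.items())) for b in find_matches(FINGERPRINT, GARDEN_DOG)}
```

In this toy input, the last fingerprint line is satisfied both by the noun EP and by the covert quantifier, since both carry x9 as their ARG0; the sketch therefore collapses matches into distinct variable bindings. A real search tool would presumably also need to handle handle constraints and characterization links, which are left out here.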
There is a search interface for ‘fingerprinting’ collections of ERG analyses, i.e. for using the fingerprints (or variants) to search for instances of semantic phenomena, in either the CCS Test Suite or the DeepBank Treebank.