ErgTreebankingGuidelines

EmilyBender edited this page Sep 17, 2013 · 8 revisions

Heuristics for efficient treebanking

Top-down

  • Choose the construction that spans the whole sentence
    • Typically SUBJH
    • Typically not one of the FRAG* rules
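
The top-down preference can be pictured as a ranking over the candidate rules that span the whole sentence. The sketch below is illustrative only: `rank_root_rules` is a hypothetical helper, not part of any treebanking tool, and `FRAG_NP` is a placeholder name standing in for any of the FRAG* rules.

```python
def rank_root_rules(candidates):
    """Order candidate whole-sentence rules: SUBJH first, FRAG* rules last.

    `candidates` is a list of rule names. The sort is stable, so rules
    with the same priority keep their original relative order.
    """
    def priority(rule):
        name = rule.upper()
        if name == "SUBJH":
            return 0  # typically the right choice
        if name.startswith("FRAG"):
            return 2  # typically not the right choice
        return 1      # everything else sits in between

    return sorted(candidates, key=priority)


# FRAG_NP is a made-up placeholder for "one of the FRAG* rules".
ranked = rank_root_rules(["FRAG_NP", "HMARK_CL", "SUBJH"])
print(ranked)
```

The ranking only suggests which candidate to inspect first; the annotator still decides which rule is correct for the sentence.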

Bottom-up

  • Disambiguate lexical entries early, to reduce remaining ambiguity
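
Why an early lexical decision pays off can be shown with a toy model in which each candidate analysis records the lexical type it uses for a token. Everything here is an assumption made for illustration: the helper `keep_with_lexical_choice` and the four candidate analyses are invented, and only the two lexical type names are taken from the passive example later on this page.

```python
def keep_with_lexical_choice(analyses, token, lexical_type):
    """Return only the analyses that use `lexical_type` for `token`."""
    return [a for a in analyses if a.get(token) == lexical_type]


# Invented candidate set for |A date hasn't been set|.
analyses = [
    {"id": 1, "set": "v_np*_le"},
    {"id": 2, "set": "v_np*_le"},
    {"id": 3, "set": "aj_-_i_le"},
    {"id": 4, "set": "aj_-_i_le"},
]

remaining = keep_with_lexical_choice(analyses, "set", "v_np*_le")
print(len(analyses), "->", len(remaining))
```

One lexical decision discards every analysis built on the rejected entry, so fewer structural choices remain afterwards.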

Technical choices

Complex proper names

Titles

  • |Mr. Browne|
    • Choose NP-TITLE-CMPND, not APPOS

Capitalized words in name

  • Treat capitalized words as parts of the name, not as ordinary words
  • |Rolls-Royce Motor Cars Inc.|
    • |Motor Cars|
      • NP_NAME_CMPND, not NOUN_N_CMPND
    • |Rolls-Royce|
      • Choose multi-word entry when available
    • |Rolls-Royce Motor Cars|
      • NP_NAME_CMPND
    • Attach |Inc.| with NADJ_RR

Profession modifier

  • Treat as an appositive
  • |Howard Mosher, president and CEO|
    • First combine |Howard Mosher|
    • Then combine it with |president and CEO| using APPOS_NBAR

Native names preferred when available

  • Company names
    • |Rolls-Royce|
      • Choose n_-_pn_le, not NP_NAME_CMPND
  • Country names
    • |U.S.|
      • Choose n_-_c-nm-pd_le, not n_-_pn-gen_le

Proper names and punctuation

  • Unknown names
    • |Elianti.|
      • Choose PUNCT_PERIOD_ORULE (period is not part of name)
  • Name abbreviations containing periods
    • |U.S.|
      • Choose PUNCT_PERIOD_ORULE if word is at end of sentence

PP attachment

  • Choose highest attachment point consistent with meaning
    • |remain steady at 1,200 cars|
      • attach to VP, not to |steady|
    • |reserve a room for Browne|
      • attach to VP, not to |room|
  • In copula constructions (with forms of the verb "be"), attach the PP inside the predicative complement, not to the VP headed by "be"
    • |be payable Feb. 15|
      • First combine |payable| with |Feb. 15| with HADJ_I_UNS
  • Complement vs. modifier - choose complement when available
    • |based in Los Angeles|
      • Choose HCOMP, not HADJ_I_UNS
  • PP modifier inserted between verb and its complement NP
    • |publish in statements the names of insiders|
      • First combine |publish| with |in statements| using VMOD_I

Temporal modifiers

  • When they precede the VP, attach them to the subject NP
    • |the maker last year sold cars|
      • attach |last year| to |maker|
  • Treat as modifiers, pumping temporal NP to a PP
    • |last year|
      • Choose NPADV, not ADJN
    • |Feb. 15|
      • Combine with HSPECHC, then choose NPADV
  • Complex phrases
    • |early next year|
      • Combine |early| with |next year| using NADJ_RR

Complex compound nouns

  • Choose bracketing with intended sense
    • |luxury auto maker|
      • first combine |luxury| with |auto|
  • When intended bracketing is not clear, group from right to left
    • |airline ticket counter|
      • first combine |ticket| with |counter|
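
The right-to-left default for unclear compounds can be sketched as a small function that builds nested pairs. This is a minimal illustration, assuming a compound is given as a list of words; `bracket_right_to_left` is a hypothetical name, and the nested tuples only stand in for the binary compound structures chosen during treebanking.

```python
def bracket_right_to_left(words):
    """Group a compound from right to left: [a, b, c] -> (a, (b, c))."""
    if not words:
        raise ValueError("need at least one word")
    tree = words[-1]
    # Walk leftwards, wrapping the structure built so far.
    for word in reversed(words[:-1]):
        tree = (word, tree)
    return tree


# Default grouping when the intended bracketing is not clear.
print(bracket_right_to_left(["airline", "ticket", "counter"]))

# When the intended sense is clear, it overrides the default:
# |luxury auto maker| is bracketed by hand as ((luxury auto) maker).
luxury_auto_maker = (("luxury", "auto"), "maker")
```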

Coordination

  • Nominal phrases
    • Choose N_COORD_TOP_2, not N_COORD_TOP_3 when given the choice
  • Sentence-initial conjunction - treat as incomplete coordination of clauses
    • |But Abrams arrived early.|
      • Combine |But| with |Abrams arrived early.| with HMARK_CL

Passive verb vs. adjective

  • Choose verb if the meaning is agentive; otherwise choose adjective
    • |A date hasn't been set|
      • For |set|, choose v_np*_le, not aj_-_i_le

Punctuation

  • Paired commas marking off a modifier: choose "paired" rule (-PR suffix)
    • |Bell, based in Los Angeles|
      • Choose NADJ_RC_PR to combine modifier phrase with |Bell|

Adverbs

  • Negation - always attach |not| to preceding auxiliary if possible
    • |did not meet|
      • First combine |did| with |not| using HCOMP
  • Other adverbs between auxiliary and main VP - attach adverb to following VP
    • |can really sing|
      • First combine |really| with |sing| using ADJH_S
  • Sentence-initial - Prefer attachment without extraction when possible
    • |Apparently the commission met|
      • Choose ADJ_S, not FILLHEAD_NON_WH_IG
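
The two rules for adverbs between an auxiliary and the main VP yield different bracketings, which the following sketch makes explicit. It is a simplification under stated assumptions: `bracket_aux_adverb_vp` is a hypothetical helper, the input is already split into auxiliary, adverb, and VP, and the "if possible" condition on |not| is not modeled.

```python
def bracket_aux_adverb_vp(aux, adverb, vp):
    """Bracket an auxiliary + adverb + VP sequence as nested pairs.

    |not| attaches to the preceding auxiliary; any other adverb
    attaches to the following VP.
    """
    if adverb == "not":
        return ((aux, adverb), vp)
    return (aux, (adverb, vp))


print(bracket_aux_adverb_vp("did", "not", "meet"))
print(bracket_aux_adverb_vp("can", "really", "sing"))
```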

Measure phrases

  • Degree modifiers - combine with the number word
    • |about 25 % of them|
      • First combine |about| with |25| using HSPECHC
      • Combine |%| with |of them| using HCOMP
  • Dollar amounts - treat the symbol |$| as the head (the unit of measure)
    • |$ 80 billion|
      • Combine |$| with |80 billion| using MEAS_NP_SYMB

Quotations with explicit attribution

  • Treat the quotation as extracted from the 'saying' verb
    • |They arrived, Browne said.|
      • Combine |They arrived,| with |Browne said.| using FILLHEAD_NON_WH

Partitive NPs

  • First pump determiner to noun, and treat of-PP as complement
    • |some of the books|
      • Combine |some| with |of the books| using HCOMP
  • For |all|, |not all|, |both|, and |half|, treat following NP as complement
    • |not all those who wrote|
      • For |not all|, choose native entry n_np_mc-neg_le
      • Combine |not all| with |those who wrote| using HCOMP

Modification in noun phrases

  • Modifiers to the right of the head noun are always attached _before_ any modifiers to the left
    • |important changes by the SEC|
      • First combine |changes| with |by the SEC| using NADJ_RR
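
The attachment order for noun modifiers can be sketched as follows. This is an illustration rather than a description of any tool: `attach_noun_modifiers` is a hypothetical helper, modifiers are passed in surface order, and the assumption that the modifier nearest the head attaches first on each side is mine, not stated in the guideline.

```python
def attach_noun_modifiers(head, left_mods, right_mods):
    """Attach all right-hand modifiers to the head, then the left-hand ones.

    `left_mods` and `right_mods` are in surface (left-to-right) order.
    """
    tree = head
    for mod in right_mods:            # right modifiers first
        tree = (tree, mod)
    for mod in reversed(left_mods):   # then left modifiers, nearest first
        tree = (mod, tree)
    return tree


# |important changes by the SEC|
print(attach_noun_modifiers("changes", ["important"], ["by the SEC"]))
```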