LarsJørgenSolberg edited this page Sep 11, 2012 · 36 revisions

Background

Work in progress: definition of the Grammar Markup Language (GML).

General Syntax

To allow not only nesting but also partially overlapping spans (as we expect might be seen in HTML or LaTeX sources, for example), each span has both an opening and a matching closing tag, and both carry the span's name, e.g.

  ⌊i¦text¦i⌋
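Because both tags carry the name, spans can be recovered even when they overlap. The following is a minimal Python sketch, not part of the GML definition: the function name `extract_spans` is hypothetical, escape sequences are ignored, and any separator that is not part of a tag is simply left in the text.

```python
import re

# Opening tag ⌊name¦ or closing tag ¦name⌋; a name is assumed to be any
# run of characters other than the three reserved ones.
TAG = re.compile(r"⌊([^⌊⌋¦]+)¦|¦([^⌊⌋¦]+)⌋")

def extract_spans(gml):
    """Return (plain_text, spans); each span is (name, start, end) in plain_text."""
    plain = []      # text fragments with the tags stripped
    length = 0      # length of the plain text collected so far
    open_at = {}    # tag name -> stack of start offsets of still-open tags
    spans = []
    pos = 0
    for m in TAG.finditer(gml):
        chunk = gml[pos:m.start()]
        plain.append(chunk)
        length += len(chunk)
        pos = m.end()
        opened, closed = m.group(1), m.group(2)
        if opened is not None:
            open_at.setdefault(opened, []).append(length)
        else:
            starts = open_at.get(closed)
            if not starts:
                raise ValueError(f"closing tag without opening tag: {closed!r}")
            spans.append((closed, starts.pop(), length))
    plain.append(gml[pos:])
    unclosed = [name for name, starts in open_at.items() if starts]
    if unclosed:
        raise ValueError(f"unclosed tags: {unclosed}")
    return "".join(plain), spans

# The b span closes while the i span is still open (partial overlap).
text, spans = extract_spans("⌊b¦one ⌊i¦two¦b⌋ three¦i⌋")
print(text)   # one two three
print(spans)  # [('b', 0, 7), ('i', 4, 13)]
```

Keeping one stack of start offsets per name is what lets a closing tag find its partner without requiring proper nesting.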

To avoid stealing 'common' characters, GML uses three graphic characters: the left and right floor symbols and the broken vertical bar. These can be embedded in GML text using the following escape conventions:

  ⌊⌋⌋
  ⌊⌊⌋
  ⌊¦⌋

These conventions mean that tags cannot be empty, and that there must always be some content between the opening or closing bracket (⌊ and ⌋) and the separator (¦).
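The three escape sequences above could be applied to plain text content roughly as follows. This is only a sketch: the helper names `escape` and `unescape` are hypothetical, and `unescape` is meant for text content from which the tags have already been removed.

```python
import re

# Each reserved character is wrapped in floor brackets:
# ⌊ -> ⌊⌊⌋, ⌋ -> ⌊⌋⌋, ¦ -> ⌊¦⌋
_ESCAPES = str.maketrans({"⌊": "⌊⌊⌋", "⌋": "⌊⌋⌋", "¦": "⌊¦⌋"})
_ESCAPED = re.compile(r"⌊([⌊⌋¦])⌋")

def escape(text):
    """Escape the three reserved characters in plain text."""
    return text.translate(_ESCAPES)

def unescape(gml_text):
    """Turn escape sequences back into the reserved characters."""
    return _ESCAPED.sub(r"\1", gml_text)

print(escape("a¦b"))       # a⌊¦⌋b
print(unescape("a⌊¦⌋b"))   # a¦b
```

Since a tag name can never be empty, a reserved character enclosed directly in floor brackets cannot be mistaken for a tag.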

What about self-closing tags (for instance, images)? The following should work okay:

  ⌊img⌋?
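One way to check that this form does not collide with the other constructs is to tokenize all four side by side. The sketch below is an assumption-laden illustration: the token names (`escape`, `selfclosing`, `open`, `close`) and the function `tokenize` are hypothetical, and a self-closing tag is assumed to be a name enclosed directly in floor brackets.

```python
import re

TOKEN = re.compile(
    r"⌊(?P<escape>[⌊⌋¦])⌋"            # escaped reserved character
    r"|⌊(?P<selfclosing>[^⌊⌋¦]+)⌋"    # self-closing tag such as ⌊img⌋
    r"|⌊(?P<open>[^⌊⌋¦]+)¦"           # opening tag
    r"|¦(?P<close>[^⌊⌋¦]+)⌋"          # closing tag
)

def tokenize(gml):
    """Split GML into (kind, value) tokens; everything else is 'text'."""
    tokens = []
    pos = 0
    for m in TOKEN.finditer(gml):
        if m.start() > pos:
            tokens.append(("text", gml[pos:m.start()]))
        # Exactly one named group takes part in any match.
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    if pos < len(gml):
        tokens.append(("text", gml[pos:]))
    return tokens

print(tokenize("⌊i¦see ⌊img⌋ here¦i⌋"))
# [('open', 'i'), ('text', 'see '), ('selfclosing', 'img'), ('text', ' here'), ('close', 'i')]
```

A name cannot contain a reserved character, so `⌊img⌋` never matches the escape pattern, and an escape never matches the self-closing pattern.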

List of Element Types

Each entry below gives, in order: Name, GML markup, Comment, Mediawiki markup. Entries may omit the GML markup or the comment.

Heading

⌊=¦text¦=⌋

level as an attribute, or maybe use an HTML-style tag

= level1 =, == level2 ==, ... or <h1>level1</h1>, <h2>level2</h2>, ...

Link

⌊>¦text¦>⌋

do we need different types? optional target attribute?

target, <a>text</a>, or a URL

Template

⌊x¦text¦template-name¦par1¦par2¦x⌋

ditch the curly braces? normal or broken vertical bars?

{{name|arg1|arg2}}

Source code

⌊code¦text¦code⌋

<code>public static void main</code>

List

⌊1¦⌊#¦item1¦#⌋⌊#¦item2¦#⌋¦1⌋ and ⌊•¦⌊#¦foo¦#⌋⌊#¦bar¦#⌋¦•⌋ ?

numbered and unnumbered; do we need parameters?

<ul>...</ul> or <ol>...</ol>

List item

⌊#¦item¦#⌋

<li>item</li> or # item or * item

Bold

⌊*¦text¦*⌋

<b>text</b>, '''text''', <strong>text</strong>

Strike through

⌊-¦text¦-⌋

<del>text</del>, <strike>text</strike>

Tele-typed

⌊t¦text¦t⌋

<tt>text</tt>

Quote

⌊"¦text¦"⌋

<blockquote>text</blockquote>

Abbreviation

⌊.¦text¦extended term¦.⌋

<abbr title="extended term">text</abbr>

Italics

⌊/¦text¦/⌋

<i>text</i> or ''text''

Underline

⌊_¦text¦_⌋

<u>text</u> or <ins>text</ins>

Superscript

⌊^¦text¦^⌋

<sup>text</sup>

Subscript

⌊,¦text¦,⌋

<sub>text</sub>

Small text

⌊↓¦text¦↓⌋ ?

<small>text</small>

Big text

⌊↑¦text¦↑⌋ ?

<big>text</big>

Paragraph

⌊p¦text¦p⌋

(?)

<p>text</p> or double newline

Definition list

⌊:¦term¦definition¦:⌋(???)

The term is not obligatory in Mediawiki; the definition description (:) is often used to indent text.

;term or <dt>term</dt>
:definition or <dd>definition</dd>

Variable

Merge with source code ?

<var>text</var>

Math

Merge with source code ?

<math>LaTeX</math>

Citation

⌊cite¦text¦cite⌋

<cite>text</cite>

Image

⌊img⌋

what about captions?

File:image.jpg

Preformatted text

⌊pre¦text¦pre⌋

<pre>text</pre>, <poem>text</poem> or a line starting with whitespace

Ruby (and friends)

(?)

<ruby>Asian letters<rp>(</rp><rt>pronunciation hint</rt><rp>)</rp></ruby>

Div

Merge with Paragraph?

<div>text</div>

span/font/center

(?)

<span>text</span>, <font>text</font>, <center>text</center>

Forced line break

(?)

<br />
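To make the correspondence concrete, the sketch below converts a few of the HTML-style tags listed above into the tentative GML tags. It rests on several assumptions: `html_to_gml` is a hypothetical name, only attribute-free tags are recognized, only entries not marked with a question mark are mapped, and unknown elements are left in the text untouched.

```python
import re

# Subset of the list above: HTML element name -> tentative GML tag name.
HTML_TO_GML = {
    "b": "*", "strong": "*", "i": "/", "u": "_", "ins": "_",
    "del": "-", "strike": "-", "tt": "t", "sup": "^", "sub": ",",
    "code": "code", "cite": "cite", "pre": "pre", "p": "p",
}

HTML_TAG = re.compile(r"<(/?)([A-Za-z]+)>")
# Reserved characters in the text content have to be escaped.
RESERVED = str.maketrans({"⌊": "⌊⌊⌋", "⌋": "⌊⌋⌋", "¦": "⌊¦⌋"})

def html_to_gml(html):
    """Rewrite known HTML tags as GML tags and escape the text in between."""
    out = []
    pos = 0
    for m in HTML_TAG.finditer(html):
        name = HTML_TO_GML.get(m.group(2).lower())
        if name is None:
            continue  # unknown element: stays part of the following text chunk
        out.append(html[pos:m.start()].translate(RESERVED))
        out.append(f"¦{name}⌋" if m.group(1) else f"⌊{name}¦")
        pos = m.end()
    out.append(html[pos:].translate(RESERVED))
    return "".join(out)

print(html_to_gml("<b>bold</b> and <i>italic</i>"))
# ⌊*¦bold¦*⌋ and ⌊/¦italic¦/⌋
```

Opening and closing tags are translated independently, so improperly nested input such as `<b>one <i>two</b> three</i>` maps directly onto overlapping GML spans.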

List of mediawiki markup

For tags observed in the collected HTML, see WeSearch/DataCollection. The only elements that appear to be missing are those related to tables; these will presumably not be included, since they fall outside the scope of linguistic relevance. (XHTML spec.)

We might also want to consider the Text Encoding Initiative, though that's an awful lot of types!
