LispKit Markdown
Library (lispkit markdown) provides an API for programmatically constructing Markdown documents, for parsing strings in Markdown format, as well as for mapping Markdown documents into corresponding HTML. The Markdown syntax supported by this library is based on the CommonMark Markdown specification.
Markdown documents are represented using an abstract syntax that is implemented by three algebraic datatypes block, list-item, and inline, via define-datatype of library (lispkit datatype).
At the top level, a Markdown document consists of a list of blocks. The following recursively defined datatype shows all the supported block types as variants of type block.
(define-datatype block markdown-block?
  (document blocks)
    where (markdown-blocks? blocks)
  (blockquote blocks)
    where (markdown-blocks? blocks)
  (list-items start tight items)
    where (and (opt fixnum? start) (markdown-list? items))
  (paragraph text)
    where (markdown-text? text)
  (heading level text)
    where (and (fixnum? level) (markdown-text? text))
  (indented-code lines)
    where (every? string? lines)
  (fenced-code lang lines)
    where (and (opt string? lang) (every? string? lines))
  (html-block lines)
    where (every? string? lines)
  (reference-def label dest title)
    where (and (string? label) (string? dest) (every? string? title))
  (thematic-break))
(document blocks) represents a full Markdown document consisting of a list of blocks. (blockquote blocks) represents a blockquote block which itself has a list of sub-blocks. (list-items start tight items) defines either a bullet list or an ordered list. start is #f for bullet lists and defines the first item number for ordered lists. tight is a boolean which is #f if this is a loose list (with vertical spacing between the list items). items is a list of list items of type list-item, defined as follows:
(define-datatype list-item markdown-list-item?
  (bullet ch tight? blocks)
    where (and (char? ch) (markdown-blocks? blocks))
  (ordered num ch tight? blocks)
    where (and (fixnum? num) (char? ch) (markdown-blocks? blocks)))
The most frequent Markdown block type is a paragraph. (paragraph text) represents a single paragraph of text where text refers to a list of inline text fragments of type inline (see below). (heading level text) defines a heading block for a heading of a given level, where level is a number from 1 to 6. (indented-code lines) represents a code block consisting of a list of text lines, each represented by a string. (fenced-code lang lines) is similar: it defines a code block with code expressed in the given language lang. (html-block lines) defines an HTML block consisting of the given lines of text. (reference-def label dest title) introduces a reference definition consisting of a given label, a destination URI dest, as well as a title string. Finally, (thematic-break) introduces a thematic break block separating the previous and following blocks visually, often via a line.
Text is represented as lists of inline text segments, each represented as an object of type inline. inline is defined as follows:
(define-datatype inline markdown-inline?
  (text str)
    where (string? str)
  (code str)
    where (string? str)
  (emph text)
    where (markdown-text? text)
  (strong text)
    where (markdown-text? text)
  (link text uri title)
    where (and (markdown-text? text) (string? uri) (string? title))
  (auto-link uri)
    where (string? uri)
  (email-auto-link email)
    where (string? email)
  (image text uri title)
    where (and (markdown-text? text) (string? uri) (string? title))
  (html tag)
    where (string? tag)
  (line-break hard?))
(text str) refers to a text segment consisting of string str. (code str) refers to a code string str (often displayed as verbatim text). (emph text) represents emphasized text (often displayed as italics). (strong text) represents text in boldface. (link text uri title) represents a hyperlink with text linking to uri and title representing a title for the link. (auto-link uri) is a link where uri is both the text and the destination URI. (email-auto-link email) is a "mailto:" link to the given email address email. (image text uri title) inserts an image at uri with image description text and image link title title. (html tag) represents a single HTML tag of the form <tag>. Finally, (line-break #f) introduces a "soft line break", whereas (line-break #t) inserts a "hard line break".
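To illustrate how the block and inline constructors described above nest, here is a minimal sketch of a document containing a heading, a paragraph with a link and emphasis, a tight bullet list, and a blockquote. The import list is an assumption for use outside the REPL, the URL is a placeholder, and the empty string passed as the link title is just one value that satisfies the (string? title) constraint.

```scheme
(import (lispkit base)
        (lispkit markdown))

;; A small document assembled purely from constructors.
(define doc
  (document
    (list
      (heading 2 (list (text "Setup")))
      (paragraph
        (list (text "See the ")
              ;; link takes inline text, a URI, and a title string
              (link (list (text "manual")) "https://example.com/manual" "")
              (text " for ")
              (emph (list (text "details")))
              (text ".")))
      ;; start = #f marks a bullet list; tight = #t marks a tight list
      (list-items #f #t
        (list
          (bullet #\- #t (list (paragraph (list (text "first item")))))
          (bullet #\- #t (list (paragraph (list (text "second item")))))))
      (blockquote
        (list (paragraph (list (strong (list (text "Note:")))
                               (text " blocks can nest."))))))))

(markdown? doc)
```

Since every constructor checks its where clause, a value that is built without error should satisfy markdown?.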
Markdown documents can either be constructed programmatically via the datatypes introduced above, or a string representing a Markdown document can be parsed into the internal abstract syntax representation via function markdown.
For instance, (markdown "# My title\n\nThis is a paragraph.") returns a Markdown document consisting of two blocks: a heading block for the heading "My title" and a paragraph block for the text "This is a paragraph.":
(markdown "# My title\n\nThis is a paragraph.")
⟹ #block:(document (#block:(heading 1 (#inline:(text "My title"))) #block:(paragraph (#inline:(text "This is a paragraph.")))))
The same document can be created programmatically in the following way:
(document
  (list
    (heading 1 (list (text "My title")))
    (paragraph (list (text "This is a paragraph.")))))
⟹ #block:(document (#block:(heading 1 (#inline:(text "My title"))) #block:(paragraph (#inline:(text "This is a paragraph.")))))
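Because both forms print the same structure, the two documents can also be compared directly. The following sketch uses markdown=? for this; the import list and the names parsed and built are assumptions made for illustration.

```scheme
(import (lispkit base)
        (lispkit markdown))

;; Document obtained by parsing Markdown source text.
(define parsed (markdown "# My title\n\nThis is a paragraph."))

;; The same document assembled from constructors.
(define built
  (document
    (list
      (heading 1 (list (text "My title")))
      (paragraph (list (text "This is a paragraph."))))))

;; Structural comparison of the two documents.
(markdown=? parsed built)
```

Given the identical printed results shown above, the comparison would be expected to return #t.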
Since the abstract syntax of Markdown documents is represented via algebraic datatypes, pattern matching can be used to deconstruct the data. For instance, the following function returns all the top-level headers of a given Markdown document:
(import (lispkit datatype))   ; this is needed to import `match`
(define (top-headings doc)
  (match doc
    ((document blocks)
      (filter-map (lambda (block)
                    (match block
                      ((heading 1 text) (text->raw-string text))
                      (else #f)))
                  blocks))))
An example of how top-headings can be applied to this Markdown document:
# *header* 1
Paragraph.
# __header__ 2
## header 3
The end.
is shown here:
(top-headings (markdown "# *header* 1\nParagraph.\n# __header__ 2\n## header 3\nThe end."))
⟹ ("header 1" "header 2")
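The same pattern-matching approach works for other block variants. As a sketch, the hypothetical function code-languages below collects the language tags of all top-level fenced code blocks; the function name, the sample document, and the import list are assumptions made for illustration.

```scheme
(import (lispkit base)
        (lispkit datatype)
        (lispkit markdown))

;; Returns the language of every top-level fenced code block that has one.
;; Blocks without a language carry #f, which filter-map drops.
(define (code-languages doc)
  (match doc
    ((document blocks)
      (filter-map (lambda (block)
                    (match block
                      ((fenced-code lang lines) lang)
                      (else #f)))
                  blocks))))

;; A sample document built from constructors.
(define sample
  (document
    (list (fenced-code "swift" (list "let x = 1"))
          (fenced-code #f (list "plain text"))
          (paragraph (list (text "Not code."))))))

(code-languages sample)
```

For sample, the call should evaluate to a list containing only "swift", since the second code block has no language and the paragraph does not match.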
(markdown-blocks? obj) [procedure]
Returns #t if obj is a proper list of objects o for which (markdown-block? o) returns #t; otherwise it returns #f.
(markdown-block? obj) [procedure]
Returns #t if obj is a variant of algebraic datatype block.
(markdown-block=? lhs rhs) [procedure]
Returns #t if markdown blocks lhs and rhs are equal; otherwise it returns #f.
(markdown-list? obj) [procedure]
Returns #t if obj is a proper list of list items i for which (markdown-list-item? i) returns #t; otherwise it returns #f.
(markdown-list-item? obj) [procedure]
Returns #t if obj is a variant of algebraic datatype list-item.
(markdown-list-item=? lhs rhs) [procedure]
Returns #t if markdown list items lhs and rhs are equal; otherwise it returns #f.
(markdown-text? obj) [procedure]
Returns #t if obj is a proper list of objects o for which (markdown-inline? o) returns #t; otherwise it returns #f.
(markdown-inline? obj) [procedure]
Returns #t if obj is a variant of algebraic datatype inline.
(markdown-inline=? lhs rhs) [procedure]
Returns #t if markdown inline text lhs and rhs are equal; otherwise it returns #f.
(markdown? obj) [procedure]
Returns #t if obj is a valid markdown document, i.e. an instance of the document variant of datatype block; returns #f otherwise.
(markdown=? lhs rhs) [procedure]
Returns #t if markdown documents lhs and rhs are equal; otherwise it returns #f.
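The predicates above differ in what they accept: a single block, a list of blocks, a list of inline segments, or a complete document. The following sketch contrasts them on a few small values; the import list and the name para are assumptions, and the results noted in comments follow from the predicate descriptions above.

```scheme
(import (lispkit base)
        (lispkit markdown))

;; A single paragraph block.
(define para (paragraph (list (text "Hello"))))

(markdown-block? para)                  ; ⟹ #t, a paragraph is a block variant
(markdown? para)                        ; ⟹ #f, it is not the document variant
(markdown-blocks? (list para))          ; ⟹ #t, a proper list of blocks
(markdown-inline? (text "Hello"))       ; ⟹ #t, text is an inline variant
(markdown-text? (list (text "Hello")))  ; ⟹ #t, a proper list of inlines
(markdown? (document (list para)))      ; ⟹ #t, wrapped in a document
```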
(markdown str) [procedure]
Parses the text in Markdown format in str and returns a representation of the abstract syntax using the algebraic datatypes block, list-item, and inline.
(markdown->html md) [procedure]
Converts a Markdown document md into HTML, represented in the form of a string. md needs to satisfy the markdown? predicate.
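A typical use is to parse Markdown source and render it in one step. The sketch below assumes the listed imports and uses illustrative names doc and html; the exact whitespace and layout of the generated HTML are not specified here, so no output is shown.

```scheme
(import (lispkit base)
        (lispkit markdown))

;; Parse Markdown source into a document.
(define doc (markdown "# My title\n\nThis is a paragraph."))

;; Render the document as an HTML string and print it.
(define html (markdown->html doc))
(display html)
(newline)
```

One would expect the result to contain a level-1 heading element followed by a paragraph element.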
(text->string text) [procedure]
Converts given inline text text into a string representation which encodes markup in text using Markdown syntax. text needs to satisfy the markdown-text? predicate.
(text->raw-string text) [procedure]
Converts given inline text text into a string representation ignoring markup in text. text needs to satisfy the markdown-text? predicate.
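The difference between the two conversions can be seen on inline text that carries markup. In this sketch, the import list and the name fragments are assumptions; the exact delimiters that text->string emits for emphasis and strong text are not stated above, so its result is not shown.

```scheme
(import (lispkit base)
        (lispkit markdown))

;; Inline text mixing plain, strong, and emphasized segments.
(define fragments
  (list (text "Read the ")
        (strong (list (text "whole")))
        (text " ")
        (emph (list (text "manual")))))

;; Markup is encoded using Markdown syntax.
(text->string fragments)

;; Markup is dropped; only the characters of the text remain.
(text->raw-string fragments)
```

By analogy with the top-headings example, where "*header* 1" became "header 1", the raw string here should be "Read the whole manual".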