Marius Ursache edited this page Aug 5, 2015 · 15 revisions

The documentation is a work in progress; you can see the latest version in Google Docs here. This wiki is still incomplete.


What is SuperScript?

SuperScript is a dialog system and bot engine for creating human-like conversational chat bots. It exposes an expressive script for crafting dialogue, and features text expansion using WordNet as well as information retrieval and extraction using ConceptNet.

What comes in the box

  • Dialog engine
  • Multi-user platform for easy integration with group chat systems
  • Message pipeline with POS tagging, sentence analysis and question tagging
  • Extensible plugin architecture
  • A built-in graph database using LevelDB, in which each user has their own SubLevelDB.
  • ConceptNet, a general-purpose database for knowledge extraction.
  • WordNet, a database for word and concept expansion.

How it works

The message pipeline contains many steps and differs from other implementations. When input comes into the system, it is converted into a message object. The message object contains multiple permutations of the original input and has been analyzed for parts of speech and question classification. The message is handled first by the reasoning system and is then sent to the dialog engine for processing.
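The ordering described above can be sketched roughly as follows. This is only an illustration in Python, not SuperScript's code: the `Message` fields, `build_message`, `handle`, and the idea that the reasoning step may answer outright are all assumptions made for the example, and POS tagging is left out entirely.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Message:
    raw: str           # text exactly as the user typed it
    clean: str         # lowercased, punctuation-free permutation (hypothetical field)
    is_question: bool  # crude stand-in for question classification


def build_message(text: str) -> Message:
    # Drop punctuation, lowercase, and collapse whitespace to get one permutation.
    clean = " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())
    return Message(raw=text, clean=clean, is_question=text.strip().endswith("?"))


def handle(text: str,
           reasoning: Callable[[Message], Optional[str]],
           dialog: Callable[[Message], Optional[str]]) -> Optional[str]:
    message = build_message(text)
    # Assumption: the reasoning system sees the message first and may answer it.
    answer = reasoning(message)
    if answer is not None:
        return answer
    # Otherwise the message goes on to the dialog engine.
    return dialog(message)
```

Both `reasoning` and `dialog` are plain callables here so that the order of the two stages is the only thing the sketch commits to.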

Terminology

  • Topic: the main entry point into your application. All conversations start within a topic; which topic comes first depends on what exactly was said. SuperScript continues to search topics until it finds a matching gambit, then replies. A topic includes one or more gambits, and each gambit includes at least one reply.
  • Gambit: a set of rules that tells the bot how to reply to a specific user action. In SuperScript, the gambit includes two parts: the input (the basis of the trigger that we match on) and one or more replies that the bot offers, either randomly or under specific conditions.
  • Trigger: a rule that analyzes the user input and tries to match specific conditions.
  • Input: the message sent by the user to the bot.
  • Reply: a specific response given by the bot under certain conditions; it is part of a gambit. A reply can also have a series of gambits attached to it, creating a thread of conversation.
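One way to picture how these terms nest is the small Python model below. It is a sketch only: the class and field names are made up for illustration, matching is reduced to exact string comparison, and nothing here reflects how SuperScript stores its data.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Reply:
    text: str
    # Gambits attached to a reply continue the conversation as a thread.
    gambits: List["Gambit"] = field(default_factory=list)


@dataclass
class Gambit:
    trigger: str          # the rule the user input is matched against
    replies: List[Reply]  # at least one reply per gambit


@dataclass
class Topic:
    name: str
    gambits: List[Gambit]


def find_reply(topics: List[Topic], user_input: str) -> Optional[Reply]:
    # Search topics in order until some gambit matches, then pick one of its replies.
    for topic in topics:
        for gambit in topic.gambits:
            if gambit.trigger == user_input:  # simplification: exact match only
                return random.choice(gambit.replies)
    return None
```

For example, a topic holding a single gambit with the trigger hello bot and the reply Hello, human! would return that reply for the input hello bot and nothing for any other input.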

Triggers

Triggers form the basis of all chat conversation; each one creates a regular expression that input is matched against. In SuperScript, triggers are identified by a + in front of them, and replies by a -. The example below is a simple gambit that includes a trigger and a reply:

+ hello bot
- Hello, human!

If the user input is hello bot, the system will reply with Hello, human!.
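To make the + and - convention concrete, here is a minimal Python sketch that reads such lines into trigger/reply pairs. The function name `parse_script` and the tuple layout are assumptions for illustration; the real SuperScript parser handles far more syntax than this.

```python
from typing import List, Tuple


def parse_script(script: str) -> List[Tuple[str, List[str]]]:
    """Collect (trigger, replies) pairs from lines starting with + or -."""
    gambits: List[Tuple[str, List[str]]] = []
    for line in script.splitlines():
        line = line.strip()
        if line.startswith("+"):
            # A + line opens a new gambit with its trigger.
            gambits.append((line[1:].strip(), []))
        elif line.startswith("-") and gambits:
            # A - line adds a reply to the most recent trigger.
            gambits[-1][1].append(line[1:].strip())
    return gambits


SCRIPT = """
+ hello bot
- Hello, human!
"""

gambits = parse_script(SCRIPT)
```

Here `gambits` holds one pair: the trigger hello bot and a list containing the single reply Hello, human!.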

Basic rules

Triggers are cleaned and normalized before being exposed to the engine. Extra whitespace and punctuation are removed, and matches are always case insensitive. The following rules are all identical:

+ hello bot
+ HELLO BOT
+ Hello, bot
+ hello     BOT!
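The cleaning step can be approximated in a few lines of Python. This is a sketch under the assumption that only lowercasing, removal of common sentence punctuation, and whitespace collapsing are involved; the function name `normalize` is ours, and SuperScript's own cleaning does more.

```python
import re


def normalize(text: str) -> str:
    # Lowercase so matching is case insensitive.
    text = text.lower()
    # Remove common sentence punctuation (wildcard symbols such as * are left alone).
    text = re.sub(r"[.,!?;:]", "", text)
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())


rules = ["hello bot", "HELLO BOT", "Hello, bot", "hello     BOT!"]
normalized = {normalize(rule) for rule in rules}
```

All four rules collapse to the same string, so `normalized` ends up with a single element.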

Inputs are processed by the system to make them easy to understand for a bot:

  • cleaning and normalization
  • spelling corrections for common spelling errors (or British spelling)
  • idiom conversion
  • junk word removal
  • abbreviation expansion and more.

Examples:

  • It’s a nice day becomes it is a nice day.
  • Greetings such as hi, hello, hola, yo, hey become ~emohello.
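The two examples above amount to word-level substitution. The Python sketch below shows the idea with tiny hand-written tables; `CONTRACTIONS`, `GREETINGS` and `substitute` are hypothetical names, and the real substitution lists are much larger.

```python
# Illustrative tables only; these are not SuperScript's actual data.
CONTRACTIONS = {"it's": "it is", "what's": "what is"}
GREETINGS = {"hi", "hello", "hola", "yo", "hey"}


def substitute(text: str) -> str:
    # Treat typographic apostrophes like plain ones before the lookup.
    words = text.lower().replace("\u2019", "'").split()
    out = []
    for word in words:
        if word in CONTRACTIONS:
            out.append(CONTRACTIONS[word])  # expand the contraction
        elif word in GREETINGS:
            out.append("~emohello")         # replace greeting with its concept
        else:
            out.append(word)
    return " ".join(out)
```

With these tables, It’s a nice day comes out as it is a nice day, and hey comes out as ~emohello.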

Warning: normalization might remove some words from the input (like “really?” or “quite”), which might make rule triggering awkward sometimes.

Also, inputs coming into the system are burst into separate message objects that are handled separately. Multiple phrases are broken on ending punctuation and on commas followed by WH words. The replies of the matched gambits are then concatenated to form the final message returned by the bot.

+ my name is john
- Hi John!

+ what is your name
- My name is Ava.
  • An input like My name is John, what's your name? is split into two separate inputs and normalized: my name is john and what is your name. The bot then looks for a match for each input and concatenates the replies, answering the user with Hi John! My name is Ava.
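The split-then-concatenate behavior in this example can be sketched in Python as follows. The WH word list, the helper names, and the dictionary lookup in place of real trigger matching are all assumptions made to keep the example short.

```python
import re

# Split after ending punctuation, or at a comma that is followed by a WH word.
SPLIT_RE = re.compile(
    r"[.!?]+\s*|,\s*(?=(?:who|what|when|where|why|which|how)\b)",
    re.IGNORECASE,
)

# Stand-in for the two gambits above, keyed by normalized trigger.
GAMBITS = {
    "my name is john": "Hi John!",
    "what is your name": "My name is Ava.",
}


def split_input(text: str) -> list:
    return [part.strip() for part in SPLIT_RE.split(text) if part.strip()]


def normalize(text: str) -> str:
    text = text.lower().replace("what's", "what is")
    return " ".join(re.sub(r"[.,!?]", "", text).split())


def reply_to(text: str) -> str:
    # Answer each part separately, then join the replies into one message.
    replies = [GAMBITS.get(normalize(part)) for part in split_input(text)]
    return " ".join(reply for reply in replies if reply)


answer = reply_to("My name is John, what's your name?")
```

With these two gambits, `answer` is Hi John! My name is Ava.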

Wildcards

We use wildcards to allow for more natural and expressive rules. A wildcard will match zero or more characters, words or tokens. Depending on the type of wildcard, the input may or may NOT be captured or saved by the system.

Generic wildcards

A generic wildcard (*) matches zero to unlimited characters, words or tokens. The input is NOT captured or saved by the system.

+ * hello *
- That is crazy.
  • Matches The dog said hello to the cat.
  • Matches Hello world.
  • Matches While hello.
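As a rough illustration of how a trigger with generic wildcards might turn into a regular expression, consider the Python sketch below. It matches whole words only and is our own simplification; `compile_trigger` and `matches` are hypothetical helpers, not SuperScript functions.

```python
import re


def normalize(text: str) -> str:
    # Lowercase, drop punctuation, collapse whitespace.
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def compile_trigger(trigger: str) -> "re.Pattern":
    parts = []
    for token in trigger.split():
        if token == "*":
            # Zero or more words, each followed by a space, in a non-capturing group.
            parts.append(r"(?:\S+ )*")
        else:
            parts.append(re.escape(token) + " ")
    return re.compile("".join(parts))


def matches(trigger: str, text: str) -> bool:
    words = normalize(text)
    # A trailing space lets every word, including the last, end the same way.
    subject = words + " " if words else ""
    return compile_trigger(trigger).fullmatch(subject) is not None
```

Using a non-capturing group mirrors the note that generic wildcards do not save what they matched. With this sketch, the trigger * hello * matches all three inputs listed above but not an input such as Goodbye.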

Variable length wildcards

If you want to only allow a few words though, you might consider using a variable length wildcard. The syntax for variable length wildcards is *~n, where n is the maximum number of words you want to let through.

+ hello *~2
- That is crazy!
  • Matches Hello!
  • Matches Hello John!
  • Matches Hello John Doe
  • Does not match Hello John George Doe

Variable length wildcards are great for capturing input where an adjective might be slipped in before a noun.
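The *~n form maps naturally onto a bounded repetition in a regular expression. The Python sketch below shows one way to express that; it is an assumption-laden illustration (the helper names are ours, and capturing of the matched words is ignored), not the expression SuperScript generates.

```python
import re


def normalize(text: str) -> str:
    # Lowercase, drop punctuation, collapse whitespace.
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def compile_trigger(trigger: str) -> "re.Pattern":
    parts = []
    for token in trigger.split():
        variable = re.fullmatch(r"\*~(\d+)", token)
        if variable:
            # *~n becomes "between 0 and n words", each followed by a space.
            parts.append(r"(?:\S+ ){0,%d}" % int(variable.group(1)))
        else:
            parts.append(re.escape(token) + " ")
    return re.compile("".join(parts))


def matches(trigger: str, text: str) -> bool:
    words = normalize(text)
    subject = words + " " if words else ""
    return compile_trigger(trigger).fullmatch(subject) is not None
```

With the trigger hello *~2, this sketch accepts Hello!, Hello John! and Hello John Doe, and rejects Hello John George Doe, in line with the examples above.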

Exact length wildcards

If you know exactly how many words you want to accept, but don't want to list what the words are, then an exact length wildcard might be what you're after.

+ example trigger
- example reply
  • Matches this rule
  • Does not match this rule