Marius Ursache edited this page Aug 8, 2015 · 10 revisions



Triggers

Triggers form the basis of all chat conversation. Each trigger is turned into a regular expression that user input is matched against. In SuperScript, triggers are identified by + in front of them, and replies by -. The example below is a simple gambit that includes a trigger and a reply:

+ hello bot
- Hello, human!

If the user input is hello bot, the system will reply with Hello, human!.

Basic rules

Triggers are cleaned and normalized before being exposed to the engine. Extra whitespace and punctuation are removed, and matches are always case insensitive. The following rules are all identical:

+ hello bot
+ HELLO BOT
+ Hello, bot
+ hello     BOT!
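A rough Python sketch of the cleanup described above. The helper name normalize is made up for illustration, and the steps are a simplification rather than SuperScript's actual normalizer: lowercase the text, drop punctuation, collapse whitespace.

```python
import re

def normalize(text: str) -> str:
    """Hypothetical helper: approximates trigger cleanup, not SuperScript's real code."""
    text = text.lower()                       # matching is case insensitive
    text = re.sub(r"[^\w\s]", "", text)       # drop punctuation
    return re.sub(r"\s+", " ", text).strip()  # collapse extra whitespace

rules = ["hello bot", "HELLO BOT", "Hello, bot", "hello     BOT!"]
print({normalize(rule) for rule in rules})    # all four collapse to one form
```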

Inputs are processed by the system to make them easy to understand for a bot:

  • cleaning and normalization
  • spelling corrections for common spelling errors (or British spelling)
  • idiom conversion
  • junk word removal
  • abbreviation expansion and more.

Examples:

  • It’s a nice day becomes it is a nice day.
  • Greetings such as hi, hello, hola, yo, hey become ~emohello.

Warning: normalization might remove some words from the input (like “really?” or “quite”), which might make rule triggering awkward sometimes.

Also, inputs coming into the system are burst into separate message objects and handled separately. Multiple phrases are broken based on ending punctuation AND commas following WH words. The reply of each gambit is concatenated back together to form the final message returned by the bot.

+ my name is john
- Hi John!

+ what is your name
- My name is Ava.
  • An input like My name is John, what's your name? is split into two separate inputs and normalized: my name is john and what is your name. Then the bot looks for matches for both inputs, and then concatenates the replies, before replying to the user with Hi John! My name is Ava.
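The split, match and concatenate flow can be sketched in Python as follows. Everything here is an assumption made for illustration: the gambit table, the two-entry contraction map, and the reading of the splitting rule as "ending punctuation, or a comma followed by a WH word" (inferred from the example above).

```python
import re

# Hypothetical stand-ins for the two gambits above, keyed by normalized trigger.
GAMBITS = {
    "my name is john": "Hi John!",
    "what is your name": "My name is Ava.",
}

# Tiny illustrative contraction table; the real normalizer covers far more.
CONTRACTIONS = {"what's": "what is", "it's": "it is"}

WH_WORDS = "who|what|where|when|why"

def burst(message: str) -> list[str]:
    """Split on ending punctuation and on a comma that is followed by a WH word."""
    pattern = rf"[.?!]+|,\s*(?=(?:{WH_WORDS})\b)"
    parts = re.split(pattern, message, flags=re.IGNORECASE)
    return [part.strip() for part in parts if part.strip()]

def normalize(phrase: str) -> str:
    words = [CONTRACTIONS.get(word, word) for word in phrase.lower().split()]
    return re.sub(r"[^\w\s]", "", " ".join(words))

def reply(message: str) -> str:
    keys = [normalize(part) for part in burst(message)]
    return " ".join(GAMBITS[key] for key in keys if key in GAMBITS)

print(reply("My name is John, what's your name?"))  # Hi John! My name is Ava.
```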

Wildcards

We use wildcards to allow for more natural and expressive rules. A wildcard matches a run of characters, words or tokens; how many it accepts depends on its type. Depending on the type of wildcard, the input may or may not be captured or saved by the system.

Generic wildcards

A generic wildcard (*) matches zero to unlimited characters, words or tokens. The input is NOT captured or saved by the system.

+ * hello *
- That is crazy.
  • Matches The dog said hello to the cat.
  • Matches Hello world.
  • Matches While hello.
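One way to picture the generic wildcard is as an uncaptured .* in the underlying regular expression. The Python sketch below is an illustration only: the helper name generic_wildcard_regex is made up, and SuperScript's own translation may differ.

```python
import re

def generic_wildcard_regex(trigger: str) -> re.Pattern:
    """Hypothetical translation: each * becomes an uncaptured 'anything, or nothing'."""
    pieces = []
    previous_was_word = False
    for token in trigger.split():
        if token == "*":
            pieces.append(r".*")
            previous_was_word = False
        else:
            if previous_was_word:
                pieces.append(r"\s+")   # adjacent literal words need a separator
            pieces.append(r"\b" + re.escape(token) + r"\b")
            previous_was_word = True
    return re.compile("".join(pieces), re.IGNORECASE)

pattern = generic_wildcard_regex("* hello *")
for text in ["The dog said hello to the cat", "Hello world.", "While hello."]:
    print(text, "->", bool(pattern.fullmatch(text)))
```

The word boundaries keep the literal from matching inside a longer word, so Othello does not count as hello.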

Exact length wildcards

If you know exactly how many words you want to accept, but don't want to list what the words are, then an exact length wildcard might be what you're after.

The syntax is *n, where n is the exact number of words you want to let through.

+ hello *2
  • Matches Hello John Doe
  • Does not match Hello John

The wildcards will be captured by the system and can later be used in replies.
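A minimal Python sketch of how an exact length wildcard could map onto a capturing regex group. The helper name exact_length_regex is hypothetical, and the sketch assumes n is at least 1; it is not SuperScript's implementation.

```python
import re

def exact_length_regex(trigger: str) -> re.Pattern:
    """Hypothetical translation of *n tokens into capturing groups (assumes n >= 1)."""
    parts = []
    for token in trigger.split():
        match = re.fullmatch(r"\*(\d+)", token)
        if match:
            count = int(match.group(1))
            # Exactly `count` whitespace-separated words, captured as one group.
            parts.append(r"(\S+(?:\s+\S+){%d})" % (count - 1))
        else:
            parts.append(re.escape(token))
    return re.compile(r"\s+".join(parts), re.IGNORECASE)

pattern = exact_length_regex("hello *2")
print(pattern.fullmatch("Hello John Doe").group(1))  # John Doe
print(pattern.fullmatch("Hello John"))               # None
```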

Variable length wildcards

If you want to only allow a few words though, you might consider using a variable length wildcard. The syntax for variable length wildcards is *~n, where n is the maximum number of words you want to let through.

+ hello *~2
- That is crazy!
  • Matches Hello!
  • Matches Hello John!
  • Matches Hello John Doe
  • Does not match Hello John George Doe

Variable length wildcards are great for capturing input where an adjective might be slipped in before a noun.
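The "zero to n words" behaviour can be sketched in Python with a bounded repetition. The helper names are made up, capturing is left out for brevity, and the sketch assumes the wildcard is not the first token of the trigger.

```python
import re

def variable_length_regex(trigger: str) -> re.Pattern:
    """Hypothetical translation of *~n into 'zero to n extra words'."""
    pattern = ""
    for token in trigger.split():
        match = re.fullmatch(r"\*~(\d+)", token)
        if match:
            # Each optional word brings its own leading whitespace.
            pattern += r"(?:\s+\S+){0,%s}" % match.group(1)
        else:
            pattern += (r"\s+" if pattern else "") + re.escape(token)
    return re.compile(pattern, re.IGNORECASE)

def clean(text: str) -> str:
    # Stand-in for normalization: strip punctuation so "Hello!" becomes "Hello".
    return re.sub(r"[^\w\s]", "", text).strip()

pattern = variable_length_regex("hello *~2")
for text in ["Hello!", "Hello John!", "Hello John Doe", "Hello John George Doe"]:
    print(text, "->", bool(pattern.fullmatch(clean(text))))
```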

Min-max wildcards

These are useful if you want to capture, let’s say, at least 2 words but no more than 4 words (the example below). Using *(n) is the equivalent of *n (the exact length wildcard). The wildcards are captured by the system and can be used in replies.

+ hello *(2-4)
  • Matches Hello John Doe and Hello John Dorian Doe
  • Does not match Hello John
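The counted wildcard forms seen so far all reduce to a minimum and maximum word count. The Python sketch below makes that explicit; wildcard_bounds and accepts are hypothetical names, not part of SuperScript.

```python
import re

def wildcard_bounds(token: str) -> tuple[int, int]:
    """Return (min_words, max_words) for a counted wildcard token."""
    if m := re.fullmatch(r"\*(\d+)", token):            # *n        -> exactly n
        return int(m.group(1)), int(m.group(1))
    if m := re.fullmatch(r"\*\((\d+)\)", token):        # *(n)      -> same as *n
        return int(m.group(1)), int(m.group(1))
    if m := re.fullmatch(r"\*\((\d+)-(\d+)\)", token):  # *(min-max)
        return int(m.group(1)), int(m.group(2))
    if m := re.fullmatch(r"\*~(\d+)", token):           # *~n       -> zero to n
        return 0, int(m.group(1))
    raise ValueError(f"not a counted wildcard: {token}")

def accepts(token: str, words: str) -> bool:
    low, high = wildcard_bounds(token)
    return low <= len(words.split()) <= high

print(wildcard_bounds("*(2-4)"))      # (2, 4)
print(accepts("*(2-4)", "John Doe"))  # True
print(accepts("*(2-4)", "John"))      # False
```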

Alternates

Alternates are used when you have a range of words that could fit into the rule, but one is required. The alternate in the input is captured by the system and can be used in replies.

+ i go by (bus|train|car)
  • Matches I go by bus
  • Matches I go by train
  • Matches I go by car
  • Does not match I go by
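An alternate group behaves much like a capturing group in a regular expression. The Python sketch below shows the idea; the reply text is invented for illustration and does not use SuperScript's reply syntax.

```python
import re

# The alternate group maps directly onto a capturing regex group.
pattern = re.compile(r"i go by (bus|train|car)", re.IGNORECASE)

def reply(text: str):
    match = pattern.fullmatch(text)
    if match is None:
        return None  # e.g. "I go by" supplies none of the alternates
    # Hypothetical reply that reuses the captured word.
    return f"How long does the {match.group(1).lower()} take?"

print(reply("I go by train"))  # How long does the train take?
print(reply("I go by"))        # None
```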

Optionals

Optionals can be used to check for extra/optional words. Optionals are not captured by the system.

+ my [big] red balloon [is awesome]
  • Matches my red balloon
  • Matches my big red balloon
  • Matches my red balloon is awesome
  • Matches my big red balloon is awesome
  • Does not match my big red balloon awesome
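Optionals can be pictured as non-capturing groups that are either wholly present or wholly absent. A minimal Python sketch follows, with made-up helper names; it does not handle alternation inside the brackets.

```python
import re

def optional_regex(trigger: str) -> re.Pattern:
    """Hypothetical translation of [optional words] into non-capturing groups."""
    pattern = ""
    for piece in re.findall(r"\[[^\]]*\]|\S+", trigger):
        if piece.startswith("["):
            words = r"\s+".join(map(re.escape, piece[1:-1].split()))
            pattern += rf"(?:\s+{words})?"   # whole phrase present or absent
        else:
            pattern += r"\s+" + re.escape(piece)
    return re.compile(pattern, re.IGNORECASE)

def matches(trigger: str, text: str) -> bool:
    # Every piece expects leading whitespace, so pad the input with one space.
    return optional_regex(trigger).fullmatch(" " + text) is not None

trigger = "my [big] red balloon [is awesome]"
print(matches(trigger, "my big red balloon is awesome"))
print(matches(trigger, "my big red balloon awesome"))
```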

WordNet expansion

WordNet is a database of words and an ontology that includes hypernyms, synonyms and lots of other neat relationships between words. SuperScript uses the raw WordNet library and also expands it to include fact triples, providing even more relationships through its scripted fact graph database. To expand a term, put a tilde ~ before the word you want to expand.

+ I ~like ~sport
  • Matches I like hockey
  • Matches I love baseball
  • Matches I care for soccer
  • Matches I prefer lacrosse
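Expansion can be thought of as replacing each ~term with the set of words it stands for. The Python sketch below uses a small hand-made table purely as a stand-in; SuperScript draws these sets from WordNet and its fact graph, and the names CONCEPTS and expand are assumptions.

```python
import re

# Hand-made stand-in for concept lookup, limited to the words in the example.
CONCEPTS = {
    "like": ["like", "love", "care for", "prefer"],
    "sport": ["hockey", "baseball", "soccer", "lacrosse"],
}

def expand(trigger: str) -> re.Pattern:
    """Replace each ~term with an alternation over its related words."""
    parts = []
    for token in trigger.split():
        if token.startswith("~"):
            terms = CONCEPTS[token[1:]]
            alternatives = "|".join(
                r"\s+".join(map(re.escape, term.split())) for term in terms
            )
            parts.append(f"(?:{alternatives})")
        else:
            parts.append(re.escape(token))
    return re.compile(r"\s+".join(parts), re.IGNORECASE)

pattern = expand("I ~like ~sport")
for text in ["I like hockey", "I care for soccer", "I like pizza"]:
    print(text, "->", bool(pattern.fullmatch(text)))
```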

Parts of speech

When input comes into the system, we tag it and analyze it to help make sense of what is being said. SuperScript has a few tagged keywords you can use in triggers. These tags can also have a numeric value attached to them to get even more specific.

  • <noun>, <nounN>, <nouns>
  • <adjective>, <adjectiveN>, <adjectives>
  • <verb>, <verbN>, <verbs>
  • <adverb>, <adverbN>, <adverbs>
  • <pronoun>, <pronounN>, <pronouns>
  • <name>, <nameN>, <names>

+ <name1> is [more|less] <adjectives> than <name2>
  • Matches Tom is taller than Mary
  • Matches John is less disciplined than Jack.

Note that pronouns are a subclass of nouns, so I, you, her will match both <noun> and <pronoun>. For an input like I’m an engineer, the system will normalize it to I am an engineer, then tag it:

taggedWords: 
  [ [ 'I', 'NN' ],
    [ 'am', 'VBP' ],
    [ 'an', 'DT' ],
    [ 'engineer', 'NN' ] ]

The <noun1> tag matches only the first noun in the lookup, whereas <nouns> matches all nouns. Therefore:

  • I am a <noun1> will match I am a I
  • I am a <noun> will match I am a I or I am an engineer
  • I am a <nouns> will match I am a cow or I am an engineer
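The difference between "the first noun" and "all nouns" can be illustrated with the tagged output shown earlier. This Python sketch only extracts nouns from a tagged list; the function names, and the assumption that numbered tags count from 1, are illustrative and not taken from SuperScript's code.

```python
# Tagged output for "I am an engineer", as shown above.
tagged_words = [["I", "NN"], ["am", "VBP"], ["an", "DT"], ["engineer", "NN"]]

def nouns(tagged):
    """All words whose tag marks a noun (tags starting with NN)."""
    return [word for word, tag in tagged if tag.startswith("NN")]

def nth_noun(tagged, n):
    """The n-th noun, counting from 1; None if the input has fewer nouns."""
    found = nouns(tagged)
    return found[n - 1] if 0 < n <= len(found) else None

print(nouns(tagged_words))        # ['I', 'engineer']
print(nth_noun(tagged_words, 1))  # I
```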

Question types

We can identify questions (with or without the ending question mark) in the input, so you can create specific rules, either by using ?: to begin your trigger pattern in SuperScript or by selecting from the droplist in the editor.

?:Will you do *
- Hmmm, let me get back to you on that.

SuperScript can go one step further and distinguish between different question types:

  • Question word (who, what, where, when, why).
  • Choice questions (this or that)
  • Yes/No questions
  • Tag questions (He is bald, isn’t he?)

?:WH * store
  • Matches Who went to the store?
  • Matches Why did you go to the store?
  • Does not match Is this your store?

?:CH is your car *
  • Matches Is your car green or blue?
  • Does not match Is your car green?
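To make the WH and CH examples concrete, here is a deliberately crude Python heuristic. It is not how SuperScript classifies questions; the function name and the OTHER label are invented, and real inputs would need much more care.

```python
import re

WH_WORDS = ("who", "what", "where", "when", "why")

def question_type(text: str) -> str:
    """Very rough heuristic: WH if it opens with a question word, CH if it offers a choice."""
    words = re.findall(r"[a-z']+", text.lower())
    if words and words[0] in WH_WORDS:
        return "WH"
    if "or" in words:
        return "CH"
    return "OTHER"  # made-up catch-all for everything else

for text in ["Who went to the store?", "Is your car green or blue?", "Is your car green?"]:
    print(text, "->", question_type(text))
```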

Input types

We can also match based on the type of input. In the future, we may look at other types of classification that follow more linguistic types like speech acts and adjacency pairs.

SuperScript supports 8 broad categories and over 40 sub-categories with 80% accuracy:

ABBR           - abbreviation
  abb          - abbreviation
  exp          - expression abbreviated
ENTY           - entities
  animal       - animals
  body         - organs of body
  color        - colors
  creative     - inventions, books and other creative pieces
  currency     - currency names
  event        - events
  food         - food
  instrument   - musical instrument
  lang         - languages
  letter       - letters like a-z
  other        - other entities
  plant        - plants
  product      - products
  religion     - religions
  sport        - sports
  substance    - elements and substances
  symbol       - symbols and signs
  technique    - techniques and methods
  term         - equivalent terms
  vehicle      - vehicles
  word         - words with a special property
DESC           - description and abstract concepts
  def          - definition of sth.
  desc         - description of sth.
  manner       - manner of an action
  reason       - reasons
HUM            - human beings
  group        - a group or organization of persons
  ind          - an individual
  title        - title of a person
  desc         - description of a person
LOC            - locations
  city         - cities
  country      - countries
  mountain     - mountains
  other        - other locations
  state        - states
NUM            - numeric values
  code         - postcodes, phone number or other codes
  count        - number of sth.
  expression   - numeric mathematical expression
  date         - dates
  distance     - linear measures
  money        - prices
  order        - ranks
  other        - other numbers
  period       - the lasting time of sth.
  percent      - fractions
  speed        - speed
  temp         - temperature
  size         - size, area and volume
  weight       - weight

Here are some examples:

?:NUM:code * phone *
  • Matches My phone is 415-315 9862.

Input types are different from concepts or parts of speech because they are made up of more than one word. LOC, for example, usually starts with “Where” then drills into a region or other complementary word.
