Tokenize

A chat message must first be tokenized into its constituent natural-language units, such as words and punctuation marks. Meaning can then be attached to each part of the message.

Basic request

Send the following chat request:

POST /zoo-chatbot/tokenize
{
  "message": "any giraffes?"
}
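
As a sketch, the same request can be sent with a few lines of Python using the requests library. The base URL below is a placeholder, not part of the API; replace it with the address of the running zoo-chatbot service.

import requests

# Hypothetical base URL; substitute the host where the zoo-chatbot service runs.
BASE_URL = "http://localhost:8080"

response = requests.post(
    f"{BASE_URL}/zoo-chatbot/tokenize",
    json={"message": "any giraffes?"},
)
response.raise_for_status()
print(response.json())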

Response

Notice how the message is split into two words and a question mark, each with a high probability of correctness:

{
  "tokens": [
    "any",
    "giraffes",
    "?"
  ],
  "probabilities": [
    {
      "token": "any",
      "probability": 1
    },
    {
      "token": "giraffes",
      "probability": 0.9925965989
    },
    {
      "token": "?",
      "probability": 1
    }
  ]
}
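
In both the tokens list and the probabilities list, the tokens appear in their original order, so the response can be walked entry by entry. A minimal sketch, using the response shown above loaded into a Python dictionary:

# The response body shown above, as a Python dictionary for illustration.
tokens_response = {
    "tokens": ["any", "giraffes", "?"],
    "probabilities": [
        {"token": "any", "probability": 1},
        {"token": "giraffes", "probability": 0.9925965989},
        {"token": "?", "probability": 1},
    ],
}

# Each entry pairs a token with its probability of correctness.
for entry in tokens_response["probabilities"]:
    print(f"{entry['token']}: {entry['probability']}")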

Erroneous request

Chatbots must cope with typos and grammatical mistakes. Send the following request, which contains such errors:

POST /zoo-chatbot/tokenize
{
  "message": "giraffes like to eat? leaves,oh"
}

Response

Notice that the tokens from the erroneous parts of the message have lower probabilities of correctness:

{
  "tokens": [
    "giraffes",
    "like",
    "to",
    "eat",
    "?",
    "leaves",
    ",oh"
  ],
  "probabilities": [
    {
      "token": "giraffes",
      "probability": 1.0
    },
    {
      "token": "like",
      "probability": 1.0
    },
    {
      "token": "to",
      "probability": 1.0
    },
    {
      "token": "eat",
      "probability": 0.9912258614
    },
    {
      "token": "?",
      "probability": 1.0
    },
    {
      "token": "leaves",
      "probability": 0.9503636232
    },
    {
      "token": ",oh",
      "probability": 0.9828333777
    }
  ]
}
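
These lower probabilities can be used to flag the suspect parts of a message. A minimal sketch, assuming the same placeholder base URL as above and an arbitrary threshold of 0.99 chosen purely for illustration:

import requests

# Hypothetical values, not part of the API: service address and flagging threshold.
BASE_URL = "http://localhost:8080"
SUSPECT_THRESHOLD = 0.99

response = requests.post(
    f"{BASE_URL}/zoo-chatbot/tokenize",
    json={"message": "giraffes like to eat? leaves,oh"},
)
response.raise_for_status()

# Flag tokens whose probability of correctness falls below the threshold;
# for the response shown above this would flag "leaves" and ",oh".
for entry in response.json()["probabilities"]:
    if entry["probability"] < SUSPECT_THRESHOLD:
        print(f"suspect token: {entry['token']} ({entry['probability']})")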