Nieves Ábalos


How do chatbots work?

What is a chatbot?

One of the main applications of Conversational AI is the chatbot. A chatbot is a computer program designed to simulate a conversation between people through text, and aims to help the person perform certain tasks, socialize, or obtain information.

What was the first chatbot?

The first program considered a chatbot was ELIZA[1]. It was created by Joseph Weizenbaum in 1966 and used a simple keyword-detection mechanism to respond to its users, simulating realistic conversations, as I explained in this article.

Example of chat with ELIZA. Source: Weizenbaum[1] 

Command and button-based chatbots

The first interfaces considered chatbots were command-based. A written command can be recognized with a regular expression and, once detected, the corresponding response or action is executed. In reality, this type of chatbot is more of a "bot" than a "chat-bot", given its limited ability to understand language as such.

Example of a command-based chatbot on Telegram. Source: Telegram documentation.
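As an illustration, a command-based bot can be sketched in a few lines with regular expressions. The commands and replies below are invented for the example; a real bot (on Telegram, for instance) would wire these handlers to the platform's API:

```python
import re

# Each command pattern maps to a canned reply (illustrative commands only).
COMMANDS = {
    r"^/start\b": "Welcome! Type /help to see what I can do.",
    r"^/help\b": "Available commands: /start, /help, /weather <city>",
    r"^/weather\s+(\w+)": "Looking up the weather in {0}...",
}

def handle(message: str) -> str:
    """Match the message against known command patterns and reply."""
    for pattern, reply in COMMANDS.items():
        match = re.match(pattern, message)
        if match:
            # Fill the reply template with any captured arguments.
            return reply.format(*match.groups())
    return "Unknown command. Type /help."
```

Note that anything outside the known patterns falls through to a generic error: the bot recognizes commands, not language.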

During the chatbot boom from 2016 to 2018, the focus was more on the chat interface than on intelligent natural language processing capabilities. In fact, not all available chatbot creation tools had linguistic capabilities, so many designs were still based on buttons, which were associated with commands or actions that were executed when the user pressed the button. Examples of these platforms were Chatfuel or Landbot.

The concept discussed at that time was "Conversational UI", referring to user interface, rather than the current concept of "Conversational AI". An example was the iOS news application called Quartz[2], launched in 2016. Although not all chatbots had their own mobile application, the usual channels where you could find these chatbots were Facebook Messenger, WeChat, LINE, Slack, and Telegram.

Example of interaction with a button-based chatbot. Source: Dennis Snellenberg (Dribbble)


Can I help you with this topic? In addition to my Conversational AI consulting services, I offer training, talks, and mentoring.


Chatbots that converse in natural language

Thanks to advances in natural language processing (NLP) for certain languages such as English or Spanish, chatbots are capable of understanding expressions written in natural language: they interpret the user's needs and respond coherently, which makes the experience more intuitive and, a priori, more natural.

In 2016, the startup api.ai, founded two years earlier, was acquired by Google and would later be renamed Dialogflow[3]. This key tool in chatbot development allows the creation of agents that understand natural language, using what were then the most advanced NLP techniques for intent classification and entity extraction.

More recently, language models have brought a major advance in tasks such as language understanding and language generation.

Let's see below how chatbots are able to understand us when we interact with them.

How does the communication process work?

When we talk to a chatbot, several processes or steps are set in motion that allow the machine to understand and respond to what we say over a series of turns, simulating a conversation between two people.

A conversation between people. Source: own elaboration with Midjourney v6.

Let's examine this process in detail, distinguishing between traditional approaches and more recent end-to-end (E2E) approaches based on Large Language Models (LLMs).

NLU approach: the traditional communication process

In the traditional approach, the text communication process is typically divided into three phases or modules, each applying different NLP or machine learning (ML) techniques. This pipeline is found in many chatbot creation tools, such as Dialogflow[3] or Microsoft LUIS[4]:

1. Natural Language Understanding or NLU: Interprets the meaning of what the user has written, that is, their intention and keywords from the text.

  • Extracts the user's intent: attempts to classify the received phrase into known intents, which group the example phrases the chatbot has been trained on, typically using supervised machine learning algorithms.

  • Extracts relevant keywords or entities using techniques such as part-of-speech tagging, dictionaries, rule-based NER (Named Entity Recognition), and linguistic corpora.

  • For example, in "What's the weather like in Madrid today?", the intention could be "weather_query" and the extracted entity would be "Madrid".

2. Dialogue Management or DM: Determines the most appropriate action based on the conversation context: which turn we are in, what the user's intention is, what information has been provided previously, and what information is known through other channels or databases.

  • It can use rule-based systems or machine learning models. The system's behaviors must be designed for each existing intent.

  • Contains policies that decide, for example, whether more information is needed (because something is missing or ambiguous) or whether a response can already be given. An example of a policy is slot-filling, a task in which the chatbot requests the information needed to complete an action.

  • Contains mechanisms to maintain context and track the state of the conversation.

  • Has the ability to connect with external sources: databases, applications, and devices through APIs or other mechanisms.

  • If it cannot help because the request falls outside the intents it knows (out of domain), it gives a "fallback" response along the lines of "I'm sorry, I can't help you with that...".

3. Natural Language Generation (NLG) or, more commonly, Response Generation (RG): Creates an appropriate response in natural language, based on the action determined in the previous step.

  • It can use predefined response templates or more advanced text generation techniques. In the case of predefined responses, one or several responses will have to be designed for each type of situation and/or chatbot behavior.

The traditional architecture of communication process in chatbots: NLU, DM and RG. Source: own elaboration.
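The three modules can be sketched end to end in a toy Python example. The intents, training phrases, city dictionary, and response templates below are all invented for illustration; real tools like Dialogflow use trained ML classifiers rather than this naive word-overlap score:

```python
# --- Toy NLU: intent classification by word overlap with training phrases ---
TRAINING = {
    "weather_query": ["what's the weather like", "will it rain", "forecast"],
    "greeting": ["hello", "hi", "good morning"],
}
CITIES = {"madrid", "barcelona", "seville"}  # dictionary-based entity extraction

def nlu(text: str) -> tuple[str, dict]:
    words = set(text.lower().replace("?", "").split())
    best, score = "out_of_domain", 0.0
    for intent, phrases in TRAINING.items():
        # Fraction of each training phrase's words present in the input.
        ratio = max(len(words & set(p.split())) / len(p.split()) for p in phrases)
        if ratio > score:
            best, score = intent, ratio
    if score < 0.5:  # too weak a match: treat as out of domain
        best = "out_of_domain"
    entities = {"city": c.title() for c in CITIES if c in words}
    return best, entities

# --- Toy DM: slot-filling policy plus fallback ---
def dm(intent: str, entities: dict) -> tuple[str, dict]:
    if intent == "weather_query":
        # Slot-filling: ask for the missing city before answering.
        return ("give_weather", entities) if "city" in entities else ("ask_city", {})
    if intent == "greeting":
        return ("greet", {})
    return ("fallback", {})

# --- Toy RG: predefined response templates ---
TEMPLATES = {
    "ask_city": "Which city do you want the weather for?",
    "give_weather": "It is sunny in {city} today.",
    "greet": "Hello! How can I help you?",
    "fallback": "I'm sorry, I can't help you with that.",
}

def respond(text: str) -> str:
    intent, entities = nlu(text)
    action, slots = dm(intent, entities)
    return TEMPLATES[action].format(**slots)
```

For example, `respond("What's the weather like in Madrid today?")` classifies the intent as `weather_query`, extracts the entity `Madrid`, and fills the weather template; with no city, the slot-filling policy asks for one instead.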

Thus, chatbots that use this traditional process and these steps have the following characteristics:

  • Each component is developed and optimized separately.

  • It allows more precise control over each module or step of the process.

  • It can require a lot of manual work, especially in classifying phrases into intentions, creating rules, and managing dialogue.

  • It can be less flexible in handling new phrases, unexpected inputs, or out-of-domain inputs.

The E2E communication process based on LLMs

Since the appearance of ChatGPT[5] in 2022, large language models (LLMs) have increasingly been applied to chatbot development, both in academia and in industry.

One of the main approaches consists of replacing the three blocks (NLU, DM, and RG) with an LLM, adopting a more unified approach in which the model performs the whole communication process in a single pass, or end-to-end (E2E).

These models, whether or not they have been adapted for dialogue, as GPT-3.5 Turbo was (via RLHF), are capable of generating a plausible response to any user interaction or turn.

Does this mean that every LLM is a chatbot? Not exactly. I'll explain this in an upcoming article.

GPT-3.5 Turbo documentation. Source: OpenAI.

In a simplified manner, here are some of the processes that the LLM performs to give a response in natural language:

  1. Tokenization

    • Converts the user's phrase or input (in this case, it will be text) into tokens that the model can process.

  2. Processing in the LLM

    • The model processes the input along with the conversation context, if added, and always fitting within the context window.

    • It performs understanding, action, and response generation in a single step.

  3. Decoding

    • Converts the model's output back into readable text.

  4. Post-processing (optional)

    • The process may include security filters, specific formatting, or integration with external systems.

    • For example, it may be necessary to use a system based on RAG (Retrieval Augmented Generation) to obtain information from external knowledge sources.

The communication process in chatbots with end-to-end LLMs. Source: own elaboration.
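The four steps can be sketched as follows. This is only a scaffold: `call_llm` is a stand-in for a real model (an actual system would call an LLM API), and the whitespace tokenizer is a placeholder for the subword tokenizers (e.g. BPE) that real LLMs use:

```python
def tokenize(text: str) -> list[str]:
    # 1. Tokenization. Real LLMs use subword tokenizers; whitespace is a stand-in.
    return text.lower().split()

def call_llm(tokens: list[str], context: list[str], max_context: int = 512) -> list[str]:
    # 2. Processing. The model handles understanding, action, and response
    # generation in a single step. This stub only mimics that behavior.
    window = (context + tokens)[-max_context:]  # fit within the context window
    if "weather" in window:
        return ["it", "looks", "sunny", "today."]
    return ["how", "can", "i", "help", "you?"]

def decode(tokens: list[str]) -> str:
    # 3. Decoding: convert the model's output tokens back into readable text.
    return " ".join(tokens).capitalize()

def postprocess(text: str) -> str:
    # 4. Optional post-processing: safety filters, formatting, or RAG
    # integration would go here.
    banned = {"forbidden"}
    return "[filtered]" if any(w in text.lower() for w in banned) else text

def chat_turn(user_input: str, context: list[str]) -> str:
    tokens = tokenize(user_input)
    output = call_llm(tokens, context)
    context.extend(tokens + output)  # keep conversation history for later turns
    return postprocess(decode(output))
```

Because `chat_turn` appends each turn to `context`, earlier turns influence later responses, subject to the `max_context` window, which mirrors how real LLM chatbots maintain conversation history.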

Thus, chatbots that use LLM-based models as E2E systems have the following characteristics:

  • More unified and simple approach.

  • Greater flexibility to handle a wide range of inputs and tasks.

  • Can generate more natural and contextual responses.

  • Requires less manual design of rules and dialogue flows.

  • Can be more difficult to control and may generate hallucinations and other unexpected or incorrect responses.

Comparison between both approaches in chatbots

Between the two approaches, the traditional and the end-to-end with LLM, there are clear differences that I summarize here:

  • Response Generation (NLG/RG):

    • The LLM always gives an answer.

    • The LLM provides varied and different responses each time it receives a request.

    • The LLM has limitations such as hallucinations and outdated, unverified (untruthful), or biased information.

    • The traditional approach requires creating customized responses for each situation, a tedious task that can lead to an unnatural experience.

  • Dialogue Management (DM):

    • The LLM can incorporate the conversation context to remember what was previously discussed and give more contextual responses, limited by the size of its context window.

    • The LLM can be combined with RAG to give more reliable and personalized responses.

    • The traditional approach needs custom techniques to remember the context of what was discussed.

    • The traditional approach relies on behaviors based on rules or ML models, each defined individually, which limits the chatbot's possible responses.

  • Understanding of user intent (NLU):

    • The LLM does not require the creation of intents or example phrases to understand the user.

    • The traditional approach requires creating intents to understand users. It is a tedious job, and it does not scale: once a chatbot has dozens of intents, it becomes complicated to maintain and update.

  • Flexibility: LLMs are generally more flexible than the traditional approach, and can handle a wider range of inputs and tasks without the need for specific training for each domain.

  • Control: The traditional approach offers more control over each stage of the process, something key in the industry, with chatbots that require high reliability in responses, or where errors can have serious consequences.

  • Contextualization: LLMs tend to be better at maintaining context throughout a conversation and generating more natural and contextualized responses.

  • Knowledge: LLMs have a wide general knowledge incorporated, while chatbots with a traditional approach depend more on domain-specific knowledge bases.

  • Computational requirements: LLMs usually require many more computational resources, especially for large models. The economic cost, and the cost of resources (electricity, water, etc.) associated with this type of technology is high. This is not the case in chatbots with a traditional approach.

  • Interpretability: Chatbots with a traditional approach tend to be more interpretable, as each stage of the process is isolated and can be analyzed separately.

The future, a hybrid communication process?

In the industry, modern chatbots are beginning to adopt a hybrid approach that combines the strengths of both models to achieve a satisfactory experience, both for the user and for the company or business that offers the chatbot. The aim is to avoid the trust problems caused by hallucinations and untruthful data in LLMs.

🗣 I would like to develop this hybrid approach in an article, leave me a comment if you're interested in the topic.



References

[1] Weizenbaum, J. (1966). ELIZA—a computer program for the study of natural language communication between man and machine. Commun. ACM, 9(1), 36-45. https://doi.org/10.1145/365153.365168

[2] Quartz (2016) https://qz.com/613700/its-here-quartzs-first-news-app-for-iphone

[3] Google, Dialogflow https://cloud.google.com/dialogflow?hl=es-419

[4] Microsoft, LUIS https://www.luis.ai/

[5] OpenAI (2022, November 30). Introducing ChatGPT. https://openai.com/index/chatgpt

