Commit 2770902

blog: Four Pillars of Real-Time Gen AI (#848)
1 parent 3d54046 commit 2770902

5 files changed: +135, -3 lines

python/Cargo.lock

Lines changed: 17 additions & 2 deletions

python/docs/_static/images/blog/four_pillars_of_realtime_genai.svg

Lines changed: 1 addition & 0 deletions

python/docs/blog/_metadata.yml

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 format:
   html:
     link-external-icon: true
-    link-external-newwindow: true
+    link-external-newwindow: true
Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
---
date: 2023-Nov-06
author: Ben Chambers
categories:
- real-time AI
- GenAI
title: "Always On: Four Pillars of Real-Time Generative AI"
---

With the rise of Large Language Models (LLMs), it is now possible for everyone to create AI-powered applications and agents.
Starting with foundation models, developers can rapidly prototype an application and then move on to advanced techniques by tuning the prompt or introducing Retrieval-Augmented Generation (RAG), Few-Shot Learning, and Fine-Tuning, all without having to dive deep into machine learning and model training.
While current LLM uses often revolve around a few simple applications – chatbots, content summarization, etc. – I see vast potential in real-time applications.

Imagine a world where LLMs are constantly watching the world around you and looking out for your interests.
Rather than interrupting you with every email or chat and leaving you to read, filter, and compose responses, AI-powered agents constantly triage incoming messages and take appropriate actions.
Some messages will receive generated responses, some will be summarized for later review, and only the time-sensitive ones will actually interrupt you.
For example, [BeepGPT](https://epinzur.github.io/posts/001-openai-fine-tuning-1/post.html) is an agent that automatically listens to messages on Slack and pings you when a message matches the interests it has learned. Such agents are __*always on*__ and tapped into the firehose of events happening around you.

Throughout this article, I’ll use the example of an agent assisting a travel booking agency. The agency will work with customers via email and an online booking portal.
The core ideas can be generalized across industries and applications.

This article delves deep into four pillars of real-time AI – things your application will need to do to provide active, reactive and proactive experiences powered by LLMs.
I’ll talk more about each of these pillars below.

![Four Pillars of Real-Time Generative AI](/_static/images/blog/four_pillars_of_realtime_genai.svg)

## Watch
To be truly real-time, an AI agent must connect with events as they happen. In our email booking scenario, the application should detect incoming emails – possibly by implementing an SMTP service or integrating with the email application. It must also observe bookings made via the web portal so that it has up-to-date availability information for both email responses and web bookings.

The sources of relevant events likely vary and include both structured and unstructured events:

- Internal events might be available from a streaming platform, like Apache Kafka.
- Public APIs may provide events via websocket or webhook handlers, or require some form of scraping.
- Other events may be generated within the application as part of other workflows, for example when a user confirms a booking via email.

Often, events of many kinds arrive from different sources, and all of them need to be combined for the real-time agent to understand what is happening and respond to it – in our example, we need events from both email and the web application.
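
To make combining sources concrete, here is a minimal Python sketch that interleaves events from the email and web-portal sources into a single time-ordered stream. It assumes each source already yields its events in timestamp order; the `Event` class, source names, and payloads are hypothetical, and connecting to real sources such as Kafka topics or webhooks is left out.

```python
import heapq
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass(order=True)
class Event:
    # Hypothetical event shape: only the timestamp takes part in ordering.
    timestamp: float
    source: str = field(compare=False)
    payload: dict = field(compare=False, default_factory=dict)


def merge_streams(*streams: Iterable[Event]) -> Iterator[Event]:
    """Interleave streams that are each sorted by timestamp into one sorted stream."""
    # heapq.merge pulls items lazily, one at a time, from each input.
    return heapq.merge(*streams)


# Hypothetical sample events from the two sources in the travel-agency example.
emails = [
    Event(1.0, "email", {"subject": "Is the glacier hike available on Friday?"}),
    Event(4.0, "email", {"subject": "Weekend sale at the corner store"}),
]
bookings = [Event(2.5, "web", {"tour": "glacier-hike", "seats": 2})]

for event in merge_streams(emails, bookings):
    print(event.timestamp, event.source, event.payload)
```

Because `heapq.merge` consumes its inputs lazily, the same function should also work when each source is an unbounded generator rather than a list, as long as every source keeps producing events.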

Many existing LLM applications respond to specific user requests.
In contrast, Real-Time AI needs to be “always on” – observing things that happen and ready to jump in with assistance.
This starts with the ability to watch the events.

## Trigger
The essence of a real-time application – and AI agents are no exception – lies in discerning when to act.
An email asking about tour availability demands a prompt answer, while a promotion from a local store does not need any action.

Implementing the behaviors and applying them in response to the right kinds of events is where you, as the creator of the real-time AI application, get to apply your experience.
Not every event merits an LLM invocation – for instance, when someone books online, you can update the remaining slots in the tour directly.
Similarly, if an email is classified as spam, you may not need to do anything with it.
Using “intelligent” triggers and behaviors ensures that the time (and possibly cost) of applying an LLM is “worth it” for the value provided.
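
A trigger can start out as a plain function that inspects each event and decides whether it merits an LLM call at all. The sketch below is one hypothetical way to encode the examples above: web bookings only update state, promotional email is ignored, and everything else goes to the LLM. The keyword-based `looks_like_spam` is a stand-in for a real classifier, not a recommendation, and all names here are illustrative.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    IGNORE = auto()        # nothing to do, e.g. spam or promotions
    UPDATE_STATE = auto()  # a cheap computation, no LLM needed
    INVOKE_LLM = auto()    # worth the latency and cost of an LLM call


@dataclass
class Event:
    source: str     # "email" or "web" in the travel-agency example
    text: str = ""


def looks_like_spam(text: str) -> bool:
    # Hypothetical keyword rule standing in for a trained spam classifier.
    return any(word in text.lower() for word in ("sale", "discount", "unsubscribe"))


def decide(event: Event) -> Action:
    """Decide whether an event merits an LLM invocation."""
    if event.source == "web":
        # Online bookings only need the remaining tour slots updated.
        return Action.UPDATE_STATE
    if event.source == "email":
        return Action.IGNORE if looks_like_spam(event.text) else Action.INVOKE_LLM
    return Action.IGNORE  # unknown sources are ignored in this sketch


print(decide(Event("web")))
print(decide(Event("email", "Is the glacier hike available on Friday?")))
print(decide(Event("email", "Weekend sale: 20% discount on everything")))
```

Keeping the cheap checks in front of the LLM call is what lets you control how often, and on which events, you pay for an invocation. Swapping `looks_like_spam` for a trained classifier or an LLM router would not change the shape of `decide`.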

When deciding whether (and what) to do, you may also use other models.
For instance, you may use an LLM as a “router” to decide which actions to take, or a more traditional classifier model.

When building a Real-Time AI application, you need to identify which events matter and what behaviors to take in response to each of those events.
The behaviors likely include computations over the events as well as LLM invocations.

## Act
The range of actions available to the real-time AI agent is paramount to providing amazing experiences.
Agents will likely require a variety of actions – applying an LLM, updating values within a computation, interacting with external systems, etc.
In fact, a single event or request may often initiate an entire process.

For our travel booking agent, we may want to use the LLM to suggest a response to an email and then actually send that response.
When online bookings are made, we need to update the availability of the tours so that the latest information is reflected in future email and online responses.
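
Updating availability is an example of a behavior that is a computation over events rather than an LLM call. A minimal sketch, assuming a simple in-memory count of seats per tour (the `TourAvailability` class and the tour names are hypothetical), might look like this:

```python
from typing import Dict


class TourAvailability:
    """Hypothetical in-memory count of remaining seats per tour."""

    def __init__(self, capacity: Dict[str, int]):
        self.remaining = dict(capacity)

    def apply_booking(self, tour: str, seats: int) -> None:
        # Called for each booking event from the web portal (or a confirmed email booking).
        left = self.remaining.get(tour, 0)
        if seats > left:
            raise ValueError(f"only {left} seats left on {tour}")
        self.remaining[tour] = left - seats

    def describe(self, tour: str) -> str:
        # Up-to-date text that can be included when the LLM drafts an email reply.
        return f"{tour}: {self.remaining.get(tour, 0)} seats remaining"


availability = TourAvailability({"glacier-hike": 10, "desert-trek": 4})
availability.apply_booking("glacier-hike", 3)
print(availability.describe("glacier-hike"))
```

A production system would need durable, shared state so that the email agent and the web portal see the same counts; the in-memory dictionary here only illustrates the bookkeeping.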

Depending on the application, we may want to limit the actions the agent takes automatically based on user configuration.
For instance, it might only send email responses to questions that I have told it to handle, rather than automatically handling everything.

The set of actions and when (and how) to __include the human in the loop__ depends on the real-time application.
Seamlessly including the operator in the loop allows for a much wider range of actions to be taken.
While I may not (yet) trust the agent to automatically book my trip, if it summarizes the suggestions and then asks for confirmation, that would be great!
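
One simple way to express this kind of gating is to let the user configure which topics the agent may answer on its own, and to route every other draft to a confirmation queue. The sketch below assumes the draft reply and its topic were already produced upstream (for example by the LLM); `AgentConfig`, `Outbox`, and `handle_draft` are hypothetical names for illustration, and no real email service is called.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class AgentConfig:
    # Topics the user has allowed the agent to answer without asking first.
    auto_reply_topics: Set[str] = field(default_factory=set)


@dataclass
class Outbox:
    sent: List[str] = field(default_factory=list)
    awaiting_confirmation: List[str] = field(default_factory=list)


def handle_draft(topic: str, draft: str, config: AgentConfig, outbox: Outbox) -> str:
    """Send a drafted reply automatically, or hold it for the human to confirm."""
    if topic in config.auto_reply_topics:
        outbox.sent.append(draft)  # a real agent would call the email service here
        return "sent"
    outbox.awaiting_confirmation.append(draft)
    return "needs_confirmation"


config = AgentConfig(auto_reply_topics={"availability"})
outbox = Outbox()
print(handle_draft("availability", "Yes, 3 seats are left on Friday.", config, outbox))
print(handle_draft("booking", "Shall I book the glacier hike for you?", config, outbox))
```

Widening `auto_reply_topics` as trust grows is one way to move a behavior from human-confirmed to fully automatic without changing the rest of the agent.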

Needless to say, implementing actions and the appropriate human involvement is another place where your judgment will be needed when creating real-time AI applications.

## Learn
A real-time agent continues to improve over time. There are a variety of ways to learn and improve the quality of responses:

RAG
: Add information to a vector database, retrieve the entries relevant to each request, and provide them to the LLM for richer, more precise responses.
For instance, load information describing the tours to answer questions like “how long is the tour” and “what visas do I need to go on the tour”.

Prompt tuning
: As you develop your application, you’ll likely want to explore variations in how you use LLMs.
Changes to the prompt and how the LLM is used (for instance, asking the LLM to explain the steps, as in chain-of-thought reasoning) often lead to higher-quality answers.

Instructions / Few-Shot Learning
: Some behaviors may benefit from a few examples (few-shot learning) of how we’d like the LLM to respond.
For instance, examples can be used to control the tone of the generated responses.

Fine-Tuning
: For some applications, creating a fine-tuned LLM for the specific questions being asked may lead to significant improvements in response quality.
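
To show how RAG and few-shot examples fit together when building a prompt, here is a self-contained toy in Python. It swaps the embedding model and vector database for bag-of-words cosine similarity over an in-memory dictionary, so it only illustrates the retrieve-then-prompt flow; the tour descriptions, the example answer, and the prompt wording are all hypothetical, and the call to the LLM itself is omitted.

```python
import math
import re
from collections import Counter
from typing import List

# Hypothetical tour descriptions standing in for documents in a vector database.
TOURS = {
    "glacier-hike": "Glacier hike lasts 6 hours. No visa needed for EU citizens.",
    "desert-trek": "Desert trek lasts 3 days. A tourist visa is required.",
}

# Hypothetical few-shot example that sets the desired tone.
FEW_SHOT = [
    ("How long is the glacier hike?", "Great question! The glacier hike lasts about 6 hours."),
]


def vectorize(text: str) -> Counter:
    # Bag-of-words term counts: a placeholder for a learned embedding.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, k: int = 1) -> List[str]:
    """Return the k tour descriptions most similar to the question."""
    q = vectorize(question)
    ranked = sorted(TOURS.items(), key=lambda kv: cosine(q, vectorize(kv[1])), reverse=True)
    return [text for _, text in ranked[:k]]


def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    examples = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in FEW_SHOT)
    return (
        "Answer the customer using only the tour information below.\n\n"
        f"Tour information:\n{context}\n\n"
        f"Examples of the desired tone:\n{examples}\n\n"
        f"Q: {question}\nA:"
    )


print(build_prompt("What visa do I need for the desert trek?"))
```

Adding a new tour is just another entry in `TOURS`, the kind of improvement that can be automated, while changing the template in `build_prompt` is a change to the application itself.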

These learning techniques can broadly be divided into those that can be automated or scripted – such as adding new tours to the vector database – and those which require changes to the application – such as changing from a single prompt to chain-of-thought.
While useful, this distinction between automated and manual improvements can be blurry – for instance, you could automate the collection of new training examples and fine-tuning of a new generation of model.

It will be important to identify the metrics you use to measure response quality.
While LLMs have been shown to be effective at evaluating the quality of their own output, the quality of real-time agents should be measured in terms of how helpful the overall agent is, not just the response the LLM provided within an individual action.
User feedback is a valuable metric, especially for identifying which parts of the real-time application delight users and which parts leave them confused.
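
As one hypothetical way to turn user feedback into a per-behavior signal, the sketch below computes how often users accepted the suggestions each behavior produced. The `Feedback` record and the behavior names are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Feedback:
    behavior: str   # which agent behavior produced the suggestion, e.g. "email_reply"
    accepted: bool  # whether the user accepted the suggestion


def acceptance_rates(feedback: Iterable[Feedback]) -> Dict[str, float]:
    """Fraction of suggestions accepted, per behavior."""
    totals: Dict[str, int] = defaultdict(int)
    accepted: Dict[str, int] = defaultdict(int)
    for fb in feedback:
        totals[fb.behavior] += 1
        accepted[fb.behavior] += int(fb.accepted)
    return {behavior: accepted[behavior] / totals[behavior] for behavior in totals}


print(acceptance_rates([
    Feedback("email_reply", True),
    Feedback("email_reply", False),
    Feedback("booking_summary", True),
]))
```

A low acceptance rate for one behavior points to the part of the agent that is confusing users, even if the individual LLM responses look fine in isolation.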

At the end of the day, while foundation models make it really easy to get started with an application using generative AI, you will likely find a point where you do need to look at improving the model itself.
This shouldn’t be avoided – it means that your application has reached a point where what matters is the quality of the model responses, and you have opportunities to improve that!

## Conclusion
These four pillars are critical to creating amazing real-time AI agents:

1. Watching what happens in real-time, so the agent can respond to and be aware of what is happening.
2. Triggering actions when appropriate.
3. Acting in response to what is happening.
4. Learning from what has happened and the results of those actions.

This list should serve as a good starting point when evaluating technologies to use when building these real-time applications.

What real-time applications can you envision with the rise of LLMs?
Discuss this post and the potential for Real-Time Agents in the [GitHub discussion thread]()!
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
title-block-style: "default"
title-block-banner: "#c3c8cd"
