---
date: 2023-Nov-06
author: Ben Chambers
categories:
  - real-time AI
  - GenAI
title: "Always On: Four Pillars of Real-Time Generative AI"
---

With the rise of Large Language Models (LLMs), it is now possible for everyone to create AI-powered applications and agents.
Starting with foundation models, developers can rapidly prototype an application and then move on to advanced techniques such as tuning the prompt or introducing Retrieval-Augmented Generation (RAG), few-shot learning, and fine-tuning, all without having to dive deep into machine learning and model training.
While current LLM uses often revolve around a few simple applications – chatbots, content summarization, etc. – I see vast potential in real-time applications.

Imagine a world where LLMs are constantly watching the world around you and looking out for your interests.
Instead of being interrupted by emails or chats and having to read, filter, and construct responses, AI-powered agents constantly triage and take appropriate actions.
Some messages will receive generated responses, some will be summarized for later review, and only the time-sensitive ones will actually interrupt you.
For example, [BeepGPT](https://epinzur.github.io/posts/001-openai-fine-tuning-1/post.html) is an agent that automatically listens to messages on Slack and pings you when a message is relevant to your learned interests. Such agents are __*always on*__ and tapped into the firehose of events happening around you.

Throughout this article, I’ll use the example of an agent assisting a travel booking agency. The agency works with customers via email and an online booking portal.
The core ideas can be generalized across industries and applications.

This article delves into four pillars of real-time AI – the things your application will need to do to provide active, reactive, and proactive experiences powered by LLMs.
I’ll talk more about each of these pillars below.

## Watch
To be truly real-time, an AI agent must connect with events as they happen. In our email booking scenario, the application should detect incoming emails – possibly by implementing an SMTP service or integrating with the email application. It must also observe bookings made via the web portal so that it has up-to-date availability information for both email and web bookings.

The sources of the relevant events will likely vary, and include both structured and unstructured events:

- Internal events might be available from a streaming platform, like Apache Kafka.
- Public APIs may provide events via websocket or webhook handlers, or require some form of scraping.
- Other events may be generated within the application as part of other workflows, for example when a user confirms a booking via email.

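One way to bring these disparate sources together is to normalize everything into a common event envelope as it arrives. The sketch below is a minimal illustration of that idea; the field names and raw event shapes (`from_email`, `from_booking`) are hypothetical, not part of any specific platform's API:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Event:
    """A common envelope for events from any source, so downstream
    triggers can treat email, bookings, and stream messages uniformly."""
    source: str        # e.g. "email" or "web_booking"
    kind: str          # e.g. "inquiry" or "booking_confirmed"
    payload: dict
    received_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def from_email(raw: dict) -> Event:
    # Hypothetical shape of an inbound email notification.
    return Event(source="email", kind="inquiry",
                 payload={"from": raw["from"], "subject": raw["subject"], "body": raw["body"]})

def from_booking(raw: dict) -> Event:
    # Hypothetical shape of a booking event from the web portal.
    return Event(source="web_booking", kind="booking_confirmed",
                 payload={"tour_id": raw["tour_id"], "seats": raw["seats"]})
```

In practice, each adapter would sit behind the corresponding source – an SMTP hook, a Kafka consumer, a webhook handler – and emit the same `Event` type into one stream.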
Often there are events of different kinds coming from different sources, all of which need to be combined for the real-time agent to understand what is happening and respond to it. In our example, we need to use events from both email and the web application.

Many existing LLM applications respond to specific user requests.
In contrast, real-time AI needs to be “always on” – observing things that happen and ready to jump in with assistance.
This starts with the ability to watch the events.

## Trigger
The essence of a real-time application – and AI agents are no exception – lies in discerning when to act.
An email asking about tour availability demands a prompt answer, while a promotion from a local store does not need any action.

Implementing the behaviors and applying them in response to the right kinds of events is where you, as the creator of the real-time AI application, get to apply your experience.
Not all events necessarily merit an LLM invocation – for instance, when someone books online, you can update the remaining slots in the tour directly.
Similarly, if an email is classified as spam, you may not need to do anything with it.
Using “intelligent” triggers and behaviors allows you to ensure the time (and possibly cost) of applying an LLM is “worth it” for the value provided.

When triggering, you may also use other models to decide whether (and what) to do.
For instance, you may use an LLM as a “router” to decide which actions to take, or a more traditional classifier model.
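A triage step along these lines can be sketched as a small function that applies cheap, deterministic rules first, reserving LLM calls for the events that survive them. The rules and action names here are illustrative assumptions for the travel-agency example, not a prescribed taxonomy:

```python
def triage(event: dict) -> str:
    """Decide what to do with an event before spending an LLM call.
    Cheap, deterministic rules run first; only events that survive them
    would be handed to an LLM "router" (not shown here)."""
    if event["source"] == "web_booking":
        return "update_availability"   # pure bookkeeping, no LLM needed
    if event["source"] == "email":
        subject = event["payload"]["subject"].lower()
        if "promotion" in subject or "unsubscribe" in subject:
            return "ignore"            # treated as promotional mail
        if "availability" in subject or "book" in subject:
            return "respond"           # time-sensitive: draft a reply
        return "summarize"             # everything else waits for review
    return "ignore"
```

A real system might replace the keyword checks with a trained spam classifier or an LLM router, but the shape – filter cheaply, escalate selectively – stays the same.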

When building a Real-Time AI application, you need to identify which events matter and which behaviors to apply in response to each of those events.
The behaviors likely include computations over the events as well as LLM invocations.

## Act
The range of actions available to the real-time AI agent is paramount to providing amazing experiences.
Agents will likely require a variety of actions – applying an LLM, updating values within a computation, interacting with external systems, etc.
In fact, an entire process may often be initiated in response to an event or request.

For our travel booking agent, we may want to use the LLM to suggest a response to an email as well as actually send that response.
When online bookings are made, we need to update the availability of the tours so that the latest information is reflected in future email and online responses.

Depending on the application, we may want to limit the actions automatically taken by the agent based on user configuration.
For instance, the agent might send email responses only for questions I have told it to handle, rather than automatically handling everything.

The set of actions and when (and how) to __include the human in the loop__ depends on the real-time application.
Seamlessly including the operator in the loop allows for a much wider range of actions to be taken.
While I may not (yet) trust the agent to automatically book my trip, if it summarizes the suggestions and then asks for confirmation, that would be great!
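A minimal sketch of this gating, assuming a hypothetical user-configured allowlist of action kinds: approved actions execute immediately, while everything else is summarized and parked for confirmation.

```python
# Hypothetical user-configured allowlist of actions the agent may take
# without asking; everything else waits for human confirmation.
AUTO_APPROVED = {"answer_availability", "send_brochure"}

def dispatch(action: str, draft: str) -> tuple:
    """Execute an agent-proposed action directly if the user has approved
    that kind of action; otherwise queue it with a summary to confirm."""
    if action in AUTO_APPROVED:
        return ("executed", draft)
    return ("awaiting_confirmation", f"Proposed {action}: {draft}")
```

The confirmation queue is where the operator stays in the loop: the agent does the drafting, the human keeps the final say on anything outside the allowlist.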

Needless to say, implementing actions and the appropriate human involvement is another place where your judgment will be needed when creating real-time AI applications.

## Learn
A real-time agent continues to improve over time. There are a variety of ways to learn and improve the quality of responses:

RAG
: Add information to a vector database and provide that to the LLM for richer, more precise responses.
  For instance, load information describing the tours to answer questions like “how long is the tour” and “what visas do I need to go on the tour”.

Prompt tuning
: As you develop your application, you’ll likely want to explore variations in how you use LLMs.
  Changes to the prompt and how the LLM is used (for instance, asking the LLM to explain the steps, as in chain-of-thought reasoning) often lead to higher quality answers.

Instructions / Few-Shot Learning
: Some behaviors may benefit from providing a few examples (few-shot learning) of how we’d like the LLM to respond.
  For instance, this may be used to manage the tone of the responses being generated.

Fine-Tuning
: For some applications, creating a fine-tuned LLM for the specific questions being asked may lead to significant improvements in response quality.
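To make the RAG idea concrete, here is a deliberately simplified sketch: keyword-overlap retrieval stands in for a real vector database and embedding similarity, and the tour documents and prompt template are invented for this example.

```python
import re

# Hypothetical tour descriptions that would live in a vector database.
TOUR_DOCS = [
    "The Lisbon food tour lasts three hours and includes six tastings.",
    "The Kyoto temple tour lasts a full day and requires no visa for short stays.",
]

def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, docs: list, k: int = 1) -> list:
    """Keyword-overlap retrieval standing in for vector similarity search:
    score each document by how many words it shares with the question."""
    q = tokens(question)
    return sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)[:k]

def build_prompt(question: str) -> str:
    # Stuff the best-matching context into the prompt sent to the LLM.
    context = "\n".join(retrieve(question, TOUR_DOCS))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The automatable part of learning is visible here: adding a new tour is just appending a document, with no change to the application code.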

These learning techniques can broadly be divided into those that can be automated or scripted – such as adding new tours to the vector database – and those which require changes to the application – such as changing from a single prompt to chain-of-thought.
While useful, this distinction between automated and manual improvements can be blurry – for instance, you could automate the collection of new training examples and the fine-tuning of a new generation of the model.

It will be important to identify the metrics you use for the quality of responses.
While LLMs have been shown to be effective at evaluating the quality of their own output, the quality of real-time agents should be measured in terms of how helpful the overall agent is, not just the response the LLM provided within an individual action.
User feedback is a valuable metric, especially for identifying which parts of the real-time application delight users and which parts leave them confused.

At the end of the day, while foundation models make it really easy to get started with an application using generative AI, you will likely reach a point where you do need to look at improving the model itself.
This shouldn’t be avoided – it means that your application has reached a point where what matters is the quality of the model responses, and you have opportunities to improve that!

## Conclusion
These four pillars are critical to creating amazing real-time AI agents:

1. Watching what happens in real-time, so the agent can respond to and be aware of what is happening.
2. Triggering actions when appropriate.
3. Acting in response to what is happening.
4. Learning from what has happened and the results of those actions.

This list should serve as a good starting point when evaluating technologies to use when building these real-time applications.

What real-time applications can you envision with the rise of LLMs?
Discuss this post and the potential for Real-Time Agents in the [GitHub discussion thread]()!