
Commit 6e3b334

Update learn section of genai_cookbook site to Agents (#33)
* Update learn section of cookbook to Agents
* link to db docs
* fix
* agent with tool
* rag to agents
* fix
* fix
* fix

---------

Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com>
1 parent 4669f46 commit 6e3b334

14 files changed: +82 -69 lines changed

genai_cookbook/_toc.yml

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ parts:
   - caption: "Learn"
     numbered: true
     chapters:
-      - file: nbs/1-introduction-to-rag
+      - file: nbs/1-introduction-to-agents
       - file: nbs/2-fundamentals-unstructured
         sections:
           - file: nbs/2-fundamentals-unstructured-data-pipeline

genai_cookbook/index-2.md

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ The RAG cookbook is divided into 2 sections:
 ## Table of contents
 <!--
 **Table of contents**
-1. [RAG overview](./nbs/1-introduction-to-rag): Understand how RAG works at a high-level
+1. [RAG overview](./nbs/1-introduction-to-agents): Understand how RAG works at a high-level
 2. [RAG fundamentals](./nbs/2-fundamentals-unstructured): Understand the key components in a RAG app
 3. [RAG quality knobs](./nbs/3-deep-dive): Understand the knobs Databricks recommends tuning to improve RAG app quality
 4. [RAG quality evaluation deep dive](./nbs/4-evaluation): Understand how RAG evaluation works, including creating evaluation sets, the quality metrics that matter, and required developer tooling

genai_cookbook/index.md

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ The RAG cookbook is divided into 2 sections:
 ## Table of contents
 <!--
 **Table of contents**
-1. [RAG overview](./nbs/1-introduction-to-rag): Understand how RAG works at a high-level
+1. [RAG overview](./nbs/1-introduction-to-agents): Understand how RAG works at a high-level
 2. [RAG fundamentals](./nbs/2-fundamentals-unstructured): Understand the key components in a RAG app
 3. [RAG quality knobs](./nbs/3-deep-dive): Understand the knobs Databricks recommends tuning to improve RAG app quality
 4. [RAG quality evaluation deep dive](./nbs/4-evaluation): Understand how RAG evaluation works, including creating evaluation sets, the quality metrics that matter, and required developer tooling

genai_cookbook/nbs/1-introduction-to-rag.md renamed to genai_cookbook/nbs/1-introduction-to-agents.md

Lines changed: 23 additions & 11 deletions
@@ -1,6 +1,12 @@
-# RAG overview
+# Agents overview
 
-This section provides an overview of Retrieval-augmented generation (RAG): what it is, how it works, and key concepts.
+This section provides an overview of agents: what they are, how they work, and key concepts.
+
+## What are AI agents and tools?
+
+AI agents are systems where models make decisions, often using tools like Databricks' Unity Catalog functions to perform tasks such as retrieving data or interacting with external services.
+
+See the Databricks docs ([AWS](https://docs.databricks.com/en/generative-ai/ai-agents.html) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/generative-ai/ai-agents)) for more info.
 
 ## What is retrieval-augmented generation?

@@ -10,28 +16,34 @@ For example, suppose you are building a question-and-answer chatbot to help empl
 
 RAG addresses this issue by first retrieving relevant information from the company documents based on a user’s query, and then providing the retrieved information to the LLM as additional context. This allows the LLM to generate a more accurate response by drawing from the specific details found in the relevant documents. In essence, RAG enables the LLM to “consult” the retrieved information to formulate its answer.
 
-## Core components of a RAG application
+An agent with a retriever tool is one pattern for RAG, and has the advantage of deciding when it needs to perform retrieval. This cookbook will describe how to build such an agent.
 
-A RAG application is an example of a [compound AI system](https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/): it expands on the language capabilities of the model alone by combining it with other tools and procedures.
+## Core components of an agent application
+
+An agent application is an example of a [compound AI system](https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/): it expands on the language capabilities of the model alone by combining it with other tools and procedures.
 
 When using a standalone LLM, a user submits a request, such as a question, to the LLM, and the LLM responds with an answer based solely on its training data.
 
-In its most basic form, the following steps happen in a RAG application:
+In its most basic form, the following steps happen in an agent application:
+
+1. **User query understanding**: First, the agent uses an LLM to understand the user's query. This step may also consider previous turns of the conversation, if provided.
+
+2. **Tool selection**: The agent uses an LLM to determine whether it should use a retriever tool. In the case of a vector search retriever, the LLM creates a retriever query, which is used to retrieve relevant chunks from the vector database. If no tool is selected, the agent skips to step 4 and generates the final response.
 
-1. **Retrieval:** The **user's request** is used to query some outside source of information. This might mean querying a vector store, conducting a keyword search over some text, or querying a SQL database. The goal of the retrieval step is to obtain **supporting data** that will help the LLM provide a useful response.
+3. **Tool execution**: The agent then executes the tool with the parameters determined by the LLM and returns the output.
 
-2. **Augmentation:** The **supporting data** from the retrieval step is combined with the **user's request**, often using a template with additional formatting and instructions to the LLM, to create a **prompt**.
+4. **LLM Generation**: The LLM then generates the final response.
 
-3. **Generation:** The resulting **prompt** is passed to the LLM, and the LLM generates a response to the **user's request**.
+The image below demonstrates a RAG agent where a retrieval tool is selected.
 
-```{image} ../images/1-introduction-to-rag/1_img.png
+```{image} ../images/1-introduction-to-agents/1_img.png
 :alt: RAG process
 :align: center
 ```
 
 <br>
 
-This is a simplified overview of the RAG process, but it's important to note that implementing a RAG application involves a number of complex tasks. Preprocessing source data to make it suitable for use in RAG, effectively retrieving data, formatting the augmented prompt, and evaluating the generated responses all require careful consideration and effort. These topics will be covered in greater detail in later sections of this guide.
+This is a simplified overview of the RAG process, but it's important to note that implementing an agent application involves a number of complex tasks. Preprocessing source data to make it suitable for retrieval, formatting the augmented prompt, and evaluating the generated responses all require careful consideration and effort. These topics will be covered in greater detail in later sections of this guide.
 
 ## Why use RAG?

@@ -53,4 +65,4 @@ The RAG architecture can work with 2 types of **supporting data**:
 | **Definition** | Tabular data arranged in rows & columns with a specific schema e.g., tables in a database. | Data without a specific structure or organization, e.g., documents that include text and images or multimedia content such as audio or videos. |
 | **Example data sources** | - Customer records in a BI or Data Warehouse system<br>- Transaction data from a SQL database<br>- Data from application APIs (e.g., SAP, Salesforce, etc) | - PDFs<br>- Google/Office documents<br>- Wikis<br>- Images<br>- Videos |
 
-Which data you use with RAG depends on your use case. The remainder of this guide focuses on RAG for unstructured data.
+Which data you use for your retriever depends on your use case. The remainder of this guide focuses on agents that use a retriever tool for unstructured data.
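To make the four agent steps in the diff above concrete, here is a minimal, hypothetical sketch of an agent with a single retriever tool. The `llm.select_tool`, `llm.generate`, and `vector_search.similarity_search` interfaces are placeholders chosen for illustration, not the cookbook's actual APIs.

```python
# Minimal sketch of an agent with one retriever tool (hypothetical interfaces).
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str          # which tool the LLM chose, or "none"
    query: str = ""    # retriever query written by the LLM

def agent_answer(user_query: str, llm, vector_search) -> str:
    # 1. User query understanding + 2. Tool selection: ask the LLM whether a
    #    retriever call is needed and, if so, what query to run.
    decision: ToolCall = llm.select_tool(user_query, tools=["retriever"])

    context = ""
    if decision.name == "retriever":
        # 3. Tool execution: fetch the most similar chunks from the index.
        chunks = vector_search.similarity_search(decision.query, k=5)
        context = "\n\n".join(c.text for c in chunks)

    # 4. LLM generation: answer, grounded on retrieved context when present.
    prompt = f"Context:\n{context}\n\nQuestion: {user_query}" if context else user_query
    return llm.generate(prompt)
```

In practice, tool selection is usually delegated to a model's native function-calling interface rather than a hand-rolled decision step; the sketch only shows where each of the four steps sits in the flow.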
Lines changed: 10 additions & 11 deletions
@@ -1,22 +1,21 @@
-## Retrieval, augmentation, and generation (aka RAG Chain)
+## Retrieval, augmentation, and generation (aka RAG Agent)
 
-Once the data has been processed by the data pipeline, it is suitable for use in the RAG application. This section describes the process that occurs once the user submits a request to the RAG application in an online setting. The series, or *chain* of steps that are invoked at inference time is commonly referred to as the RAG chain.
+Once the data has been processed by the data pipeline, it is suitable for use in a retriever tool. This section describes the process that occurs once the user submits a request to the agent application in an online setting.
 
+<!-- TODO (prithvi): add this back in once updated to agents
 ```{image} ../images/2-fundamentals-unstructured/3_img.png
 :align: center
-```
+``` -->
 <br/>
 
-1. **(Optional) User query preprocessing:** In some cases, the user's query is preprocessed to make it more suitable for querying the vector database. This can involve formatting the query within a template, using another model to rewrite the request, or extracting keywords to aid retrieval. The output of this step is a *retrieval query* which will be used in the subsequent retrieval step.
+1. **User query understanding**: First, the agent uses an LLM to understand the user's query. This step may also consider previous turns of the conversation, if provided.
 
-2. **Retrieval:** To retrieve supporting information from the vector database, the retrieval query is translated into an embedding using *the same embedding model* that was used to embed the document chunks during data preparation. These embeddings enable comparison of the semantic similarity between the retrieval query and the unstructured text chunks, using measures like cosine similarity. Next, chunks are retrieved from the vector database and ranked based on how similar they are to the embedded request. The top (most similar) results are returned.
+2. **Tool selection**: The agent uses an LLM to determine whether it should use a retriever tool. In the case of a vector search retriever, the LLM creates a retriever query, which is used to retrieve relevant chunks from the vector database. If no tool is selected, the agent skips to step 4 and generates the final response.
 
-3. **Prompt augmentation:** The prompt that will be sent to the LLM is formed by augmenting the user's query with the retrieved context, in a template that instructs the model how to use each component, often with additional instructions to control the response format. The process of iterating on the right prompt template to use is referred to as [prompt engineering](https://en.wikipedia.org/wiki/Prompt_engineering).
+3. **Tool execution**: The agent then executes the tool with the parameters determined by the LLM and returns the output.
 
-4. **LLM Generation**: The LLM takes the augmented prompt, which includes the user's query and retrieved supporting data, as input. It then generates a response that is grounded on the additional context.
+4. **LLM Generation**: The LLM then generates the final response.
 
-5. **(Optional) Post-processing:** The LLM's response may be processed further to apply additional business logic, add citations, or otherwise refine the generated text based on predefined rules or constraints.
+As with the retriever data pipeline, there are numerous consequential engineering decisions that can affect the quality of the agent. For example, determining how many chunks to retrieve and when to select the retriever tool can both significantly impact the model's ability to generate quality responses.
 
-As with the RAG application data pipeline, there are numerous consequential engineering decisions that can affect the quality of the RAG chain. For example, determining how many chunks to retrieve in (2) and how to combine them with the user's query in (3) can both significantly impact the model's ability to generate quality responses.
-
-Throughout the chain, various guardrails may be applied to ensure compliance with enterprise policies. This might involve filtering for appropriate requests, checking user permissions before accessing data sources, and applying content moderation techniques to the generated responses.
+Throughout the agent, various guardrails may be applied to ensure compliance with enterprise policies. This might involve filtering for appropriate requests, checking user permissions before accessing data sources, and applying content moderation techniques to the generated responses.
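As a rough illustration of the guardrails mentioned in the added text above, the sketch below wraps a retriever tool with a permission check and a content filter. The `acl.user_can_access` and `moderate.is_allowed` helpers are hypothetical placeholders, not part of the cookbook or any specific library.

```python
# Hypothetical guardrail wrapper around a retriever tool.
def guarded_retrieve(user, query, retriever, acl, moderate, k=5):
    # Check user permissions before touching the data source.
    if not acl.user_can_access(user, source="product_docs"):
        raise PermissionError("User is not allowed to query this index.")

    chunks = retriever.similarity_search(query, k=k)

    # Apply content moderation to what will be fed back to the LLM.
    return [c for c in chunks if moderate.is_allowed(c.text)]
```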

genai_cookbook/nbs/2-fundamentals-unstructured-data-pipeline.md

Lines changed: 4 additions & 4 deletions
@@ -1,10 +1,10 @@
 ## Data pipeline
 
-Throughout this guide we will focus on preparing unstructured data for use in RAG applications. *Unstructured* data refers to data without a specific structure or organization, such as PDF documents that might include text and images, or multimedia content such as audio or videos.
+Throughout this guide we will focus on preparing unstructured data for use in agent applications. *Unstructured* data refers to data without a specific structure or organization, such as PDF documents that might include text and images, or multimedia content such as audio or videos.
 
 Unstructured data lacks a predefined data model or schema, making it impossible to query on the basis of structure and metadata alone. As a result, unstructured data requires techniques that can understand and extract semantic meaning from raw text, images, audio, or other content.
 
-During data preparation, the RAG application's data pipeline takes raw unstructured data and transforms it into discrete chunks that can be queried based on their relevance to a user's query. The key steps in data preprocessing are outlined below. Each step has a variety of knobs that can be tuned - for a deeper dive discussion on these knobs, please refer to the [deep dive into RAG section.](/nbs/3-deep-dive)
+During data preparation, the agent application's data pipeline takes raw unstructured data and transforms it into discrete chunks that can be queried based on their relevance to a user's query. The key steps in data preprocessing are outlined below. Each step has a variety of knobs that can be tuned - for a deeper dive discussion on these knobs, please refer to the [deep dive into RAG section.](/nbs/3-deep-dive)
 
 ```{image} ../images/2-fundamentals-unstructured/2_img.png
 :align: center
@@ -17,7 +17,7 @@ Semantic search is one of several approaches that can be taken when implementing
 
 
 
-The following are the typical steps of a data pipeline in a RAG application using unstructured data:
+The following are the typical steps of a data pipeline in an agent application using unstructured data:
 
 1. **Parse the raw documents:** The initial step involves transforming raw data into a usable format. This can include extracting text, tables, and images from a collection of PDFs or employing optical character recognition (OCR) techniques to extract text from images.
 
@@ -31,6 +31,6 @@ The following are the typical steps of a data pipeline in a RAG application usin
 
 The process of computing similarity can be computationally expensive. Vector indexes, such as [Databricks Vector Search](https://docs.databricks.com/en/generative-ai/vector-search.html), speed this process up by providing a mechanism for efficiently organizing and navigating embeddings, often via sophisticated approximation methods. This enables rapid ranking of the most relevant results without comparing each embedding to the user's query individually.
 
-Each step in the data pipeline involves engineering decisions that impact the RAG application's quality. For example, choosing the right chunk size in step (3) ensures the LLM receives specific yet contextualized information, while selecting an appropriate embedding model in step (4) determines the accuracy of the chunks returned during retrieval.
+Each step in the data pipeline involves engineering decisions that impact the agent application's quality. For example, choosing the right chunk size in step (3) ensures the LLM receives specific yet contextualized information, while selecting an appropriate embedding model in step (4) determines the accuracy of the chunks returned during retrieval.
 
 This data preparation process is referred to as *offline* data preparation, as it occurs before the system answers queries, unlike the *online* steps triggered when a user submits a query.
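Read as code, the offline pipeline described in this file amounts to parse, chunk, embed, and index. The sketch below is a rough outline under that assumption; the `parser`, `embed_model`, and `vector_index` objects are hypothetical placeholders, not Databricks APIs, and fixed-size character chunking is used only for brevity (real pipelines often chunk by section or token count).

```python
# Hypothetical offline data-preparation sketch: parse -> chunk -> embed -> index.
def build_index(pdf_paths, parser, embed_model, vector_index, chunk_size=1000):
    for path in pdf_paths:
        # 1. Parse the raw document into plain text (OCR, table extraction, etc.).
        text = parser.extract_text(path)

        # 2. Split the text into chunks: small enough to be specific,
        #    large enough to keep surrounding context.
        chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

        # 3. Embed each chunk with the same model that will embed queries later.
        vectors = embed_model.embed(chunks)

        # 4. Store chunks and vectors so they can be ranked by similarity at query time.
        vector_index.upsert(
            ids=[f"{path}:{i}" for i in range(len(chunks))],
            vectors=vectors,
            texts=chunks,
        )
```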

genai_cookbook/nbs/2-fundamentals-unstructured-eval.md

Lines changed: 5 additions & 5 deletions
@@ -1,10 +1,10 @@
 ## Evaluation & monitoring
 
-Evaluation and monitoring are critical components to understand if your RAG application is performing to the *quality*, *cost*, and *latency* requirements dictated by your use case. Technically, **evaluation** happens during development and **monitoring** happens once the application is deployed to production, but the fundamental components are similar.
+Evaluation and monitoring are critical components to understand if your agent application is performing to the *quality*, *cost*, and *latency* requirements dictated by your use case. Technically, **evaluation** happens during development and **monitoring** happens once the application is deployed to production, but the fundamental components are similar.
 
-RAG over unstructured data is a complex system with many components that impact the application's quality. Adjusting any single element can have cascading effects on the others. For instance, data formatting changes can influence the retrieved chunks and the LLM's ability to generate relevant responses. Therefore, it's crucial to evaluate each of the application's components in addition to the application as a whole in order to iteratively refine it based on those assessments.
+Often, an agent is a complex system with many components that impact the application's quality. Adjusting any single element can have cascading effects on the others. For instance, data formatting changes can influence the retrieved chunks and the LLM's ability to generate relevant responses. Therefore, it's crucial to evaluate each of the application's components in addition to the application as a whole in order to iteratively refine it based on those assessments.
 
-Evaluation and monitoring of Generative AI applications, including RAG, differs from classical machine learning in several ways:
+Evaluation and monitoring of Generative AI applications, including agents, differs from classical machine learning in several ways:
 
 | | Classical ML | Generative AI |
 |---------|---------|---------|
@@ -19,9 +19,9 @@ Effectively evaluating and monitoring application quality, cost and latency requ
 ```
 <br/>
 
-- **Evaluation set:** To rigorously evaluate your RAG application, you need a curated set of evaluation queries (and ideally outputs) that are representative of the application's intended use. These evaluation examples should be challenging, diverse, and updated to reflect changing usage and requirements.
+- **Evaluation set:** To rigorously evaluate your agent application, you need a curated set of evaluation queries (and ideally outputs) that are representative of the application's intended use. These evaluation examples should be challenging, diverse, and updated to reflect changing usage and requirements.
 
-- **Metric definitions**: You can't manage what you don't measure. In order to improve RAG quality, it is essential to define what quality means for your use case. Depending on the application, important metrics might include response accuracy, latency, cost, or ratings from key stakeholders. You'll need metrics that measure each component, how the components interact with each other, and the overall system.
+- **Metric definitions**: You can't manage what you don't measure. In order to improve agent quality, it is essential to define what quality means for your use case. Depending on the application, important metrics might include response accuracy, latency, cost, or ratings from key stakeholders. You'll need metrics that measure each component, how the components interact with each other, and the overall system.
 
 - **LLM judges**: Given the open-ended nature of LLM responses, it is not feasible to read every single response each time you evaluate to determine if the output is correct. Using an additional, different LLM to review outputs can help scale your evaluation and compute additional metrics, such as the groundedness of a response to 1,000s of tokens of context, that would be infeasible for human raters to effectively assess at scale.
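To show how an evaluation set, metric definitions, and an LLM judge fit together, here is a rough, hypothetical evaluation loop. The `agent.answer_with_context` and `judge_llm.score_*` helpers are placeholders, not the API of any specific library.

```python
# Hypothetical evaluation loop: run the eval set, score each answer with an LLM judge.
eval_set = [
    {"query": "How do I request parental leave?", "expected": "Follow the HR portal process."},
    # ...more curated, representative, and challenging queries
]

def evaluate(agent, judge_llm, eval_set):
    results = []
    for example in eval_set:
        # Run the agent and keep the chunks it retrieved, for groundedness scoring.
        answer, retrieved = agent.answer_with_context(example["query"])
        results.append({
            "query": example["query"],
            # LLM judge rates whether the answer is grounded in the retrieved chunks.
            "groundedness": judge_llm.score_groundedness(answer, retrieved),
            # Optional: compare against the expected output when one exists.
            "correctness": judge_llm.score_correctness(answer, example.get("expected")),
        })
    return results
```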
