diff --git a/docs/llm/01_cli__click_cli_group_.md b/docs/llm/01_cli__click_cli_group_.md new file mode 100644 index 0000000..febf269 --- /dev/null +++ b/docs/llm/01_cli__click_cli_group_.md @@ -0,0 +1,140 @@ +# Chapter 1: cli (Click CLI group) + +Welcome to the first step in your journey with `llm`! This chapter will introduce you to the `cli`, which is essentially the control panel for the entire `llm` tool. Think of it like the dashboard of a car - it's how you tell the system what to do. + +**Why is a CLI important?** + +Imagine you want to ask a large language model (LLM) a question, like "What are the best types of cheese for a cheese board?". Without a command-line interface (CLI), you'd have to write a whole program to interact with the LLM. The `cli` simplifies this by providing a structured way to send commands and receive answers directly from your terminal. + +**Core Concept: Click CLI Group** + +The `cli` in `llm` is built using a Python library called Click. Click helps create command-line interfaces that are easy to use and understand. The `cli` is organized as a *group* of commands, meaning it can perform various actions, each with its own specific instructions (or "arguments"). + +**Key Concepts to Understand** + +1. **Commands:** These are the specific actions you can perform with `llm`. Examples include `prompt`, `keys`, `logs`, `models`, `templates`, `aliases`, `plugins`, `embed`, `embed_multi`, `similar`, `embed_models`, and `collections`. Each command has a specific purpose. + +2. **Arguments:** These are the pieces of information you provide to a command so it knows *how* to execute. They come in two forms: + * **Positional Arguments:** These are required and their meaning depends on their order. For example, with `llm aliases set `, you *must* provide the alias name first, then the model ID. + * **Options:** These are optional arguments that are specified using flags like `-m` or `--model`. Options usually have a default value if you don't specify them. + +**Let's solve our use case: Asking a question** + +The central use case for `llm` is asking a question to a language model. We'll use the `prompt` command to do this. + +To ask "What are the best types of cheese for a cheese board?" you would type the following into your terminal: + +```bash +llm "What are the best types of cheese for a cheese board?" +``` + +This tells the `llm` tool to use the `prompt` command, with the question itself as the main argument. The `llm` tool will then send this question to the default language model and display the answer. + +Example output: + +``` +Here are some of the best types of cheese for a cheese board: + +* Brie +* Cheddar +* Gouda +* Blue Cheese +* Goat Cheese +``` + +**Adding Options: Using a Specific Model** + +Let's say you want to use a specific model, like `gpt-4o-mini`. You can do this using the `-m` or `--model` option: + +```bash +llm -m gpt-4o-mini "What are the best types of cheese for a cheese board?" +``` + +Here, `-m gpt-4o-mini` tells the `prompt` command to use the `gpt-4o-mini` model instead of the default one. + +**How `cli` works: A High-Level View** + +When you run an `llm` command, here's what happens behind the scenes: + +1. **Parsing Arguments:** The `cli` uses Click to parse the command and its arguments (like the prompt and the model name). +2. **Calling the Correct Function:** Based on the command and arguments, the `cli` calls the appropriate function within the `llm` code. 
In the above examples, the relevant `prompt` function in `llm/cli.py` will be called. +3. **Displaying Results:** The function interacts with the language model, gets the response, and the `cli` displays that response in your terminal. + +**Diving into the Code (Simplified)** + +Let's look at a simplified version of the `prompt` command definition within `llm/cli.py`: + +```python +@click.command(name="prompt") +@click.argument("prompt", required=False) +@click.option("model_id", "-m", "--model", help="Model to use") +def prompt(prompt, model_id): + """ + Execute a prompt + """ + # Simplified: Get the model + model = get_model(model_id) + + # Simplified: Send the prompt to the model + response = model.prompt(prompt) + + # Simplified: Print the response + print(response.text()) +``` + +Explanation: + +* `@click.command(name="prompt")`: This tells Click that this function is the code to run for the `llm prompt` command. +* `@click.argument("prompt", required=False)`: This defines a *positional* argument called "prompt". The `required=False` means the user doesn't have to provide it (but usually will). +* `@click.option("model_id", "-m", "--model", help="Model to use")`: This defines an *option* called "model". Users can specify the model using `-m` or `--model` flags. The actual name of the variable in code is `model_id`. +* The function body then gets the model, sends the prompt to the model, and prints the response. + +**Internal Implementation Walkthrough** + +Let's visualize how this `prompt` command works internally: + +```mermaid +sequenceDiagram + participant User + participant CLI + participant LLM Core + participant Model + + User->>CLI: llm "Hello, world!" -m gpt-4o-mini + CLI->>LLM Core: Parse command and arguments + CLI->>LLM Core: Call prompt(prompt="Hello, world!", model_id="gpt-4o-mini") + LLM Core->>LLM Core: model = get_model("gpt-4o-mini") + LLM Core->>Model: Initialize gpt-4o-mini model + LLM Core->>Model: prompt("Hello, world!") + Model-->>LLM Core: Response from model + LLM Core-->>CLI: Response text + CLI->>User: Display response in terminal +``` + +This diagram shows: + +1. The user enters the command. +2. The `cli` parses the command and calls the `prompt` function in the `llm` core. +3. The `llm` core figures out the model based on the `-m` option and fetches it, initializing it if needed. +4. The prompt is passed to the model, which generates a response. +5. The response is sent back to the `cli` and displayed to the user. + +**Other Useful Commands** + +The `cli` provides many more commands! Here's a quick overview: + +* `llm keys`: Manages API keys needed to access certain models (like OpenAI). +* `llm models`: Lists the available models. +* `llm templates`: Allows you to create and use pre-defined prompts. See [Template](06_template.md) for details. +* `llm aliases`: Lets you create shorter names for models. + +**Conclusion** + +The `cli` is the foundation for interacting with `llm`. It provides a structured and easy-to-use interface for running prompts, managing models, and much more. As you continue through the tutorial, you'll explore the individual commands and options in greater detail. + +Now that you have a grasp of the `cli`, let's move on to understanding the concept of a [Prompt](02_prompt.md). 
+ + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/llm/02_prompt.md b/docs/llm/02_prompt.md new file mode 100644 index 0000000..1679f2c --- /dev/null +++ b/docs/llm/02_prompt.md @@ -0,0 +1,169 @@ +# Chapter 2: Prompt + +In the previous chapter, [cli (Click CLI group)](01_cli__click_cli_group_.md), you learned how to use the command-line interface (`cli`) to interact with `llm`. Now, we'll dive into the heart of that interaction: the `Prompt`. + +Imagine you're at a restaurant. The `cli` is like the waiter, and the `Prompt` is your order. You need to clearly specify what you want so the kitchen (the Large Language Model, or LLM) can prepare the right dish (the response). + +**Why do we need a `Prompt` object?** + +Think about asking an LLM a question. You don't just send the question directly. You might also need to specify which LLM to use, any additional instructions, or even files to reference. The `Prompt` object bundles all of this information together in a structured way. Without it, things could get messy and the LLM might not understand what you're asking! + +**Core Concepts: What's inside a `Prompt`?** + +The `Prompt` object has a few key ingredients: + +1. **`prompt` (The Main Text):** This is the core of your request – the actual question or instruction you want the LLM to process. Like "Summarize this article" or "Translate this sentence into French." + +2. **`model` (The LLM to Use):** This specifies which Large Language Model you want to use. Think of it as choosing a chef at the restaurant. Different LLMs (like `gpt-4o-mini` or `llama-3`) have different strengths. + +3. **`attachments` (Additional Files):** Sometimes you want the LLM to consider extra information, like a document or an image. These are called attachments. Imagine showing the chef a picture of the dish you want! + +4. **`system` (System Instructions):** This provides high-level instructions to guide the LLM's behavior. Think of it as telling the chef your dietary restrictions or preferred cooking style. For example, "You are a helpful AI assistant." + +5. **`options` (Configuration Settings):** These are extra settings that control *how* the LLM generates its response, like the maximum length of the response or how "creative" it should be. These are covered in detail in the [Options](05_options.md) chapter. + +**Solving the Use Case: Asking a Question with a Specific Model** + +Let's revisit our cheese board question from the last chapter. This time, we'll look at how the `Prompt` object is constructed behind the scenes when you use the `cli`. + +When you run the command: + +```bash +llm -m gpt-4o-mini "What are the best types of cheese for a cheese board?" +``` + +The `cli` creates a `Prompt` object that looks something like this (in Python-like terms): + +```python +prompt_object = Prompt( + prompt="What are the best types of cheese for a cheese board?", + model=gpt_4o_mini_model, # Assume we have access to a gpt-4o-mini model object + attachments=[], + system=None, + prompt_json=None, + options={} +) +``` + +Explanation: + +* `prompt="What are the best types of cheese for a cheese board?"`: This sets the main question. +* `model=gpt_4o_mini_model`: This specifies that we want to use the `gpt-4o-mini` model. The `cli` figures out which model you meant based on the `-m gpt-4o-mini` option. 
+* `attachments=[], system=None, prompt_json=None, options={}`: We're not using any attachments, system instructions, or special options in this case. + +The `cli` then passes this `prompt_object` to the `Model` object (we'll learn more about [Model](03_model.md) in the next chapter) to generate a response. + +**Adding System Instructions** + +Let's say we want to tell the LLM to respond in a funny way. We can use the `--system` option (via the `cli` or when constructing the `Prompt` object directly if writing Python code): + +```bash +llm --system "Respond in the style of a pirate." "What are the best types of cheese for a cheese board?" +``` + +Now the `Prompt` object would look like this: + +```python +prompt_object = Prompt( + prompt="What are the best types of cheese for a cheese board?", + model=default_model, # Assuming the default model is being used + attachments=[], + system="Respond in the style of a pirate.", + prompt_json=None, + options={} +) +``` + +The LLM should now respond in a pirate-like tone! + +Example output: + +``` +Ahoy, matey! For a cheese board fit for a pirate, ye'll be wantin' these: + +* Cheddar, aged like buried treasure! +* Brie, smooth as calm waters. +* Gouda, round like a doubloon! +``` + +**Internal Implementation Walkthrough** + +Let's see what happens under the hood when the `cli` uses the `Prompt` object: + +```mermaid +sequenceDiagram + participant User + participant CLI + participant LLM Core + participant Model + + User->>CLI: llm "Tell me a joke." -m gpt-4o-mini + CLI->>LLM Core: Parse command, create Prompt object + LLM Core->>Model: model.prompt(prompt_object) + Model->>Model: Execute prompt using LLM API + Model-->>LLM Core: Response from LLM + LLM Core-->>CLI: Response text + CLI->>User: Display response in terminal +``` + +Here's the breakdown: + +1. The user enters a command with a prompt and, optionally, a model choice. +2. The `cli` parses the command and creates a `Prompt` object, packaging the prompt text, model ID, and any other options. +3. The `cli` calls the `prompt` method on the selected [Model](03_model.md) (like `gpt-4o-mini`). +4. The `Model` object then interacts with the actual LLM API to get a response based on the information provided in the `Prompt` object. +5. Finally, the LLM's response is displayed to the user. + +**Code Example (Simplified)** + +Here's a simplified view of how the `Prompt` object is used within the `llm` code (referencing the `llm/models.py` file): + +```python +@dataclass +class Prompt: + prompt: str + model: "Model" # type: ignore + attachments: Optional[List[Attachment]] + system: Optional[str] + options: "Options" # type: ignore + +# ... later in the code ... + +class Model(ABC): + # ... + def prompt( + self, + prompt: str, + *, + attachments: Optional[List[Attachment]] = None, + system: Optional[str] = None, + stream: bool = True, + **options + ): + return self.response( + Prompt( + prompt, + attachments=attachments, + system=system, + model=self, + options=self.Options(**options), + ), + stream=stream, + ) +``` + +Explanation: + +* The `Prompt` class (using `@dataclass`) clearly defines the structure of a prompt object, including the prompt text, model, attachments, system instructions, and options. The `type: ignore` comments are for static type checking and can be ignored for understanding the core concepts. 
+* The `Model.prompt` method creates a `Prompt` object using the provided arguments and then calls the `response` method (returning a [Response](04_response.md) object, which we'll explore in a future chapter). This illustrates how the `Prompt` object is constructed and then passed to the model for processing. + +**Conclusion** + +The `Prompt` object is a container that holds all the information needed to get a response from a Large Language Model. It ensures that your requests are structured and clear, allowing you to get the most out of `llm`. + +In the next chapter, we'll explore the [Model](03_model.md) object, which is responsible for actually interacting with the LLM and generating the response based on the `Prompt`. + + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/llm/03_model.md b/docs/llm/03_model.md new file mode 100644 index 0000000..9b70be4 --- /dev/null +++ b/docs/llm/03_model.md @@ -0,0 +1,193 @@ +# Chapter 3: Model + +In the previous chapter, [Prompt](02_prompt.md), you learned how to create `Prompt` objects to package up your requests to a Large Language Model. But how does that `Prompt` actually *get* to the LLM and get a response back? That's where the `Model` comes in. + +Think of a `Model` as a chef in a restaurant. Each chef (model) has their own recipes (trained parameters) and can prepare different dishes (generate text, answer questions). A `Model` handles authentication (making sure you have permission to order food!), prompt formatting (preparing the ingredients), and the actual API call to the underlying LLM (sending the order to the kitchen and getting the dish back). + +**Why do we need a `Model` object?** + +Imagine you want to use different LLMs, like `gpt-4o-mini` or `llama-3`. Each LLM has its own unique API, its own way of formatting prompts, and its own way of handling authentication. A `Model` object hides all of that complexity from you. You just tell the `Model` to execute a `Prompt`, and it takes care of the rest. + +**Core Concepts: What's inside a `Model`?** + +The `Model` object is responsible for a few key things: + +1. **`model_id`:** This is a unique identifier for the model, like `"gpt-4o-mini"` or `"llama-3"`. Think of it as the chef's name. + +2. **Authentication:** Many LLMs require an API key to use them. The `Model` handles retrieving and using this key. Think of this as paying for the meal! + +3. **Prompt Formatting:** LLMs often require prompts to be formatted in a specific way. The `Model` knows how to do this formatting. This is like the chef knowing how to prepare the ingredients correctly. + +4. **API Call:** The `Model` makes the actual API call to the LLM, sends the formatted prompt, and receives the response. This is like the waiter bringing your order to the kitchen and bringing the finished dish back to your table. + +5. **Options:** The `Model` stores any specific configuration options available for the particular LLM it represents. These are specific settings that will be sent along the API call. + +**Solving the Use Case: Asking a Question using a Specific Model** + +Let's say you want to ask `gpt-4o-mini` a question. You would use the `cli` like this: + +```bash +llm -m gpt-4o-mini "What are the best types of cheese for a cheese board?" +``` + +Behind the scenes, the `cli` does the following: + +1. It figures out that you want to use the `gpt-4o-mini` model (from the `-m gpt-4o-mini` option). + +2. 
It creates a `Model` object representing the `gpt-4o-mini` LLM. This `Model` object knows how to authenticate with the OpenAI API (if necessary), format the prompt for `gpt-4o-mini`, and make the API call. + +3. It creates a [Prompt](02_prompt.md) object containing your question. + +4. It tells the `Model` object to execute the `Prompt`. + +5. The `Model` object sends the formatted prompt to the OpenAI API. + +6. The OpenAI API responds with the answer. + +7. The `Model` object returns the answer to the `cli`, which then displays it to you. + +**Example Code (Simplified)** + +Here's a simplified example of how you might create and use a `Model` object in Python (this is a *simplified* example to illustrate the concept): + +```python +# Assume we have a Model class defined (see llm/models.py for the actual class) +class Model: + def __init__(self, model_id): + self.model_id = model_id + + def prompt(self, prompt_text): + # In a real implementation, this would: + # 1. Authenticate with the LLM API + # 2. Format the prompt correctly + # 3. Make the API call + # 4. Return the response + print(f"Sending prompt '{prompt_text}' to model {self.model_id}...") + response = f"The best cheeses are Brie, Cheddar, and Gouda. - Model: {self.model_id}" # dummy + return response + +# Create a Model object for gpt-4o-mini +model = Model("gpt-4o-mini") + +# Ask the model a question +question = "What are the best types of cheese for a cheese board?" +answer = model.prompt(question) + +# Print the answer +print(answer) +``` + +Explanation: + +* The `Model` class is a simplified representation of a real `Model` object. It has a `model_id` and a `prompt` method. +* We create a `Model` object for `gpt-4o-mini`. +* We call the `prompt` method to ask the model a question. +* The `prompt` method (in a real implementation) would handle all the details of interacting with the LLM API. + +Example output: + +``` +Sending prompt 'What are the best types of cheese for a cheese board?' to model gpt-4o-mini... +The best cheeses are Brie, Cheddar, and Gouda. - Model: gpt-4o-mini +``` + +**Internal Implementation Walkthrough** + +Let's visualize what happens internally when you call the `prompt` method on a `Model` object: + +```mermaid +sequenceDiagram + participant User + participant CLI + participant LLM Core + participant Model + participant LLM API + + User->>CLI: llm -m gpt-4o-mini "Hello, world!" + CLI->>LLM Core: Create Model("gpt-4o-mini") and Prompt("Hello, world!") + LLM Core->>Model: prompt(Prompt) + Model->>Model: Format prompt for LLM API + Model->>LLM API: Send formatted prompt + LLM API-->>Model: Response from LLM + Model-->>LLM Core: Response text + LLM Core-->>CLI: Response text + CLI->>User: Display response in terminal +``` + +This diagram shows: + +1. The user enters a command with a prompt and a model choice. +2. The `cli` creates a `Model` object and a [Prompt](02_prompt.md) object. +3. The `cli` calls the `prompt` method on the `Model` object, passing in the `Prompt` object. +4. The `Model` object formats the prompt to be compatible with the underlying LLM API. +5. The `Model` object sends the formatted prompt to the LLM API. +6. The LLM API processes the prompt and returns a response. +7. The `Model` object receives the response and returns it to the `cli`. +8. The `cli` displays the response to the user. 
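The same flow is available from Python without going through the `cli`. Here is a hedged sketch using `llm`'s Python API; it assumes the library is installed and that an API key for the model has already been configured (for example with `llm keys set openai`):

```python
import llm

# Look up a registered model by its model_id
model = llm.get_model("gpt-4o-mini")

# Execute a prompt against it - this mirrors steps 3-7 of the walkthrough above
response = model.prompt("What are the best types of cheese for a cheese board?")

# The response text is what the cli would normally print to your terminal
print(response.text())
```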
+ +**Diving into the Code** + +Here's a look at a simplified version of the `Model` class definition within `llm/models.py`: + +```python +from abc import ABC, abstractmethod +from dataclasses import dataclass + +@dataclass +class Prompt: # Simplified for brevity + prompt: str + +class Model(ABC): + model_id: str + + @abstractmethod + def execute(self, prompt: Prompt) -> str: + """Execute a prompt and return the response text.""" + pass + + def prompt(self, prompt_text: str) -> str: + prompt = Prompt(prompt_text) + return self.execute(prompt) +``` + +Explanation: + +* The `Model` class is an abstract base class (`ABC`), which means that concrete model implementations (like `gpt-4o-mini`) must inherit from it and implement the `execute` method. +* The `execute` method is an abstract method (`@abstractmethod`), which means that it must be implemented by subclasses. This method is responsible for actually interacting with the LLM API. +* The `prompt` method is a convenience method that creates a [Prompt](02_prompt.md) object and then calls the `execute` method. + +And here's an example of a concrete `Model` implementation (simplified): + +```python +class MyModel(Model): + def __init__(self, model_id: str): + self.model_id = model_id + + def execute(self, prompt: Prompt) -> str: + # Pretend we're calling an API here + return f"Response from {self.model_id}: {prompt.prompt}" +``` + +Explanation: + +* The `MyModel` class inherits from the `Model` class and implements the `execute` method. +* The `execute` method simply returns a string indicating that it received the prompt. In a real implementation, this method would make an API call to an LLM. + +**Other Useful Methods** + +The `Model` class may also have other useful methods, such as: + +* `get_key()`: Retrieves the API key for the LLM. +* `format_prompt()`: Formats the prompt to be compatible with the LLM API. +* `stream()`: Returns a generator that yields chunks of text as they are generated by the LLM (for streaming responses). + +**Conclusion** + +The `Model` object is a key abstraction in `llm` that handles the complexities of interacting with different Large Language Models. It encapsulates authentication, prompt formatting, and API calls, allowing you to easily switch between different LLMs without having to worry about the details. + +In the next chapter, we'll explore the [Response](04_response.md) object, which represents the result of executing a prompt against a `Model`. + + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/llm/04_response.md b/docs/llm/04_response.md new file mode 100644 index 0000000..58a063f --- /dev/null +++ b/docs/llm/04_response.md @@ -0,0 +1,189 @@ +# Chapter 4: Response + +In the previous chapter, [Model](03_model.md), you learned how `Model` objects are like chefs, taking your `Prompt` (order) and using an LLM to generate an answer. Now, let's talk about what you get back: the `Response`. + +Think of the `Response` as the finished dish the chef prepares for you. It contains the generated text, any extra information about how it was created, and handles real-time delivery of results if you're watching the chef cook! + +**Why do we need a `Response` object?** + +Imagine just getting the raw text back from the LLM. It would be like the waiter just shouting the ingredients at you! The `Response` object packages everything up nicely. 
It provides the generated text, but also information like how long it took to generate, any errors that occurred, and more. + +**Core Concepts: What's inside a `Response`?** + +The `Response` object contains a few key things: + +1. **`text`:** This is the most important part - the actual text generated by the LLM! This is like the finished dish itself. + +2. **`model`:** This tells you which [Model](03_model.md) (chef) was used to generate the response. + +3. **`prompt`:** This is the original [Prompt](02_prompt.md) (order) that was used to generate the response. + +4. **`duration_ms`:** This tells you how long it took to generate the response, in milliseconds. + +5. **`datetime_utc`:** This is the date and time (in UTC) when the response was generated. + +6. **`stream`:** This indicates if the response was delivered in chunks (streaming) or all at once. Think of it as whether you watch the chef cook or they just bring you the finished dish. + +7. **`json`:** Some models return responses as structured JSON data. This field holds that data. + +**Solving the Use Case: Getting an Answer and its Metadata** + +Let's go back to our cheese board question. When you run: + +```bash +llm "What are the best types of cheese for a cheese board?" +``` + +The `llm` tool doesn't just print the answer. It creates a `Response` object first. You can access different parts of this `Response` object (using Python code) like this: + +```python +# Assume we have a response object called 'response' +# obtained from the Model.prompt() method as seen in the last chapter +print(response.text()) # Prints the generated text +print(response.model) # Prints the model that was used +print(response.prompt) # Prints the original prompt +print(response.duration_ms()) # Prints the duration in milliseconds +print(response.datetime_utc()) # Prints the timestamp + +``` + +Example output (will vary depending on the model and execution time): + +``` +Here are some good cheeses for a cheeseboard: Brie, Cheddar, and Gouda. + +Prompt(prompt='What are the best types of cheese for a cheese board?', model=, attachments=[], system=None, prompt_json=None, options={}) +1234 +2024-10-27T10:00:00 +``` + +Explanation: + +* `response.text()`: This gets the actual answer from the LLM (the list of cheeses). +* `response.model`: This tells you which model provided the answer (e.g., `gpt-4o-mini`). +* `response.prompt`: This shows the exact question you asked. +* `response.duration_ms()`: This tells you how long it took to get the answer. +* `response.datetime_utc()`: This tells you when the answer was generated. + +**Handling Streaming Responses** + +Some LLMs can send responses in chunks, like watching a chef gradually assemble the dish. The `Response` object handles this streaming behavior. Here's how you can iterate through the chunks (using Python code): + +```python +# Assume we have a streaming response object called 'response' +for chunk in response: + print(chunk, end="") # Print each chunk as it arrives + +``` + +Explanation: + +* The `for chunk in response:` loop iterates through each chunk of text as it's generated by the LLM. +* `print(chunk, end="")` prints each chunk to the console without adding a newline character, so the output appears as a continuous stream of text. 
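Putting those two access patterns together, here is a hedged sketch that streams a response to the terminal and then inspects its metadata once it has finished (the model name and key setup are assumptions, not requirements of the API):

```python
import llm

model = llm.get_model("gpt-4o-mini")  # assumes this model and its API key are configured

response = model.prompt("Tell me a short joke about cheese.")

# Stream each chunk to the terminal as it arrives...
for chunk in response:
    print(chunk, end="", flush=True)
print()

# ...then read the metadata once the response is complete
print("Model used:", response.model.model_id)
print("Duration (ms):", response.duration_ms())
```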
+ +**Internal Implementation Walkthrough** + +Here's what happens internally when you get a `Response` object: + +```mermaid +sequenceDiagram + participant User + participant CLI + participant LLM Core + participant Model + participant LLM API + + User->>CLI: llm "Tell me a joke" + CLI->>LLM Core: Create Prompt and Model + LLM Core->>Model: Execute Prompt + Model->>LLM API: Send Prompt to LLM + LLM API-->>Model: Response (chunks or complete) + Model-->>LLM Core: Response object + LLM Core-->>CLI: Display Response text to user +``` + +This diagram shows: + +1. The user enters a command at the `cli`. +2. The `cli` creates a [Prompt](02_prompt.md) object and a [Model](03_model.md) object. +3. The `cli` tells the `Model` to execute the `Prompt`. +4. The `Model` sends the `Prompt` to the LLM API. +5. The LLM API sends back the response, either in chunks or all at once. +6. The `Model` encapsulates this into a `Response` object and returns it to the `cli`. +7. The `cli` receives the `Response` object and displays the text to the user. + +**Code Example (Simplified)** + +Here's a simplified example of how the `Response` object is used within the `llm` code (referencing the `llm/models.py` file): + +```python +from dataclasses import dataclass +import datetime +import time +from typing import List, Iterator, Optional + +@dataclass +class Prompt: # Simplified for brevity + prompt: str + +class Model: # Simplified for brevity + def __init__(self, model_id: str): + self.model_id = model_id + + def execute(self, prompt: Prompt, stream: bool) -> Iterator[str]: + #Simulate sending a prompt to an LLM and getting chunks back + chunks = ["This ", "is ", "a ", "simulated ", "response."] + if stream: + yield from chunks # Yield chunks one by one + else: + yield "".join(chunks) # Yield the entire response at once + +class Response: + def __init__(self, prompt: Prompt, model: Model, stream: bool): + self.prompt = prompt + self.model = model + self.stream = stream + self._chunks: List[str] = [] #Accumulate chunks here + self._done = False + self._start_time = time.time() # Capture start time + + def __iter__(self) -> Iterator[str]: + # Send prompt to model and collect the response + for chunk in self.model.execute(self.prompt, self.stream): + self._chunks.append(chunk) + yield chunk # Yield the chunk + + self._done = True #Mark as complete + + def text(self) -> str: + # Return assembled text from chunks + return "".join(self._chunks) + +# Create instances and demonstrate + +prompt = Prompt("Tell me a short story.") +model = Model("TestModel") +response = Response(prompt, model, stream = True) # Get a streaming response + +for chunk in response: # Process the streaming chunks + print(chunk, end="") + +print(f"\nResponse from {response.model.model_id} in {(time.time() - response._start_time):.4f} seconds.") +``` + +Explanation: + +* This simplified code demonstrates how a `Response` object accumulates chunks from a `Model` and provides access to the full text once the response is complete. +* The `__iter__` method simulates receiving chunks from a model's execution and yields them, allowing for streaming. +* The `text` method assembles the complete text from the accumulated chunks. + +**Conclusion** + +The `Response` object provides a structured way to access the output of an LLM, along with important metadata about how the response was generated. It handles streaming responses, allowing you to display results in real-time. 
+ +In the next chapter, we'll explore [Options](05_options.md), which let you customize how the LLM generates its response. + + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/llm/05_options.md b/docs/llm/05_options.md new file mode 100644 index 0000000..3d4553a --- /dev/null +++ b/docs/llm/05_options.md @@ -0,0 +1,175 @@ +# Chapter 5: Options + +In the previous chapter, [Response](04_response.md), you learned how to get the results from your interaction with an LLM. But what if you want to influence *how* the LLM generates its response? That's where `Options` come in! + +Think of `Options` like adjusting the ingredients or cooking time in a recipe. You can tweak them to get a slightly different result. For example, you might want the LLM to be more creative, or you might want to limit the length of its response. + +**Why do we need `Options`?** + +Imagine you're asking an LLM to write a short story. You might want to specify that it should be a scary story (influencing the *content*) or that it should be no more than 100 words long (influencing the *length*). `Options` provide a structured way to specify these kinds of preferences. Without them, you're stuck with the LLM's default behavior! + +**Core Concepts: What are `Options`?** + +`Options` are settings that you can use to influence how an LLM responds to your prompt. They are defined as a Pydantic model, which is like a blueprint for creating structured data in Python. This means that `Options` have specific types (like numbers or strings) and can be validated to ensure that they are valid. + +Here are some common types of `Options`: + +1. **`temperature`:** This controls the randomness of the LLM's output. A higher temperature (e.g., 1.0) makes the output more creative and unpredictable, while a lower temperature (e.g., 0.2) makes it more focused and deterministic. + +2. **`max_tokens`:** This limits the maximum number of tokens (words or sub-words) in the LLM's response. This is useful for controlling the length of the output. + +3. **`top_p`:** This is another way to control the randomness of the output. It's similar to temperature, but it works by selecting from the most probable tokens. + +4. **`frequency_penalty`:** This penalizes the LLM for repeating words or phrases, encouraging it to use more diverse language. + +5. **`presence_penalty`:** This penalizes the LLM for introducing new topics, encouraging it to stay focused on the current topic. + +**Solving the Use Case: Asking a Question with a Specific Temperature** + +Let's say you want to ask `gpt-4o-mini` a question and make the response more creative. You can use the `--temperature` option with the `cli`: + +```bash +llm -m gpt-4o-mini --temperature 0.7 "Write a poem about a cat." +``` + +Here, `--temperature 0.7` tells the `llm` tool to use a temperature of 0.7, which will make the poem more creative and less predictable than the default temperature. + +Behind the scenes, the `cli` does the following: + +1. It parses the command-line arguments, including the `--temperature 0.7` option. +2. It creates a [Prompt](02_prompt.md) object containing the prompt text and the temperature option. +3. It creates a [Model](03_model.md) object representing the `gpt-4o-mini` LLM. +4. It tells the `Model` object to execute the `Prompt`, passing in the `Options` (including the temperature) to the LLM API. +5. The LLM API generates a response based on the prompt and the options. 
+6. The `Model` object returns the response to the `cli`, which then displays it to you. + +**Example Code (Simplified)** + +Here's a simplified example of how you might use `Options` in Python (this is a simplified example to illustrate the concept): + +```python +from pydantic import BaseModel + +# Define a custom Options class +class MyOptions(BaseModel): + temperature: float = 0.5 + max_tokens: int = 100 + +# Assume we have a Model class defined (see llm/models.py for the actual class) +class Model: + def __init__(self, model_id): + self.model_id = model_id + + def prompt(self, prompt_text, options: MyOptions): # Added options! + print(f"Sending prompt '{prompt_text}' to model {self.model_id} with options {options}...") + response = f"The response based on the prompt. - Model: {self.model_id}, Temperature: {options.temperature}" # dummy + return response + +# Create a Model object for gpt-4o-mini +model = Model("gpt-4o-mini") + +# Create an Options object +options = MyOptions(temperature=0.7, max_tokens=150) + +# Ask the model a question with options +question = "Write a short story about a dog." +answer = model.prompt(question, options) + +# Print the answer +print(answer) +``` + +Explanation: + +* We define a custom `Options` class called `MyOptions` using Pydantic's `BaseModel`. This class defines the available options and their default values. +* The `Model` class now accepts an `options` argument in its `prompt` method. +* We create an `Options` object with specific values for `temperature` and `max_tokens`. +* We call the `prompt` method, passing in the `Options` object. + +Example output: + +``` +Sending prompt 'Write a short story about a dog.' to model gpt-4o-mini with options temperature=0.7 max_tokens=150... +The response based on the prompt. - Model: gpt-4o-mini, Temperature: 0.7 +``` + +**Internal Implementation Walkthrough** + +Let's visualize what happens internally when you call the `prompt` method on a `Model` object with `Options`: + +```mermaid +sequenceDiagram + participant User + participant CLI + participant LLM Core + participant Model + participant LLM API + + User->>CLI: llm -m gpt-4o-mini "Hello" --temperature 0.7 + CLI->>LLM Core: Create Model("gpt-4o-mini") and Prompt("Hello", Options) + LLM Core->>Model: prompt(Prompt) + Model->>Model: Format prompt and options for LLM API + Model->>LLM API: Send formatted prompt + options + LLM API-->>Model: Response from LLM + Model-->>LLM Core: Response text + LLM Core-->>CLI: Response text + CLI->>User: Display response in terminal +``` + +This diagram shows: + +1. The user enters a command with a prompt, a model choice, and some `Options`. +2. The `cli` creates a [Model](03_model.md) object and a [Prompt](02_prompt.md) object, including the `Options`. +3. The `cli` calls the `prompt` method on the `Model` object, passing in the `Prompt` object. +4. The `Model` object formats the prompt and the `Options` to be compatible with the underlying LLM API. +5. The `Model` object sends the formatted prompt and `Options` to the LLM API. +6. The LLM API processes the prompt and `Options` and returns a response. +7. The `Model` object receives the response and returns it to the `cli`. +8. The `cli` displays the response to the user. 
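From Python, options travel the same route as keyword arguments to `model.prompt()`, where they are validated against the model's `Options` class (this is visible in the `Model.prompt` signature shown back in Chapter 2). A hedged sketch, assuming `gpt-4o-mini` and its key are configured and that it accepts `temperature` and `max_tokens`:

```python
import llm

model = llm.get_model("gpt-4o-mini")  # assumption: model and API key are set up

# Keyword arguments become Options; unknown names or invalid values are
# rejected before any API call is made
response = model.prompt(
    "Write a two-line poem about a cat.",
    temperature=0.7,
    max_tokens=100,
)
print(response.text())
```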
+ +**Diving into the Code** + +Here's a look at a simplified version of how `Options` are defined and used within the `llm/models.py` file: + +```python +from pydantic import BaseModel +from dataclasses import dataclass + +class Options(BaseModel): + # Note: using pydantic v1 style Configs, + # these are also compatible with pydantic v2 + class Config: + extra = "forbid" + + temperature: float = 0.5 + max_tokens: int = 200 + +@dataclass +class Prompt: + prompt: str + model: "Model" + options: Options + +class Model: + def execute(self, prompt: Prompt) -> str: + # Simulate using the prompt and options + return f"Executed with prompt: {prompt.prompt} and temperature: {prompt.options.temperature}" +``` + +Explanation: + +* The `Options` class is defined using Pydantic's `BaseModel`. This allows us to easily define the available options, their types, and their default values. +* The `Prompt` class includes an `options` field, which is an instance of the `Options` class. +* The `Model.execute` method receives the `Prompt` object, which includes the `Options`. The method can then use these options to influence the LLM's behavior (in this example, it just prints them). +* The `Options.Config` with `extra = "forbid"` is a feature that prevents the model from having arbitrary attributes assigned to it, improving the structure and limiting potential errors. + +**Conclusion** + +`Options` provide a powerful way to customize the behavior of Large Language Models. By adjusting settings like `temperature` and `max_tokens`, you can influence the creativity, length, and focus of the LLM's responses. + +In the next chapter, we'll explore [Template](06_template.md), which allows you to create reusable prompts and further customize your interactions with LLMs. + + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/llm/06_template.md b/docs/llm/06_template.md new file mode 100644 index 0000000..fe518b2 --- /dev/null +++ b/docs/llm/06_template.md @@ -0,0 +1,186 @@ +# Chapter 6: Template + +In the previous chapter, [Options](05_options.md), you learned how to customize the behavior of LLMs by tweaking settings like temperature and max tokens. Now, let's see how we can *reuse* prompts and predefine common configurations with `Template`s! + +Imagine you often ask the LLM to summarize articles in a concise way. Instead of typing the same instructions and options every time, you can create a `Template` – like saving your favorite recipe – and use it with a single command. + +**Why do we need `Template`s?** + +Think about asking a language model to translate text into different languages. You might have a specific format you prefer, like always including a disclaimer. A `Template` lets you define this format once and reuse it, saving you time and effort, and ensuring consistency! + +**Core Concepts: What is a `Template`?** + +A `Template` is essentially a recipe for creating prompts. It contains: + +1. **`name`:** A unique identifier for the template (e.g., "summarize_concise"). + +2. **`prompt`:** The main prompt text, which can include placeholders (variables) for dynamic content. For instance: `"Summarize this: $input"`. + +3. **`system`:** Optional system instructions, like `"You are a helpful summarization assistant."`. + +4. **`model`:** The default model to use with this template (e.g., `"gpt-4o-mini"`). + +5. **`defaults`:** Default values for the variables in the prompt. 
For example, if the prompt is `"Translate $input to $language"`, you might set a default `language` to `"French"`. + +**Solving the Use Case: Creating and Using a Summarization Template** + +Let's create a `Template` that summarizes text concisely. + +First, we need to define the template (this can be done programmatically or via a command-line interface, which we'll cover later). For now, let's imagine we have a Python object representing our template: + +```python +from llm import Template + +summarize_template = Template( + name="summarize_concise", + prompt="Summarize this in a few sentences: $input", + system="You are a helpful summarization assistant.", + model="gpt-4o-mini", + defaults={} # No default values are needed for this template +) +``` + +Explanation: + +* We create a `Template` object named `"summarize_concise"`. +* The `prompt` includes a placeholder `$input`, which will be replaced with the text we want to summarize. +* We provide system instructions to guide the LLM. +* We specify the default model to use: `"gpt-4o-mini"`. + +Now, let's use this template to summarize a piece of text: + +```python +text_to_summarize = "The quick brown fox jumps over the lazy dog. This is a classic pangram." + +# In the actual llm library, you would use the template with the cli or through code. +# This is a simplified demonstration. + +prompt, system = summarize_template.evaluate(text_to_summarize) + +print("Prompt:", prompt) +print("System:", system) +``` + +Example output: + +``` +Prompt: Summarize this in a few sentences: The quick brown fox jumps over the lazy dog. This is a classic pangram. +System: You are a helpful summarization assistant. +``` + +As you can see, the `$input` placeholder in the template was replaced with the text we provided. The `evaluate` method returns the combined prompt and system instructions, ready to be sent to a [Model](03_model.md). + +**Adding Default Values** + +Let's create another template for translation, this time with a default language: + +```python +from llm import Template + +translate_template = Template( + name="translate_french", + prompt="Translate this to $language: $input", + system="You are a translation expert.", + model="gpt-4o-mini", + defaults={"language": "French"} +) + +text_to_translate = "Hello, world!" +prompt, system = translate_template.evaluate(text_to_translate) + +print("Prompt:", prompt) +print("System:", system) +``` + +Example output: + +``` +Prompt: Translate this to French: Hello, world! +System: You are a translation expert. +``` + +Here, the `$language` placeholder was automatically replaced with `"French"` because we defined it as a default value in the template. + +**Internal Implementation Walkthrough** + +Let's see what happens under the hood when we use a `Template`: + +```mermaid +sequenceDiagram + participant User + participant Template + participant LLM Core + + User->>Template: template.evaluate("Some text") + Template->>Template: Interpolate prompt with provided input and defaults + Template->>LLM Core: Return complete prompt and system instructions + LLM Core->>User: Complete Prompt and System instructions +``` + +This diagram shows: + +1. The user calls the `evaluate` method on a `Template` object, providing some input text. +2. The `Template` interpolates the prompt and system instructions with the provided input and any default values. +3. The `Template` returns the complete prompt and system instructions to the `llm` core. 
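You can also override a template's defaults at evaluation time by passing a `params` dictionary; values you supply take precedence over the defaults. This short sketch reuses the `translate_template` object defined earlier in this chapter (the `params` argument appears in the `evaluate` signature shown below):

```python
# Override the default "French" just for this call
prompt, system = translate_template.evaluate(
    "Hello, world!",
    params={"language": "Spanish"},
)

print("Prompt:", prompt)   # Translate this to Spanish: Hello, world!
print("System:", system)   # You are a translation expert.
```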
+ +**Diving into the Code** + +Here's a look at the `Template` class definition within `llm/templates.py`: + +```python +from pydantic import BaseModel +import string +from typing import Optional, Any, Dict, List, Tuple + + +class Template(BaseModel): + name: str + prompt: Optional[str] = None + system: Optional[str] = None + model: Optional[str] = None + defaults: Optional[Dict[str, Any]] = None + + def evaluate( + self, input: str, params: Optional[Dict[str, Any]] = None + ) -> Tuple[Optional[str], Optional[str]]: + params = params or {} + params["input"] = input + if self.defaults: + for k, v in self.defaults.items(): + if k not in params: + params[k] = v + prompt: Optional[str] = None + system: Optional[str] = None + if not self.prompt: + system = self.interpolate(self.system, params) + prompt = input + else: + prompt = self.interpolate(self.prompt, params) + system = self.interpolate(self.system, params) + return prompt, system + + @classmethod + def interpolate(cls, text: Optional[str], params: Dict[str, Any]) -> Optional[str]: + if not text: + return text + string_template = string.Template(text) + return string_template.substitute(**params) +``` + +Explanation: + +* The `Template` class uses Pydantic to define its structure, including the `name`, `prompt`, `system`, `model`, and `defaults` fields. +* The `evaluate` method takes an `input` string and an optional dictionary of `params`. +* It merges the provided `params` with the `defaults` defined in the template. +* It uses the `interpolate` method (using Python's built-in `string.Template`) to replace the placeholders in the prompt and system instructions with the values from the `params` dictionary. + +**Conclusion** + +`Template`s provide a powerful way to reuse prompts and predefine common configurations. By using templates, you can save time, ensure consistency, and easily customize your interactions with Large Language Models. + +In the next chapter, we'll explore the [PluginManager (pm)](07_pluginmanager__pm_.md), which allows you to extend the functionality of `llm` with custom models, templates, and other features. + + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/llm/07_pluginmanager__pm_.md b/docs/llm/07_pluginmanager__pm_.md new file mode 100644 index 0000000..3e3224e --- /dev/null +++ b/docs/llm/07_pluginmanager__pm_.md @@ -0,0 +1,175 @@ +# Chapter 7: PluginManager (pm) + +In the previous chapter, [Template](06_template.md), you learned how to create reusable prompts to streamline your interaction with LLMs. But what if you want to add *new* models or commands to `llm` without changing the core code? That's where the `PluginManager` (often called `pm`) comes in! + +Think of the `PluginManager` as a modular kitchen system. You can add new appliances (like a fancy coffee maker or a powerful blender) to your kitchen without having to rebuild the entire thing. Similarly, the `PluginManager` allows external packages to register new functionalities to `llm`, making it incredibly extensible. + +**Why do we need a `PluginManager`?** + +Imagine you've found a fantastic new LLM, but `llm` doesn't support it natively. Without a `PluginManager`, you'd have to modify the core `llm` code to add support for this new model. This can be complicated, and your changes might be overwritten when you update `llm`. + +With the `PluginManager`, you can create a separate *plugin* that adds support for the new model. 
This plugin can be installed and uninstalled without affecting the core `llm` code. This makes `llm` much more flexible and adaptable to new technologies. + +**Core Concepts: How does the `PluginManager` work?** + +The `PluginManager` has a few key concepts that you should understand: + +1. **Plugins:** These are external packages that add new functionalities to `llm`. A plugin might add a new model, a new command, or even a new way to format prompts. + +2. **Entry Points:** Plugins use entry points to tell `llm` about the functionalities they provide. Think of an entry point as a signpost that points to the code that should be executed when a particular action is performed. + +3. **Hooks:** These are specific points in the `llm` code where plugins can "hook in" and add their own functionalities. For example, there might be a hook for registering new models, or a hook for adding new commands. + +4. **Registration:** The `PluginManager` is responsible for discovering and registering plugins. It looks for plugins that have defined entry points and then registers them with the system. + +**Solving the Use Case: Adding a New Model via a Plugin** + +Let's imagine you want to add support for a hypothetical new model called "AwesomeModel". You would do this by creating a plugin. The following steps outlines the process: + +1. **Create a Plugin Package:** Create a new Python package (directory with an `__init__.py` file) for your plugin. Let's call it `llm_awesome_model`. + +2. **Define the Model:** In your plugin package, define a class that represents the "AwesomeModel" model. This class should inherit from the `llm.Model` class and implement the required methods (like `prompt`). + +3. **Register the Model using a Hook:** Use the `llm.hookimpl` decorator (provided by `llm`) to register your model with the `PluginManager`. + +4. **Configure Entry Points:** In your plugin's `setup.py` or `pyproject.toml` file, define an entry point that tells `llm` about your model. + +Let's look at some simplified code snippets. First, inside the `llm_awesome_model` package (e.g. in a file called `__init__.py`): + +```python +from llm import Model, hookimpl + +class AwesomeModel(Model): + model_id = "awesome-model" + + def __init__(self): + super().__init__(self.model_id) + + def execute(self, prompt): + # Implementation to call the AwesomeModel API would go here + return f"AwesomeModel says: {prompt.prompt}" + +@hookimpl +def register_models(register): + register(AwesomeModel()) +``` + +Explanation: + +* We create a class called `AwesomeModel` that inherits from `llm.Model`. We set `model_id` to `"awesome-model"`. The `execute` method is a placeholder, simulating a response. +* The `@hookimpl` decorator tells `llm` that this function should be called when `llm` is looking for models to register. The `register` argument is a function that we can call to register our model. + +Next, in your plugin's `setup.py` or `pyproject.toml` file, you'll add the entry point. Here's an example of a `setup.py`: + +```python +from setuptools import setup + +setup( + name="llm_awesome_model", + version="0.1.0", + py_modules=["llm_awesome_model"], # Replace with actual module structure + entry_points={ + "llm": ["llm_awesome_model = llm_awesome_model"] + }, +) +``` + +Explanation: + +* The `entry_points` section tells `llm` that this package provides a plugin for `llm`. +* `"llm_awesome_model = llm_awesome_model"` maps the entry point name `"llm_awesome_model"` to the Python module `llm_awesome_model`. 
`llm` will import this module. + +After installing this plugin, you should be able to use the `awesome-model` model with `llm`: + +```bash +llm -m awesome-model "Tell me something awesome!" +``` + +Example output: + +``` +AwesomeModel says: Tell me something awesome! +``` + +**Internal Implementation Walkthrough** + +Here's what happens internally when `llm` uses the `PluginManager` to load plugins: + +```mermaid +sequenceDiagram + participant LLM Core + participant PluginManager as PM + participant Plugin + participant Setuptools + + LLM Core->>PM: Initialize PluginManager + PM->>Setuptools: Find entry points for group 'llm' + Setuptools-->>PM: Return list of entry points + loop For each entry point + PM->>Plugin: Load entry point module + Plugin-->>PM: Module code + PM->>Plugin: Register hooks from module + end + PM->>LLM Core: Plugin registration complete +``` + +This diagram shows: + +1. The `llm` core initializes the `PluginManager`. +2. The `PluginManager` asks `setuptools` (a Python library for packaging) to find all entry points with the group name "llm". +3. `setuptools` returns a list of entry points. +4. For each entry point, the `PluginManager` loads the corresponding module. +5. The `PluginManager` registers any hooks defined in the module. +6. The `PluginManager` signals to the `llm` core that plugin registration is complete. + +**Diving into the Code** + +Here's a look at the `PluginManager` initialization within `llm/plugins.py`: + +```python +import pluggy + +pm = pluggy.PluginManager("llm") +``` + +Explanation: + +* This code creates an instance of the `pluggy.PluginManager` class. `pluggy` is the core library for managing plugins. +* The `"llm"` argument is a name for the plugin manager. + +Here's how `llm` loads entry points: + +```python +pm.load_setuptools_entrypoints("llm") +``` + +Explanation: + +* This code tells the `PluginManager` to load all entry points with the group name `"llm"`. `setuptools` automatically registers packages with appropriate entry points. The `pm` instance loads them here. + +Here's how plugins can define hooks: + +```python +from llm import hookimpl + +@hookimpl +def register_models(register): + # Register your models here + pass +``` + +Explanation: + +* The `@hookimpl` decorator tells `llm` that this function should be called when `llm` is looking for implementations of the `register_models` hook. +* Plugins define their own implementations of these hooks, registering custom functionalities. + +**Conclusion** + +The `PluginManager` is a powerful tool for extending the functionality of `llm`. By using plugins, you can add new models, commands, and other features without modifying the core `llm` code. This makes `llm` incredibly flexible and adaptable to new technologies. + +In the next chapter, we'll explore [Collection](08_collection.md), which is a way to store and retrieve embeddings. + + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/llm/08_collection.md b/docs/llm/08_collection.md new file mode 100644 index 0000000..971ac58 --- /dev/null +++ b/docs/llm/08_collection.md @@ -0,0 +1,192 @@ +# Chapter 8: Collection + +In the previous chapter, [PluginManager (pm)](07_pluginmanager__pm_.md), you learned how to extend `llm` using plugins. Now, let's explore how to store and retrieve embeddings using a `Collection`. + +Imagine you have a massive library of books. 
To find a book similar to one you're reading, you could manually compare it to every other book – a very slow process! A `Collection` in `llm` is like a smart librarian: it organizes "embeddings" (numerical representations of text or data) so you can quickly find similar items. + +**Why do we need a `Collection`?** + +Let's say you want to build a question-answering system. You have a database of documents, and you want to find the document most relevant to a user's question. You can embed the documents and the question, then use a `Collection` to quickly find the document with the most similar embedding to the question. Without a `Collection`, you'd have to compare the question embedding to *every* document embedding, which would be very slow. + +**Core Concepts: What is a `Collection`?** + +A `Collection` is essentially a digital filing cabinet for embeddings. Here are the key things to understand: + +1. **Embeddings:** These are numerical representations of text or other data, created by an [EmbeddingModel](07_pluginmanager__pm_.md) (a special type of [Model](03_model.md)). Think of them as coordinates that indicate how similar two pieces of data are. + +2. **Storing Embeddings:** The `Collection` stores these embeddings, along with the original text (or data) and any associated metadata. + +3. **Similarity Search:** The main purpose of a `Collection` is to efficiently find embeddings that are similar to a given query embedding. "Similar" means the embeddings are "close" to each other in the multi-dimensional space, which can be calculated using cosine similarity. + +4. **Metadata:** You can store extra information (like author, date, or category) along with each embedding. This metadata can be used to filter search results. + +**Solving the Use Case: Finding Similar Articles** + +Let's say you want to create a `Collection` of news articles and then find articles similar to a given article. Here's how you can do it: + +1. **Create a `Collection`:** First, you need to create a `Collection` object. This tells `llm` where to store the embeddings. + +2. **Embed Articles:** Use an [EmbeddingModel](07_pluginmanager__pm_.md) to embed each article. + +3. **Store Embeddings in the `Collection`:** Add each article's embedding, along with its text and metadata, to the `Collection`. + +4. **Find Similar Articles:** To find articles similar to a given article, embed that article and then use the `Collection` to find the most similar embeddings. + +Here's some simplified Python code (note that to run this code as-is, you'd need to make sure you have `sqlite-utils` installed, e.g. `pip install sqlite-utils`): + +```python +import llm +from sqlite_utils import Database + +# Create a database to store the collection +db = Database("articles.db") + +# Create a collection called "news_articles" +collection = llm.Collection(name="news_articles", db=db, model_id="clip") + +print(f"Collection '{collection.name}' created using model '{collection.model_id}'.") +``` + +Explanation: + +* First, we import the `llm` library and `Database` from `sqlite-utils`. +* We create an SQLite database called `"articles.db"` to store our `Collection`. +* We create a `Collection` object named `"news_articles"`, associating it with the database and the `"clip"` model (an embedding model that you'll need to have available in your `llm` setup). If the collection doesn't exist, it will be created. 
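One caveat: the `model_id` must correspond to an embedding model that is actually installed (embedding models are provided by plugins, as covered in the previous chapter). A hedged sketch for checking that up front; the `"clip"` name is just the example used above, and the models available in your setup may differ:

```python
import llm

# Fetch the embedding model directly; this raises an error if "clip"
# is not installed, which fails faster than a half-built collection
embedding_model = llm.get_embedding_model("clip")
print(embedding_model.model_id)
```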
+ +Now, let's add some articles to the `Collection`: + +```python +articles = [ + ("article1", "Climate change is a serious threat."), + ("article2", "The economy is growing rapidly."), + ("article3", "New technology is transforming society."), + ("article4", "Global warming is impacting weather patterns."), +] + +collection.embed_multi(articles) +print(f"Added {len(articles)} articles to the collection.") +``` + +Explanation: + +* We have a list of articles, where each article is a tuple containing an ID and the article text. +* We use the `embed_multi` method to embed all of the articles and store them in the `Collection`. + +Finally, let's find articles similar to a given article: + +```python +query_text = "What are the effects of climate change?" +similar_articles = collection.similar(query_text, number=2) + +print(f"Articles similar to '{query_text}':") +for article in similar_articles: + print(f"- {article.id}: {article.content} (score: {article.score})") +``` + +Explanation: + +* We define the text we want to find similar articles for. +* We use the `similar` method to find the 2 most similar articles in the `Collection`. +* We print the ID, text, and similarity score for each similar article. + +Example output (will vary slightly depending on the model and content): + +``` +Articles similar to 'What are the effects of climate change?': +- article4: Global warming is impacting weather patterns. (score: 0.85) +- article1: Climate change is a serious threat. (score: 0.80) +``` + +**Internal Implementation Walkthrough** + +Let's see what happens under the hood when you use a `Collection` to find similar items: + +```mermaid +sequenceDiagram + participant User + participant Collection + participant EmbeddingModel as EM + participant Database as DB + + User->>Collection: similar("query text", number=2) + Collection->>EM: embed("query text") + EM-->>Collection: embedding_vector + Collection->>Collection: similar_by_vector(embedding_vector, number=2) + Collection->>DB: Query embeddings table for similar vectors + DB-->>Collection: List of similar embeddings + Collection->>User: List of Entry objects +``` + +This diagram shows: + +1. The user calls the `similar` method on a `Collection` object with the text to search for and the desired number of results. +2. The `Collection` uses the [EmbeddingModel](07_pluginmanager__pm_.md) associated with the `Collection` to embed the query text. +3. The `Collection` calls the `similar_by_vector` function using the generated vector. +4. The `Collection` queries the database for embeddings similar to the query embedding. +5. The database returns a list of similar embeddings. +6. The `Collection` returns a list of `Entry` objects, each containing the ID, text, and similarity score for a similar item. 
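As noted in the core concepts above, "similar" here means the embedding vectors are close according to cosine similarity. The following is a rough, self-contained illustration of that calculation — not `llm`'s actual implementation:

```python
import math

def cosine_similarity(a, b):
    # Higher values (closer to 1.0) mean the two vectors point in similar
    # directions, i.e. the texts they represent are more alike.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0, 1.0], [0.5, 0.1, 0.9]))  # ~0.96, very similar
```

The `score` shown in the example output earlier is a similarity value of this kind: higher numbers mean a closer match.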
+ +**Diving into the Code** + +Let's look at some key parts of the `Collection` class definition in `llm/embeddings.py`: + +```python +from dataclasses import dataclass +from typing import Optional, Dict, Any, Union +import time +import hashlib + +@dataclass +class Entry: # Represents a single entry in the collection + id: str + score: Optional[float] + content: Optional[str] = None + metadata: Optional[Dict[str, Any]] = None + +class Collection: + def __init__(self, name: str, db, model_id: Optional[str] = None) -> None: + self.name = name + self.db = db + self.model_id = model_id # ID of the embedding model to use + + def embed(self, id: str, value: Union[str, bytes], metadata: Optional[Dict[str, Any]] = None): + # Embed the value and store in the database + embedding = self.model().embed(value) + self.db["embeddings"].insert({ # Insert or replace + "collection_id": self.id, + "id": id, + "embedding": self.encode(embedding), + "content": value if isinstance(value, str) else None, # only if store==True + "metadata": self.encode_metadata(metadata), + "updated": int(time.time()), + }, replace=True) # Deduplicate based on content hash + + def similar(self, value: Union[str, bytes], number: int = 10) -> list[Entry]: + # Find similar entries + comparison_vector = self.model().embed(value) + return self.similar_by_vector(comparison_vector, number) + @staticmethod + def content_hash(input: Union[str, bytes]) -> bytes: + "Hash content for deduplication. Override to change hashing behavior." + if isinstance(input, str): + input = input.encode("utf8") + return hashlib.md5(input).digest() +``` + +Explanation: + +* The `Entry` dataclass defines the structure for storing information about each item in the `Collection`. +* The `Collection` class stores items along with the metadata and can find the similar items based on the embedding vectors. +* The `embed` method embeds the given text, calculates hash and stores it in the "embeddings" table (replace if existing). +* The `similar` method finds the similar items using the embeddings. + +**Conclusion** + +The `Collection` provides a way to efficiently store and retrieve embeddings, allowing you to quickly find similar items in a large dataset. This is useful for building question-answering systems, recommendation engines, and other applications that rely on semantic similarity. + +Congratulations! You have completed the core tutorial for `llm`. You can now use `llm` to interact with Large Language Models, customize their behavior, and extend their functionality. + + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/llm/index.md b/docs/llm/index.md new file mode 100644 index 0000000..0dc075e --- /dev/null +++ b/docs/llm/index.md @@ -0,0 +1,44 @@ +# Tutorial: LLM + +`LLM` is a command-line tool that lets you interact with **Large Language Models (LLMs)**, like having a conversation with a *smart AI*. You provide a `Prompt`, the `LLM` sends it to a `Model`, and the `Model` returns a `Response`. It can be extended via `PluginManager` to support various models and functionalities. 
+ + +**Source Repository:** [None](None) + +```mermaid +flowchart TD + A0["Model"] + A1["Prompt"] + A2["Response"] + A3["PluginManager (pm)"] + A4["Options"] + A5["Collection"] + A6["Template"] + A7["cli (Click CLI group)"] + A1 -- "Sent to" --> A0 + A0 -- "Generates" --> A2 + A0 -- "Uses" --> A4 + A3 -- "Registers" --> A0 + A3 -- "Registers commands" --> A7 + A7 -- "Accepts input" --> A1 + A7 -- "Loads" --> A6 + A5 -- "Uses for embeddings" --> A0 + A6 -- "Generates" --> A1 + A2 -- "Stores in" --> A5 +``` + +## Chapters + +1. [cli (Click CLI group)](01_cli__click_cli_group_.md) +2. [Prompt](02_prompt.md) +3. [Model](03_model.md) +4. [Response](04_response.md) +5. [Options](05_options.md) +6. [Template](06_template.md) +7. [PluginManager (pm)](07_pluginmanager__pm_.md) +8. [Collection](08_collection.md) + + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/pgcli/01_promptsession.md b/docs/pgcli/01_promptsession.md new file mode 100644 index 0000000..ef777c6 --- /dev/null +++ b/docs/pgcli/01_promptsession.md @@ -0,0 +1,157 @@ +# Chapter 1: PromptSession + +Imagine you're using a tool like `psql` to talk to your PostgreSQL database. You type in commands, see the results, and repeat. The `pgcli` tool aims to give you a much better experience while doing that. One of the core parts of making that happen is the `PromptSession`. + +Think of `PromptSession` as the intelligent "command line" inside `pgcli`. It's what makes typing commands interactive and enjoyable. It handles displaying the prompt (like `user@host:database>`), understanding what you type, giving you suggestions, and remembering your past commands. + +In this chapter, we'll explore what `PromptSession` is and how it powers the interactive experience of `pgcli`. We will use a simple example: type a SQL query, then execute it. + +## What Problem Does PromptSession Solve? + +Without `PromptSession`, you'd have a very basic command line. No fancy features! Here's what you'd miss: + +* **Syntax Highlighting:** SQL keywords would all be the same color, making it harder to spot errors. +* **Auto-Suggestions:** No help guessing table or column names as you type. +* **Key Bindings:** No convenient shortcuts for moving around the command line or editing your query. +* **Command History:** No easy way to re-run a previous command. + +`PromptSession` takes care of all this, making your interaction with `pgcli` much more efficient and pleasant. + +## Key Concepts of PromptSession + +`PromptSession` relies on a few important ideas: + +1. **The Prompt:** That little bit of text that tells you `pgcli` is ready for your input (e.g., `user@host:database>`). It's more customizable than you might think! +2. **Input Buffer:** The area where you actually type your SQL commands. +3. **Key Bindings:** Shortcuts that let you do things quickly with your keyboard (like pressing `Ctrl+Space` for autocompletion). +4. **Lexer:** This analyzes what you're typing (e.g., identifying keywords, table names, etc.) so it can apply syntax highlighting. +5. **Completer:** The engine that suggests possible completions for what you're typing. +6. **History:** The record of commands you've entered in the past. + +## Using PromptSession: A Simple Example + +Let's see `PromptSession` in action. + +**Step 1: Start `pgcli`** + +When you run `pgcli`, the `PromptSession` is initialized. The first thing you see is the prompt. It presents the prompt and waits for input. 
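To get a feel for what `PromptSession` itself provides, here is a minimal standalone `prompt_toolkit` sketch — illustrative only, not `pgcli`'s code:

```python
from prompt_toolkit.shortcuts import PromptSession

session = PromptSession()

while True:
    text = session.prompt("sql> ")  # display a prompt and wait for one line of input
    if text.strip().lower() in ("quit", "exit"):
        break
    print(f"You typed: {text}")
```

`pgcli` builds on this same object, layering on the lexer, completer, history, and key bindings described in the rest of this chapter.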
+ +**Step 2: Type a SQL query** + +Let's type a simple query: + +```sql +SELECT * FROM users; +``` + +As you type, `PromptSession` provides syntax highlighting, making keywords like `SELECT` and `FROM` stand out. It might also suggest table names as you type `users` if you have a `users` table in your database. + +**Step 3: Execute the query** + +Press `Enter` to execute the query. `PromptSession` sends the query to the database, and `pgcli` displays the results. + +**Step 4: Access Command History** + +Press the up arrow key. `PromptSession` recalls the `SELECT * FROM users;` query from your history, allowing you to easily run it again or edit it. + +## Diving Deeper: How PromptSession Works Internally + +Let's break down what happens behind the scenes when you use `PromptSession`. + +```mermaid +sequenceDiagram + participant User + participant PG as PGCli + participant PS as PromptSession + participant PT as prompt_toolkit + participant DB as PostgreSQL + + User->>PG: Starts pgcli + PG->>PS: Initializes PromptSession + PS->>PT: Uses prompt_toolkit library + User->>PS: Types SQL query + PS->>PT: Displays prompt, highlights syntax, suggests completions + User->>PS: Press Enter + PS->>PG: Sends SQL query to PGCli + PG->>DB: Executes SQL query + DB->>PG: Returns results + PG->>PS: Receives and formats results + PS->>PT: Displays formatted results + PT->>User: Shows results in terminal +``` + +Here's a simplified step-by-step explanation of what happens: + +1. **Initialization:** When `pgcli` starts, it creates a `PromptSession` object. This object sets up all the interactive features, like syntax highlighting and autocompletion, by leveraging the `prompt_toolkit` library. +2. **User Input:** You type a SQL query in the input buffer of the `PromptSession`. +3. **Real-time Interaction:** As you type, `PromptSession` uses a lexer (specifically, `PygmentsLexer` with `PostgresLexer`) to understand the SQL syntax and apply highlighting. It also uses a completer (like the [PGCompleter](04_pgcompleter.md) discussed in a later chapter) to suggest completions. +4. **Execution:** When you press `Enter`, `PromptSession` captures the complete SQL query. The query will be sent to [PGExecute](02_pgexecute.md) for execution. +5. **Displaying Results:** After the query is executed by [PGExecute](02_pgexecute.md), `PromptSession` receives and formats the result and displays the result. + +## Code Snippets and Explanation + +Here's a look at some of the code that creates and configures the `PromptSession` (from `pgcli/main.py`): + +```python +from prompt_toolkit.shortcuts import PromptSession +from prompt_toolkit.lexers import PygmentsLexer +from pygments.lexers.sql import PostgresLexer +from prompt_toolkit.history import FileHistory +from prompt_toolkit.auto_suggest import AutoSuggestFromHistory +from .key_bindings import pgcli_bindings + +#... skipping some lines + prompt_app = PromptSession( + lexer=PygmentsLexer(PostgresLexer), + message=get_message, + history=history, + auto_suggest=AutoSuggestFromHistory(), + key_bindings=key_bindings, + ) + + return prompt_app +``` + +Let's break this down: + +* `PromptSession(...)`: Creates a new `PromptSession` object. This is where we configure all the interactive features. +* `lexer=PygmentsLexer(PostgresLexer)`: Sets the lexer to `PygmentsLexer` with `PostgresLexer`. This enables syntax highlighting for SQL. +* `message=get_message`: Sets the function that generates the prompt message. This is how you can customize the prompt. 
+* `history=history`: Sets the history object to the file history. This enables the command history feature. +* `auto_suggest=AutoSuggestFromHistory()`: Enables auto-suggestions from the command history. +* `key_bindings=key_bindings`: Sets the custom key bindings for `pgcli`. + +Here's a quick glimpse at how key bindings are defined (from `pgcli/key_bindings.py`): + +```python +from prompt_toolkit.key_binding import KeyBindings + +def pgcli_bindings(pgcli): + kb = KeyBindings() + + @kb.add("c-space") + def _(event): + """ + Initialize autocompletion at cursor. + """ + b = event.app.current_buffer + if b.complete_state: + b.complete_next() + else: + b.start_completion(select_first=False) + + return kb +``` + +This code defines the `Ctrl+Space` key binding for autocompletion. When you press `Ctrl+Space`, the `start_completion` method of the current buffer is called. The completer (again, often the [PGCompleter](04_pgcompleter.md)) then kicks in to suggest possible completions. + +## Conclusion + +The `PromptSession` is the heart of `pgcli`'s interactive command line. It uses syntax highlighting, autocompletion, key bindings, and command history to provide a user-friendly experience. The `prompt_toolkit` library provides powerful tools for creating interactive command-line applications, and `PromptSession` puts those tools to work for you. + +In the next chapter, we'll look at how `pgcli` actually executes your SQL queries using the [PGExecute](02_pgexecute.md) abstraction. + + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/pgcli/02_pgexecute.md b/docs/pgcli/02_pgexecute.md new file mode 100644 index 0000000..9a5d6db --- /dev/null +++ b/docs/pgcli/02_pgexecute.md @@ -0,0 +1,160 @@ +# Chapter 2: PGExecute + +In the [PromptSession](01_promptsession.md) chapter, we learned how `pgcli` takes your commands. But how does `pgcli` actually *run* those commands and get the results back from your PostgreSQL database? That's where `PGExecute` comes in! + +Think of `PGExecute` as the "database interaction manager" for `pgcli`. It's the piece of code responsible for connecting to your database, sending your SQL queries, and bringing back the results to display on your screen. Without `PGExecute`, `pgcli` would just be a fancy text editor! + +## What Problem Does PGExecute Solve? + +Imagine you want to get a list of all the users in your database. You type the query `SELECT * FROM users;` into `pgcli`. Here's what `PGExecute` does behind the scenes: + +1. **Connects to the Database:** Establishes a connection to your PostgreSQL database using the credentials you provided (username, password, host, database name, etc.). +2. **Sends the Query:** Transmits the `SELECT * FROM users;` query to the database server. +3. **Receives the Results:** Gets the results back from the database (the rows from the `users` table). +4. **Manages Transactions:** Handles database transactions, if your query involves them (like `BEGIN`, `COMMIT`, `ROLLBACK`). +5. **Returns the Data:** Provides the results back to `pgcli` so it can display them to you in a readable format. + +Without `PGExecute`, you'd have to manually handle all of these steps yourself using a low-level database library! `PGExecute` simplifies this process by abstracting away the complexities of database interaction. + +## Key Concepts of PGExecute + +Let's break down the key concepts behind `PGExecute`: + +1. 
**Connection Management:** `PGExecute` handles the connection to your PostgreSQL database. It stores the connection details (host, port, user, password, database name) and uses them to establish and maintain a connection. It uses the `psycopg` library under the hood. + +2. **Query Execution:** `PGExecute` provides methods for executing SQL queries against the database. It takes a SQL query as input and sends it to the database for execution. + +3. **Result Handling:** `PGExecute` receives the results from the database and makes them available to `pgcli`. It also handles errors that may occur during query execution. + +4. **Transactions:** `PGExecute` supports database transactions. Transactions allow you to group multiple SQL statements into a single unit of work. If any statement in the transaction fails, the entire transaction is rolled back, ensuring data consistency. + +## Using PGExecute: A Simple Example + +Let's see how `PGExecute` is used in `pgcli`. + +**Step 1: Type a SQL query into `pgcli`** + +Let's type our example query again: + +```sql +SELECT * FROM users; +``` + +**Step 2: `PromptSession` hands off the query** + +As we learned in [PromptSession](01_promptsession.md), the `PromptSession` captures the query. When you press `Enter`, the `PromptSession` takes the query and passes it on to `PGExecute`. + +**Step 3: `PGExecute` does its magic** + +`PGExecute` receives the query, connects to the database, executes the query, retrieves the results, and passes the results back to `PromptSession`. + +**Step 4: `PromptSession` displays the results** + +The `PromptSession` then formats the results and displays them in your terminal. You see the data from the `users` table! + +**Behind the Scenes:** + +```python +# This is a simplified illustration. The real code is more complex. + +# Get the query from PromptSession +query = "SELECT * FROM users;" + +# Create a PGExecute object (usually done once when pgcli starts) +pg_execute = PGExecute(database="your_db", user="your_user", password="your_password", host="localhost") + +# Execute the query +title, cur, headers, status = pg_execute.execute_normal_sql(query) + +# 'cur' now contains the results (if any). 'headers' are the column names. +# This result is then passed to the PromptSession. +``` + +This is a simplified example, but it demonstrates the basic flow of how `PGExecute` is used to execute SQL queries in `pgcli`. + +## Diving Deeper: How PGExecute Works Internally + +Let's look at what happens inside `PGExecute` when you run a query. + +```mermaid +sequenceDiagram + participant User + participant PG as PGCli + participant PS as PromptSession + participant PGEx as PGExecute + participant DB as PostgreSQL + + User->>PS: Types SQL query and presses Enter + PS->>PG: Sends SQL query to PGCli + PG->>PGEx: Executes SQL query using PGExecute + PGEx->>DB: Connects to PostgreSQL and sends the query + DB->>PGEx: Returns results + PGEx->>PG: Returns results to PGCli + PG->>PS: Sends results for display + PS->>User: Displays results in terminal +``` + +Here's a simplified step-by-step explanation: + +1. **User Input:** You type a SQL query in `pgcli` and press `Enter`. +2. **PromptSession:** The [PromptSession](01_promptsession.md) captures the SQL query. +3. **PGExecute:** The `PromptSession` passes the captured SQL query to `PGExecute`. +4. **Database Interaction:** `PGExecute` uses the `psycopg` library to connect to the PostgreSQL database, sends the SQL query, and retrieves the results. +5. 
**Results Returned:** The results are sent back to `PGExecute`, then passed back through `pgcli` and `PromptSession` and displayed in your terminal. + +Now, let's look at some of the code inside `pgcli/pgexecute.py`: + +```python +import psycopg + +class PGExecute: + def __init__( + self, + database=None, + user=None, + password=None, + host=None, + port=None, + dsn=None, + notify_callback=None, + **kwargs, + ): + self.conn = None + self.connect(database, user, password, host, port, dsn, **kwargs) + + def connect(self, database=None, user=None, password=None, host=None, port=None, dsn=None, **kwargs): + # This method establishes the connection to the database. + conn_info = psycopg.conninfo.make_conninfo(dbname=database, user=user, password=password, host=host, port=port) + self.conn = psycopg.connect(conn_info) # Connects to DB + self.conn.autocommit = True # important! +``` + +This code shows how `PGExecute` initializes a connection to the database using the `psycopg` library. The `connect()` method takes the database connection parameters and uses them to establish a connection. Note the `conn.autocommit = True`, which makes each query run in its own transaction, unless you explicitly start a transaction with `BEGIN`. + +```python + def execute_normal_sql(self, split_sql): + """Returns tuple (title, rows, headers, status)""" + cur = self.conn.cursor() + cur.execute(split_sql) + + # cur.description will be None for operations that do not return + # rows. + if cur.description: + headers = [x[0] for x in cur.description] + return "", cur, headers, cur.statusmessage + else: + return "", None, None, cur.statusmessage +``` + +This code shows how `PGExecute` executes a normal SQL query. It creates a cursor object, executes the SQL query using the cursor, and returns the results. The `cur.description` attribute contains information about the columns in the result set (if any). The `cur.statusmessage` contains the status message returned by the database (e.g., "SELECT 1", "INSERT 0 1"). + +## Conclusion + +The `PGExecute` class is the workhorse of `pgcli`, handling the database connection, query execution, and result retrieval. It uses the `psycopg` library to interact with the PostgreSQL database and provides methods for executing SQL queries and managing transactions. By abstracting away the complexities of database interaction, `PGExecute` makes it easier to build a powerful and user-friendly command-line interface for PostgreSQL. + +In the next chapter, we'll explore [PGSpecial](03_pgspecial.md), which handles special commands in `pgcli` that are not standard SQL queries (like `\d` to describe a table). + + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/pgcli/03_pgspecial.md b/docs/pgcli/03_pgspecial.md new file mode 100644 index 0000000..28a9c25 --- /dev/null +++ b/docs/pgcli/03_pgspecial.md @@ -0,0 +1,164 @@ +# Chapter 3: PGSpecial + +In the previous chapter, [PGExecute](02_pgexecute.md), we learned how `pgcli` executes SQL queries. But what about those special commands that start with a backslash, like `\d` to describe a table? That's where PGSpecial comes in! + +PGSpecial provides special commands, similar to those in `psql`. Think of them as shortcuts that allow you to do more than just regular SQL. They are distinguished from regular SQL by the leading backslash. Examples include listing databases, connecting to a new database, or executing a query from a file. 
PGSpecial parses these commands and performs the appropriate actions. + +## What Problem Does PGSpecial Solve? + +Imagine you're using `pgcli` and want to quickly see a list of all the tables in your current database. Instead of writing a complex SQL query, you just type `\dt` and press Enter. PGSpecial handles this for you! + +Here's what PGSpecial does behind the scenes: + +1. **Recognizes the Special Command:** It sees that your input starts with a backslash (`\`), so it knows it's a special command, not regular SQL. +2. **Parses the Command:** It figures out which special command you typed (in this case, `\dt` for "list tables"). +3. **Executes the Command:** It runs the appropriate internal logic to fetch the table names from the database. +4. **Formats the Output:** It presents the list of tables in a readable way on your screen. + +Without PGSpecial, you'd have to remember and type out complicated SQL queries for these common tasks. It simplifies your workflow and makes `pgcli` more user-friendly. + +## Key Concepts of PGSpecial + +Let's break down the key concepts behind PGSpecial: + +1. **Special Commands:** These are the backslash commands themselves, like `\dt`, `\c`, `\i`, etc. Each command has a specific function. +2. **Parsing:** PGSpecial has a parser that understands the special commands and any arguments they might take (e.g., the filename after `\i`). +3. **Execution Logic:** For each special command, there's code that knows how to perform the requested action. This might involve querying the database, changing the connection settings, or reading from a file. + +## Using PGSpecial: A Simple Example + +Let's see how PGSpecial is used in `pgcli`. + +**Step 1: Type a Special Command into `pgcli`** + +Let's type our example command: + +``` +\dt +``` + +**Step 2: `PromptSession` hands off the command** + +As we learned in [PromptSession](01_promptsession.md), the `PromptSession` captures the command. When you press `Enter`, the `PromptSession` takes the command and passes it on to [PGExecute](02_pgexecute.md). + +**Step 3: PGSpecial intercepts the command** + +[PGExecute](02_pgexecute.md) checks if the command is a special command, if so, then it is passed on to PGSpecial. + +**Step 4: PGSpecial does its magic** + +PGSpecial receives the command, parses it, executes the internal logic to get the list of tables, and passes the results back to [PGExecute](02_pgexecute.md). + +**Step 5: `PromptSession` displays the results** + +The `PromptSession` then formats the results and displays them in your terminal. You see a list of tables in your database! + +**Behind the Scenes:** + +```python +# This is a simplified illustration. The real code is more complex. + +# Get the command from PromptSession +command = "\dt" + +# Create a PGSpecial object (usually done once when pgcli starts) +pg_special = PGSpecial() + +# Execute the special command +results = pg_special.execute(None, command) # None is passed as cursor here + +# 'results' now contains the formatted table list. +# This result is then passed to the PromptSession for display. +``` + +This simplified example shows how PGSpecial intercepts and handles special commands in `pgcli`. + +## Diving Deeper: How PGSpecial Works Internally + +Let's peek at what happens inside PGSpecial when you run a special command. 
+ +```mermaid +sequenceDiagram + participant User + participant PG as PGCli + participant PS as PromptSession + participant PGEx as PGExecute + participant PGS as PGSpecial + + User->>PS: Types \dt and presses Enter + PS->>PG: Sends command to PGCli + PG->>PGEx: Checks command type using PGExecute + PGEx->>PGS: Executes special command using PGSpecial + PGS->>PGEx: Returns results to PGExecute + PGEx->>PG: Returns results to PGCli + PG->>PS: Sends results for display + PS->>User: Displays list of tables in terminal +``` + +Here's a simplified step-by-step explanation: + +1. **User Input:** You type a special command (e.g., `\dt`) in `pgcli` and press `Enter`. +2. **PromptSession:** The [PromptSession](01_promptsession.md) captures the command. +3. **PGExecute:** The [PromptSession](01_promptsession.md) passes the command to [PGExecute](02_pgexecute.md), which determines that it's a special command. +4. **PGSpecial:** [PGExecute](02_pgexecute.md) passes the command to PGSpecial. +5. **Command Execution:** PGSpecial finds the corresponding function for `\dt` and executes it. This involves connecting to the database (via a cursor object from [PGExecute](02_pgexecute.md)), running a query to fetch table names, and formatting the output. +6. **Results Returned:** The formatted table list is returned and eventually displayed in your terminal. + +Now, let's look at some of the relevant code inside `pgcli/main.py`: + +```python +from pgspecial.main import PGSpecial +# ... skipping some lines + +class PGCli: + def __init__(self, ...): + # ... skipping some lines + self.pgspecial = PGSpecial() #Creates a PGSpecial object + # ... skipping some lines + + def execute_command(self, text, handle_closed_connection=True): + # ... skipping some lines + try: + if pgspecial: + # First try to run each query as special + _logger.debug("Trying a pgspecial command. sql: %r", sql) + try: + cur = self.conn.cursor() + except psycopg.InterfaceError: + # edge case when connection is already closed, but we + # don't need cursor for special_cmd.arg_type == NO_QUERY. + # See https://github.com/dbcli/pgcli/issues/1014. + cur = None + try: + response = pgspecial.execute(cur, sql) #Calls the execute method on PGSpecial + # ... skipping some lines +``` + +This code shows how `pgcli` creates a `PGSpecial` object and uses its `execute` method to handle special commands. The code checks first if `pgspecial` exists, then tries to execute the command. + +Here's the code for registering the special command within the `PGCli` class: + +```python + def register_special_commands(self): + self.pgspecial.register( + self.change_db, + "\\c", + "\\c[onnect] database_name", + "Change to a new database.", + aliases=("use", "\\connect", "USE"), + ) +# ... skipping some lines +``` + +This `register` method allows us to register Python functions to special commands. Here we register `self.change_db` to special command `\c`. If the user types `\c `, then `self.change_db` will be called. + +## Conclusion + +PGSpecial is a valuable component of `pgcli`, providing a convenient way to execute common database management tasks using backslash commands. By parsing these commands and performing the appropriate actions, PGSpecial enhances the user experience and simplifies interaction with PostgreSQL. + +In the next chapter, we'll explore [PGCompleter](04_pgcompleter.md), which provides intelligent autocompletion for SQL queries and special commands in `pgcli`. 
+ + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/pgcli/04_pgcompleter.md b/docs/pgcli/04_pgcompleter.md new file mode 100644 index 0000000..f23b99f --- /dev/null +++ b/docs/pgcli/04_pgcompleter.md @@ -0,0 +1,150 @@ +# Chapter 4: PGCompleter + +In the previous chapter, [PGSpecial](03_pgspecial.md), we explored how `pgcli` handles special backslash commands. Now, let's dive into something that makes typing those commands (and SQL queries in general!) much easier: `PGCompleter`. + +Imagine you're typing a SQL query like `SELECT * FROM cust`. Wouldn't it be nice if `pgcli` could automatically suggest `customers` as a possible table name after you type `cust`? That's exactly what `PGCompleter` does! + +`PGCompleter` is like an intelligent auto-complete feature specifically for SQL. It suggests possible completions for what you're typing, including: + +* Table names +* Column names +* SQL keywords (like `SELECT`, `FROM`, `WHERE`) +* Function names +* Database names +* Schema names +* And more! + +This saves you time, reduces typos, and helps you explore your database schema. + +## What Problem Does PGCompleter Solve? + +Without `PGCompleter`, you'd have to remember the exact names of all your tables, columns, and functions. This can be a real pain, especially in large databases! `PGCompleter` solves this by: + +1. **Reducing Typing:** It suggests completions as you type, so you don't have to type out the full name of everything. +2. **Preventing Errors:** By suggesting valid options, it helps you avoid typos and syntax errors. +3. **Discovering Your Schema:** It shows you available tables, columns, and functions, which can help you explore your database. + +## Key Concepts of PGCompleter + +`PGCompleter` relies on a few key ideas: + +1. **Metadata:** `PGCompleter` needs information about your database to provide suggestions. This includes table names, column names, data types, and function signatures. It gets this information by querying the database when `pgcli` starts up (and periodically refreshes it). +2. **Context Awareness:** `PGCompleter` analyzes what you've already typed to understand the context of your query. For example, if you type `SELECT * FROM`, it knows that you're likely going to type a table name next. +3. **Prioritization:** Not all suggestions are created equal. `PGCompleter` tries to prioritize the most relevant suggestions based on your query history and the structure of your database. +4. **Completion Types:** `PGCompleter` recognizes different types of things you might be trying to complete (tables, columns, keywords, etc.) and uses different strategies for each. + +## Using PGCompleter: A Simple Example + +Let's see `PGCompleter` in action. + +**Step 1: Start `pgcli` and connect to your database.** + +**Step 2: Type `SELECT * FROM cust` and press `Tab`.** + +As you type, `PGCompleter` looks at your database metadata and sees if there are any tables that start with `cust`. If you have a table named `customers`, it will suggest that as a completion. + +**Step 3: Press `Tab` repeatedly to cycle through the suggestions (if there are multiple).** + +**Step 4: Press `Enter` to accept the suggestion.** + +`PGCompleter` automatically completes the table name to `customers`, and you can continue typing your query. + +**Behind the Scenes:** + +```python +# This is a simplified illustration. The real code is more complex. 
+ +# The user types "SELECT * FROM cust" + +# PGCompleter analyzes the input and determines that the user is likely trying to complete a table name after FROM. + +# PGCompleter queries the database metadata to find all table names. + +# PGCompleter finds "customers" as a table name that starts with "cust". + +# PGCompleter suggests "customers" as a completion. + +# The user presses Tab and PGCompleter completes the table name. +``` + +## Diving Deeper: How PGCompleter Works Internally + +Let's take a look at the steps that happen inside `PGCompleter` to make autocompletion happen. + +```mermaid +sequenceDiagram + participant User + participant PS as PromptSession + participant PGC as PGCompleter + participant DB as PostgreSQL + + User->>PS: Types "SELECT * FROM cust" + Tab + PS->>PGC: Requests completions from PGCompleter + PGC->>PGC: Analyzes the SQL context to identify the completion type (Table) + PGC->>DB: Queries the database metadata for available table names + DB->>PGC: Returns list of table names + PGC->>PGC: Filters the table names and prioritizes suggestions + PGC->>PS: Returns completion suggestions ("customers") + PS->>User: Displays "customers" as completion suggestion +``` + +Here's a breakdown of what happens: + +1. **User Input:** You type part of a SQL query and press `Tab`. +2. **PromptSession:** The [PromptSession](01_promptsession.md) detects the `Tab` keypress and asks the `PGCompleter` for possible completions. +3. **Analysis:** `PGCompleter` analyzes the SQL code you've typed to figure out what kind of thing you're trying to complete (e.g., a table name, a column name, a keyword). The `suggest_type` function is crucial here! +4. **Metadata Retrieval:** `PGCompleter` uses the `PGExecute` (as seen in [PGExecute](02_pgexecute.md)) to query the PostgreSQL database for relevant metadata. For instance, if you're completing a table name, it fetches a list of all table names from the database. +5. **Filtering and Prioritization:** `PGCompleter` filters the results based on what you've already typed (e.g., if you typed `cust`, it only shows tables that start with `cust`). It also prioritizes the suggestions using the `PrevalenceCounter` class, giving frequently used names higher priority. +6. **Suggestion Display:** `PGCompleter` returns a list of `Completion` objects to the [PromptSession](01_promptsession.md), which then displays them as suggestions in the `pgcli` interface. + +Now, let's look at some simplified code snippets from `pgcli/pgcompleter.py`: + +```python +from prompt_toolkit.completion import Completion + +class PGCompleter: + def get_completions(self, document, complete_event): + word_before_cursor = document.get_word_before_cursor(WORD=True) + suggestions = suggest_type(document.text, document.text_before_cursor) # Determine the completion type + # ... skipping some lines + for suggestion in suggestions: + suggestion_type = type(suggestion) + if suggestion_type == Table: + #Logic to fetch and return table names from DB + # ... skipping some lines + yield Completion(table_name, -len(word_before_cursor)) #Create Completion Object and return + +``` + +This code shows the main `get_completions` function that is called when you press `Tab`. +* First, it determines the context using `suggest_type`. +* Based on the context, it fetches the relevant objects (table names here). +* Lastly, creates a `Completion` object to return. 
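To make the `Completion` mechanics concrete, here is a toy completer built on the same `prompt_toolkit` primitives. It is deliberately simplified — the candidate names are hard-coded, and none of this is `pgcli`'s actual context or ranking logic:

```python
from prompt_toolkit.completion import Completer, Completion

class ToyTableCompleter(Completer):
    """Suggest table names that start with the word under the cursor."""

    def __init__(self, table_names):
        self.table_names = table_names

    def get_completions(self, document, complete_event):
        word = document.get_word_before_cursor(WORD=True)
        for name in self.table_names:
            if name.startswith(word):
                # A negative start_position tells prompt_toolkit to replace
                # the partially typed word with the suggested text.
                yield Completion(name, start_position=-len(word))
```

`PGCompleter` follows the same pattern, but draws its candidates from the database metadata and prioritizes them as described above.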
+ +Here's another snippet showing how table names are fetched from the database: + +```python + def extend_relations(self, data, kind): + """extend metadata for tables or views. + """ + metadata = self.dbmetadata[kind] + for schema, relname in data: + try: + metadata[schema][relname] = OrderedDict() + except KeyError: + pass + self.all_completions.add(relname) +``` + +The `extend_relations` function is used to store table names into an internal metadata variable inside `PGCompleter`. This metadata is pre-fetched from the database to be used for suggestions. + +## Conclusion + +`PGCompleter` is a powerful tool that makes typing SQL queries in `pgcli` faster, easier, and more accurate. By analyzing your code, retrieving metadata from the database, and prioritizing suggestions, it provides intelligent autocompletion that significantly improves your workflow. + +In the next chapter, we'll look at [Config Management](05_config_management.md), which handles the configuration settings of `pgcli`. + + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/pgcli/05_config_management.md b/docs/pgcli/05_config_management.md new file mode 100644 index 0000000..6791bfd --- /dev/null +++ b/docs/pgcli/05_config_management.md @@ -0,0 +1,168 @@ +# Chapter 5: Config Management + +In the previous chapter, [PGCompleter](04_pgcompleter.md), we saw how `pgcli` provides smart autocompletion suggestions. Now, let's talk about how `pgcli` remembers *your* preferences. Think of it like setting up the options in your favorite game – how do you tell `pgcli` to use your favorite colors, keybindings, or connection settings every time you start it? That's where Config Management comes in! + +Config Management is all about handling `pgcli`'s configuration settings. It's like a settings panel for the application. It determines things like the style, keybindings, connection options, etc. It reads these settings from a configuration file and allows you to customize `pgcli`'s behavior to fit your needs. This ensures that each user can tailor the application to their own preferences and environment. + +## What Problem Does Config Management Solve? + +Imagine you want `pgcli` to always use a specific color scheme (like "solarized") and a particular table format (like "psql"). Without Config Management, you'd have to manually set these options every time you launch `pgcli`. That would get annoying fast! + +Config Management solves this by: + +1. **Persisting Settings:** It saves your preferences in a configuration file so they're automatically applied each time you run `pgcli`. +2. **Customization:** It lets you tweak almost every aspect of `pgcli`, from the way it looks to how it behaves. +3. **Centralized Control:** All your settings are stored in one place, making it easy to manage and update them. + +## Key Concepts of Config Management + +Here's what you need to know about how `pgcli` manages its configuration: + +1. **Configuration File:** `pgcli` reads its settings from a file named `config`. This file is usually located in a directory like `~/.config/pgcli/` (on Linux/macOS) or in your user profile's `AppData` folder (on Windows). You can find the exact location by running `pgcli --pgclirc` in your terminal. +2. **Config Sections:** The `config` file is organized into sections, like `[main]`, `[colors]`, and `[data_formats]`. Each section contains related settings. +3. 
**Settings (Key-Value Pairs):** Within each section, settings are defined as key-value pairs. For example, `table_format = psql` sets the default table output format to "psql". + +## Using Config Management: A Simple Example + +Let's say you want to change `pgcli`'s color scheme to "solarized". Here's how you'd do it: + +**Step 1: Find the Configuration File** + +Open your terminal and type: + +```bash +pgcli --pgclirc +``` + +This will print the location of your `pgcli` config file. For example, it might say: + +``` +/home/your_user/.config/pgcli/config +``` + +**Step 2: Edit the Configuration File** + +Open the `config` file in your favorite text editor (like VS Code, Notepad++, or even `nano` in the terminal). + +**Step 3: Modify the `syntax_style` Setting** + +Find the `[main]` section in the `config` file. Look for a line that says `syntax_style = ...`. Change it to: + +``` +syntax_style = solarized +``` + +**Step 4: Save the Configuration File** + +Save the changes you made to the `config` file. + +**Step 5: Restart `pgcli`** + +Close `pgcli` and start it again. Now, `pgcli` should use the "solarized" color scheme! + +**Behind the Scenes:** + +When `pgcli` starts, it reads the `config` file, finds the `syntax_style` setting, and uses that value to set the color scheme. All this is done automatically, so you don't have to worry about setting it manually each time. + +## Diving Deeper: How Config Management Works Internally + +Let's take a look at how `pgcli` handles configuration behind the scenes. + +```mermaid +sequenceDiagram + participant User + participant PGC as PGCli + participant CFG as Config + participant FS as File System + + User->>PGC: Starts pgcli + PGC->>CFG: Loads configuration file + CFG->>FS: Reads configuration file from disk + FS->>CFG: Returns configuration data + CFG->>PGC: Provides configuration settings + PGC->>PGC: Applies configuration settings (style, keybindings, etc.) + PGC->>User: Presents pgcli with customized settings +``` + +Here's the step-by-step process: + +1. **`pgcli` Startup:** When you start `pgcli`, the `PGCli` class is initialized (as seen in [main.py](02_pgexecute.md)). +2. **Loading the Configuration:** The `PGCli` class uses the `get_config()` function (defined in `pgcli/config.py`) to load the configuration settings from the `config` file. +3. **Reading the File:** The `get_config()` function reads the contents of the `config` file using the `ConfigObj` library, which is designed for parsing configuration files. +4. **Applying the Settings:** The `PGCli` class then applies these settings to configure various aspects of `pgcli`, such as the color scheme, keybindings, table output format, and more. 
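If you want to inspect your settings programmatically, the same file can be read with `ConfigObj` — the library `pgcli` itself uses, as shown in the snippets below. The path here is only an example; substitute whatever `pgcli --pgclirc` prints for you:

```python
from configobj import ConfigObj

# Example path only; use the location reported by `pgcli --pgclirc`.
config = ConfigObj("/home/your_user/.config/pgcli/config")

print(config["main"]["syntax_style"])  # e.g. "solarized"
print(config["main"]["table_format"])  # e.g. "psql"
```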
+ +Now, let's look at the relevant code snippets from `pgcli/config.py`: + +```python +from configobj import ConfigObj +import os +from os.path import expanduser + +def config_location(): + if "XDG_CONFIG_HOME" in os.environ: + return "%s/pgcli/" % expanduser(os.environ["XDG_CONFIG_HOME"]) + else: + return expanduser("~/.config/pgcli/") + +def get_config_filename(pgclirc_file=None): + return pgclirc_file or "%sconfig" % config_location() + +def load_config(usr_cfg, def_cfg=None): + if def_cfg: + cfg = ConfigObj() + cfg.merge(ConfigObj(def_cfg, interpolation=False)) + cfg.merge(ConfigObj(expanduser(usr_cfg), interpolation=False, encoding="utf-8")) + else: + cfg = ConfigObj(expanduser(usr_cfg), interpolation=False, encoding="utf-8") + cfg.filename = expanduser(usr_cfg) + return cfg + +def get_config(pgclirc_file=None): + from pgcli import __file__ as package_root + + package_root = os.path.dirname(package_root) + + pgclirc_file = get_config_filename(pgclirc_file) + + default_config = os.path.join(package_root, "pgclirc") + write_default_config(default_config, pgclirc_file) + + return load_config(pgclirc_file, default_config) +``` + +Explanation: + +* `config_location()`: This function determines the default location of the `pgcli` configuration directory based on the operating system and environment variables. +* `get_config_filename()`: Returns the file path of the config file. +* `load_config()`: Loads the settings from a config file using the `ConfigObj` library. It will also merge the default config values into the user config. + +Here's how the settings are applied inside `pgcli/main.py`: + +```python +from .config import get_config +from .pgstyle import style_factory, style_factory_output +class PGCli: + def __init__(self, ...): + # Load config. + c = self.config = get_config(pgclirc_file) + self.syntax_style = c["main"]["syntax_style"] + self.cli_style = c["colors"] + self.style_output = style_factory_output(self.syntax_style, c["colors"]) +``` + +Explanation: + +* `get_config()` is called in `PGCli`'s constructor to load the configuration. +* Settings like `syntax_style` and colors are read from the configuration and used to initialize the `pgcli` interface. +* `style_factory_output()` creates an object that controls the color and styling of the output. + +## Conclusion + +Config Management allows you to customize `pgcli` to fit your preferences, from color schemes to keybindings and table formats. By editing the `config` file, you can tailor `pgcli` to your specific needs and create a more efficient and enjoyable command-line experience. + +In the next chapter, we'll explore [CompletionRefresher](06_completionrefresher.md), which updates the autocompletion suggestions in `pgcli` to reflect changes in your database schema. + + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/pgcli/06_completionrefresher.md b/docs/pgcli/06_completionrefresher.md new file mode 100644 index 0000000..9fc8ec6 --- /dev/null +++ b/docs/pgcli/06_completionrefresher.md @@ -0,0 +1,175 @@ +# Chapter 6: CompletionRefresher + +In the previous chapter, [Config Management](05_config_management.md), we learned how `pgcli` remembers your favorite settings. Now, let's talk about keeping `pgcli`'s autocompletion suggestions up-to-date, even when your database changes! + +Imagine you add a new table called `products` to your PostgreSQL database. 
Wouldn't it be great if `pgcli` automatically started suggesting `products` when you type `SELECT * FROM pro`? That's where `CompletionRefresher` comes in!

`CompletionRefresher` is like a librarian who constantly updates the catalog. It's the background process that keeps the [PGCompleter](04_pgcompleter.md) up-to-date with the latest database schema. It ensures that the auto-complete suggestions in `pgcli` stay accurate, even after you've changed your database structure.

## What Problem Does CompletionRefresher Solve?

Without `CompletionRefresher`, the [PGCompleter](04_pgcompleter.md) would only know about the database schema as it looked when the completer was first built. If you added a new table, function, or column, you wouldn't see it in the autocompletion suggestions until you restarted `pgcli`. This is annoying and inefficient!

`CompletionRefresher` solves this by:

1. **Staying Up-to-Date:** It re-reads the database schema and updates the [PGCompleter](04_pgcompleter.md) accordingly.
2. **Background Operation:** It works silently in the background, so refreshing the metadata doesn't interrupt your work.
3. **Accuracy:** It ensures that the auto-complete suggestions reflect the current state of your database.

## Key Concepts of CompletionRefresher

`CompletionRefresher` relies on a few important concepts:

1. **Background Thread:** `CompletionRefresher` runs in a separate thread, so it doesn't block the main `pgcli` interface. This means you can keep typing queries while the schema is being updated.
2. **Asynchronous Operation:** It retrieves database metadata asynchronously, so control returns to you before the whole refresh has finished.
3. **Metadata Refreshers:** Specific functions (decorated with `@refresher`) that know how to fetch different kinds of metadata from the database (tables, columns, functions, etc.).
4. **PGCompleter Update:** It updates the [PGCompleter](04_pgcompleter.md) with the new metadata, so the autocompletion suggestions are current.

## Using CompletionRefresher: A Simple Example

Let's say you've just created a new table called `products` in your PostgreSQL database. Here's how `CompletionRefresher` ensures that `pgcli` knows about it:

**Step 1: Create the `products` table**

Using `psql` or another tool, create a new table in your database:

```sql
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    price DECIMAL
);
```

**Step 2: Trigger a refresh**

`CompletionRefresher` runs automatically when `pgcli` connects, and again when you run a schema-changing statement (like `CREATE TABLE`) from inside `pgcli`. Because we created the table with another tool, trigger a refresh manually by typing `\refresh` in `pgcli` and pressing Enter.

**Step 3: Autocomplete the new table name**

Now, in `pgcli`, type:

```sql
SELECT * FROM prod
```

If you hit `Tab` after typing `prod`, you should see `products` as one of the autocompletion suggestions. This means that `CompletionRefresher` has successfully updated the [PGCompleter](04_pgcompleter.md) with the new table information.

**Behind the Scenes:**

When a refresh is triggered, the `CompletionRefresher` connects to the database in the background, queries it for the current list of tables, adds `products` to the [PGCompleter](04_pgcompleter.md)'s list of known tables, and the new table then shows up in autocompletions.
+ +## Diving Deeper: How CompletionRefresher Works Internally + +Let's explore what happens inside `CompletionRefresher` to make autocompletion updates happen. + +```mermaid +sequenceDiagram + participant User + participant PG as PGCli + participant CR as CompletionRefresher + participant PC as PGCompleter + participant DB as PostgreSQL + + User->>PG: Types "\refresh" + Enter (or background timer triggers) + PG->>CR: Starts CompletionRefresher + CR->>DB: Connects to PostgreSQL + loop For each metadata type (tables, columns, functions, ...) + CR->>CR: Executes metadata query + CR->>PC: Updates PGCompleter with the new metadata + end + PC->>PG: Updates completion suggestions + PG->>User: Presents pgcli with the updated completion suggestions +``` + +Here's a breakdown of the process: + +1. **User Trigger (or Background Timer):** You manually type `\refresh` in `pgcli` and press Enter, or the background timer triggers the `CompletionRefresher`. +2. **CompletionRefresher starts:** The [PGCli](02_pgexecute.md) class calls the `refresh` method of `CompletionRefresher`. +3. **Background Thread:** `CompletionRefresher` starts a new thread to perform the refresh operation without blocking the main `pgcli` interface. +4. **Metadata Refreshers:** The `_bg_refresh` function iterates through a dictionary of metadata refreshers. These refreshers are functions decorated with `@refresher` that know how to fetch specific types of metadata (tables, columns, functions, schemas, etc.) from the database. +5. **Updating PGCompleter:** Each refresher uses the [PGExecute](02_pgexecute.md) class to query the database and then updates the [PGCompleter](04_pgcompleter.md) with the new metadata. +6. **Autocompletion Updates:** [PGCompleter](04_pgcompleter.md) can use the new metadata to update the completion suggestions. + +Let's examine some simplified code snippets from `pgcli/completion_refresher.py`: + +```python +import threading + +class CompletionRefresher: + def __init__(self): + self._completer_thread = None + + def refresh(self, executor, special, callbacks, history=None, settings=None): + if self.is_refreshing(): + # Prevent overlapping refreshes + return + else: + self._completer_thread = threading.Thread( + target=self._bg_refresh, #Defines the target function + args=(executor, special, callbacks, history, settings), + name="completion_refresh", + ) + self._completer_thread.daemon = True + self._completer_thread.start() #Starts the thread + +``` + +This code snippet shows how `CompletionRefresher` starts a new thread to perform the refresh operation in the background. The `_bg_refresh` function (explained next) is the target function that will be executed in the new thread. + +```python + def _bg_refresh(self, pgexecute, special, callbacks, history=None, settings=None): + completer = PGCompleter( + smart_completion=True, pgspecial=special, settings=settings + ) + + executor = pgexecute.copy() #Create a copy of executor + for refresher in self.refreshers.values(): + refresher(completer, executor) #Call registered refreshers + + # Load history into pgcompleter so it can learn user preferences + n_recent = 100 + if history: + for recent in history.get_strings()[-n_recent:]: + completer.extend_query_history(recent, is_init=True) + + for callback in callbacks: + callback(completer) + executor.conn.close() + +``` + +Explanation: +* This is the function that is executed on the background thread +* `_bg_refresh` iterates through all the functions registered with `@refresher` decorator and executes each of them. 
These registered functions use the [PGExecute](02_pgexecute.md) to query the database and then update the [PGCompleter](04_pgcompleter.md) object with the new metadata. +* Lastly, we want the new completer to learn user preference from history. + +Here's an example of one of the refresher functions: + +```python +from .pgcompleter import PGCompleter + +@refresher("tables") +def refresh_tables(completer: PGCompleter, executor): + completer.extend_relations(executor.tables(), kind="tables") + completer.extend_columns(executor.table_columns(), kind="tables") + completer.extend_foreignkeys(executor.foreignkeys()) + +``` + +Explanation: + +* The `@refresher` decorator registers this function as a metadata refresher. +* `refresh_tables` uses the [PGExecute](02_pgexecute.md) object (`executor`) to fetch table metadata (table names and column names) from the database. +* It then updates the [PGCompleter](04_pgcompleter.md) object (`completer`) with the new table metadata using the `extend_relations` and `extend_columns` methods. + +## Conclusion + +`CompletionRefresher` ensures that `pgcli`'s autocompletion suggestions are always up-to-date, even when your database schema changes. By running in the background and using metadata refreshers, it provides a seamless and accurate autocompletion experience, making it easier and faster to write SQL queries. + +In the next chapter, we'll explore [MetaQuery](07_metaquery.md), which contains metadata information about the queries that have been executed. + + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/pgcli/07_metaquery.md b/docs/pgcli/07_metaquery.md new file mode 100644 index 0000000..3fc555f --- /dev/null +++ b/docs/pgcli/07_metaquery.md @@ -0,0 +1,164 @@ +# Chapter 7: MetaQuery + +In the previous chapter, [CompletionRefresher](06_completionrefresher.md), we learned how `pgcli` keeps its autocompletion suggestions up-to-date. Now, let's talk about how `pgcli` remembers what happened *after* you run a query. + +Imagine you just executed a complex SQL query. Wouldn't it be useful to know how long it took to run, whether it changed the database, or if it was a special command? That's where `MetaQuery` comes in! + +`MetaQuery` is like a "query receipt" that `pgcli` stores after each query is executed. It contains important information about the query, such as its execution time, success status, and whether it modified the database. This information is then used for things like logging, error handling, and improving autocompletion. + +## What Problem Does MetaQuery Solve? + +Without `MetaQuery`, `pgcli` would "forget" about your query as soon as it finished running. This would make it difficult to: + +1. **Track Query Performance:** You wouldn't know how long your queries are taking to execute. +2. **Identify Database Changes:** You wouldn't be able to easily see which queries modified the database. +3. **Improve Autocompletion:** `pgcli` wouldn't be able to use past query information to suggest better autocompletions. + +`MetaQuery` solves these problems by providing a way to store and access information about executed queries. + +## Key Concepts of MetaQuery + +`MetaQuery` is a simple, but powerful concept. Here are the key things to know: + +1. **Query Information:** `MetaQuery` stores various details about a query, including: + * The full text of the query. + * Whether the query was successful or not. 
+## Conclusion
+
+`CompletionRefresher` ensures that `pgcli`'s autocompletion suggestions are always up-to-date, even when your database schema changes. By running in the background and using metadata refreshers, it provides a seamless and accurate autocompletion experience, making it easier and faster to write SQL queries.
+
+In the next chapter, we'll explore [MetaQuery](07_metaquery.md), which captures metadata about each query that has been executed.
+
+
+---
+
+Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
\ No newline at end of file
diff --git a/docs/pgcli/07_metaquery.md b/docs/pgcli/07_metaquery.md
new file mode 100644
index 0000000..3fc555f
--- /dev/null
+++ b/docs/pgcli/07_metaquery.md
@@ -0,0 +1,164 @@
+# Chapter 7: MetaQuery
+
+In the previous chapter, [CompletionRefresher](06_completionrefresher.md), we learned how `pgcli` keeps its autocompletion suggestions up-to-date. Now, let's talk about how `pgcli` remembers what happened *after* you run a query.
+
+Imagine you just executed a complex SQL query. Wouldn't it be useful to know how long it took to run, whether it changed the database, or if it was a special command? That's where `MetaQuery` comes in!
+
+`MetaQuery` is like a "query receipt" that `pgcli` stores after each query is executed. It contains important information about the query, such as its execution time, success status, and whether it modified the database. This information is then used for things like logging, error handling, and improving autocompletion.
+
+## What Problem Does MetaQuery Solve?
+
+Without `MetaQuery`, `pgcli` would "forget" about your query as soon as it finished running. This would make it difficult to:
+
+1. **Track Query Performance:** You wouldn't know how long your queries are taking to execute.
+2. **Identify Database Changes:** You wouldn't be able to easily see which queries modified the database.
+3. **Improve Autocompletion:** `pgcli` wouldn't be able to use past query information to suggest better autocompletions.
+
+`MetaQuery` solves these problems by providing a way to store and access information about executed queries.
+
+## Key Concepts of MetaQuery
+
+`MetaQuery` is a simple but powerful concept. Here are the key things to know:
+
+1. **Query Information:** `MetaQuery` stores various details about a query, including:
+    * The full text of the query.
+    * Whether the query was successful or not.
+    * The total time it took to run (including formatting the results).
+    * The time spent actually executing the query.
+    * Flags indicating if the query modified the database, changed the search path, or was a special command.
+
+2. **Named Tuple:** `MetaQuery` is implemented as a "named tuple" in Python. Think of it as a lightweight class that's used to store data. You can access the data using named attributes (e.g., `meta_query.query`, `meta_query.successful`).
+
+3. **History:** `MetaQuery` objects are stored in a history list, allowing `pgcli` to access information about past queries.
+
+## Using MetaQuery: A Simple Example
+
+Let's say you run the following SQL query in `pgcli`:
+
+```sql
+SELECT * FROM users LIMIT 10;
+```
+
+After the query executes, `pgcli` creates a `MetaQuery` object to store information about it. Here's a simplified example of what that `MetaQuery` object might look like:
+
+```python
+MetaQuery(
+    query="SELECT * FROM users LIMIT 10;",
+    successful=True,
+    total_time=0.123,
+    execution_time=0.087,
+    meta_changed=False,
+    db_changed=False,
+    path_changed=False,
+    mutated=False,
+    is_special=False,
+)
+```
+
+* `query`: The original SQL query.
+* `successful`: `True` because the query executed without errors.
+* `total_time`: The query took 0.123 seconds to run and format results.
+* `execution_time`: The query itself took 0.087 seconds to execute.
+* `meta_changed`: `False` because the query didn't change any database objects (like tables or functions).
+* `db_changed`: `False` because the query didn't switch databases.
+* `path_changed`: `False` because the query didn't change the search path.
+* `mutated`: `False` because the query didn't insert, update, or delete any data.
+* `is_special`: `False` because it wasn't a special backslash command (like `\dt`).
+
+This `MetaQuery` object is then added to the query history.
+
+**How this information is used:**
+
+1. **Timing:** As we saw in the previous chapters, `pgcli` can display how long each query took to run. This information comes directly from the `total_time` and `execution_time` attributes of the `MetaQuery` object.
+2. **Autocompletion:** `pgcli` can use the `MetaQuery` history to learn which tables and columns you frequently query. This information helps `PGCompleter` suggest more relevant autocompletions.
+3. **Destructive Statement Warnings:** Warnings for destructive statements are only issued when those statements are run outside a valid transaction.
+
+## Diving Deeper: How MetaQuery Works Internally
+
+Let's take a peek inside `pgcli` to see how `MetaQuery` is created and used.
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant PG as PGCli
+    participant PGEx as PGExecute
+    participant MQ as MetaQuery
+
+    User->>PG: Enters SQL query
+    PG->>PGEx: Executes SQL query
+    PGEx->>PG: Returns results, status, and timings
+    PG->>MQ: Creates MetaQuery object with query metadata
+    PG->>PG: Appends MetaQuery to history
+    PG->>User: Displays results
+```
+
+Here's what happens step-by-step:
+
+1. **User Enters Query:** You type a SQL query in `pgcli` and press Enter.
+2. **PGExecute Executes Query:** The [PGExecute](02_pgexecute.md) class executes the query against the database.
+3. **Metadata Returned:** [PGExecute](02_pgexecute.md) returns the results of the query, the status message, and the execution time.
+4. **MetaQuery Created:** The `PGCli` class creates a `MetaQuery` object, populating it with the query text, success status, timings, and other metadata.
+5. **History Appended:** The `MetaQuery` object is appended to the `query_history` list in the `PGCli` class.
+6. **Results Displayed:** The query results are displayed in your terminal.
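+Before looking at the real code, here is a small, self-contained sketch that mimics the core of this flow: time a statement, pack the outcome into a `MetaQuery`, and append it to a history list. The `record_query` helper and the lambda standing in for query execution are invented for this example; the real logic lives in `pgcli/main.py` and is shown next.
+
+```python
+import time
+from collections import namedtuple
+
+# Same shape as pgcli's MetaQuery named tuple (the real definition is shown below)
+MetaQuery = namedtuple(
+    "Query",
+    ["query", "successful", "total_time", "execution_time",
+     "meta_changed", "db_changed", "path_changed", "mutated", "is_special"],
+)
+
+query_history = []
+
+def record_query(text, run_query):
+    """Run the query, time it, and store a MetaQuery 'receipt' in the history."""
+    start = time.time()
+    try:
+        run_query(text)            # stand-in for executing SQL via PGExecute
+        successful = True
+    except Exception:
+        successful = False
+    elapsed = time.time() - start
+    # Simplification: in pgcli, total_time also includes formatting the results
+    meta = MetaQuery(text, successful, elapsed, elapsed,
+                     False, False, False, False, False)
+    query_history.append(meta)
+    return meta
+
+meta = record_query("SELECT 1;", run_query=lambda sql: None)
+print(f"successful={meta.successful} total_time={meta.total_time:.3f}s "
+      f"history size={len(query_history)}")
+```
+
+With receipts like this in the history, a question such as "how many of my queries mutated data?" comes down to a simple scan over `query_history` checking the `mutated` flag.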
+Now, let's look at the code in `pgcli/main.py` where `MetaQuery` is used:
+
+```python
+from collections import namedtuple
+
+# Query tuples are used for maintaining history
+MetaQuery = namedtuple(
+    "Query",
+    [
+        "query",  # The entire text of the command
+        "successful",  # True if all subqueries were successful
+        "total_time",  # Time elapsed executing the query and formatting results
+        "execution_time",  # Time elapsed executing the query
+        "meta_changed",  # True if any subquery executed create/alter/drop
+        "db_changed",  # True if any subquery changed the database
+        "path_changed",  # True if any subquery changed the search path
+        "mutated",  # True if any subquery executed insert/update/delete
+        "is_special",  # True if the query is a special command
+    ],
+)
+MetaQuery.__new__.__defaults__ = ("", False, 0, 0, False, False, False, False)
+```
+
+This code defines the `MetaQuery` named tuple and sets default values for its attributes.
+
+Here's where the `MetaQuery` object is created in the `_evaluate_command` function:
+
+```python
+    meta_query = MetaQuery(
+        text,
+        all_success,
+        total,
+        execution,
+        meta_changed,
+        db_changed,
+        path_changed,
+        mutated,
+        is_special,
+    )
+
+    return output, meta_query
+```
+
+This code shows how the `MetaQuery` object is created with the query text, success status, timings, and other metadata.
+
+Finally, here's the line in `handle_watch_command` that appends the `MetaQuery` object to the history:
+
+```python
+    self.query_history.append(query)
+```
+
+## Conclusion
+
+`MetaQuery` provides a simple and effective way to store information about executed queries in `pgcli`. This information is used for various purposes, including displaying query timings, improving autocompletion, and handling errors. By capturing these details, `MetaQuery` enhances the overall `pgcli` experience.
+
+You've now explored all the core abstractions that power `pgcli`! This is the final chapter of the tutorial, and you're well on your way to understanding how `pgcli` works under the hood!
+
+
+---
+
+Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
\ No newline at end of file
diff --git a/docs/pgcli/index.md b/docs/pgcli/index.md
new file mode 100644
index 0000000..f4c352d
--- /dev/null
+++ b/docs/pgcli/index.md
@@ -0,0 +1,41 @@
+# Tutorial: pgcli
+
+`pgcli` is an interactive command-line interface for PostgreSQL. It offers **auto-completion** of SQL queries and syntax highlighting, making database interaction *easier and more efficient*. It also supports special commands similar to those in `psql`.
+
+
+**Source Repository:** [None](None)
+
+```mermaid
+flowchart TD
+    A0["PGExecute"]
+    A1["PGCompleter"]
+    A2["CompletionRefresher"]
+    A3["Config Management"]
+    A4["PromptSession"]
+    A5["MetaQuery"]
+    A6["PGSpecial"]
+    A0 -- "Provides schema" --> A1
+    A2 -- "Updates with schema" --> A1
+    A4 -- "Uses for completion" --> A1
+    A4 -- "Sends SQL" --> A0
+    A4 -- "Applies settings" --> A3
+    A0 -- "Generates query info" --> A5
+    A4 -- "Executes commands" --> A6
+    A0 -- "Supports special commands" --> A6
+    A1 -- "Uses configuration" --> A3
+```
+
+## Chapters
+
+1. [PromptSession](01_promptsession.md)
+2. [PGExecute](02_pgexecute.md)
+3. [PGSpecial](03_pgspecial.md)
+4. [PGCompleter](04_pgcompleter.md)
+5. [Config Management](05_config_management.md)
+6. [CompletionRefresher](06_completionrefresher.md)
+7. [MetaQuery](07_metaquery.md)
+
+
+---
+
+Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
\ No newline at end of file