Merge pull request #244 from deepset-ai/add-llama-stack-cookbook

Amnah199 · web-flow · commit f9749807d46b · 2025-07-21T11:46:19.000+02:00
Add a simple notebook for llama stack
diff --git a/index.toml b/index.toml
@@ -71,6 +71,11 @@ title = "Build with Gemma and Haystack"
 notebook = "gemma_chat_rag.ipynb"
 topics = ["RAG"]
 
+[[cookbook]]
+title = "Build with Llama Stack and Haystack Agent"
+notebook = "llama_stack_with_agent.ipynb"
+topics = ["Function Calling", "Agents"]
+
 [[cookbook]]
 title = "Hacker News Summaries with Custom Components"
 notebook = "hackernews-custom-component-rag.ipynb"
diff --git a/notebooks/llama_stack_with_agent.ipynb b/notebooks/llama_stack_with_agent.ipynb
@@ -0,0 +1,225 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "# 🛠️🦙 Build with Llama Stack and Haystack Agent\n",
+        "\n",
+        "\n",
+        "This notebook demonstrates how to use the `LlamaStackChatGenerator` component with the Haystack `Agent` to enable function calling. We'll create a simple mock weather tool that the `Agent` can call, standing in for a service that provides dynamic, up-to-date information.\n",
+        "\n",
+        "We start by installing the integration package."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "%%bash\n",
+        "\n",
+        "pip install llama-stack-haystack"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Setup\n",
+        "\n",
+        "Before running this example, you need to:\n",
+        "\n",
+        "1. Set up a Llama Stack Server with an inference provider\n",
+        "2. Have a model available (e.g., `llama3.2:3b`)\n",
+        "\n",
+        "For a quick start on how to set up the server with Ollama, see the [Llama Stack documentation](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html).\n",
+        "\n",
+        "Once you have the server running, it will typically be available at `http://localhost:8321/v1/openai/v1`.\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Defining a Tool\n",
+        "\n",
+        "Tools in Haystack allow models to call functions to get real-time information or perform actions. Let's create a simple weather tool that the model can use to provide weather information.\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 1,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "from haystack.dataclasses import ChatMessage\n",
+        "from haystack.tools import Tool\n",
+        "\n",
+        "# Define a tool that models can call\n",
+        "def weather(city: str):\n",
+        "    \"\"\"Return mock weather info for the given city.\"\"\"\n",
+        "    return f\"The weather in {city} is sunny and 32°C\"\n",
+        "\n",
+        "# Define the tool parameters schema\n",
+        "tool_parameters = {\n",
+        "    \"type\": \"object\", \n",
+        "    \"properties\": {\n",
+        "        \"city\": {\"type\": \"string\"}\n",
+        "    }, \n",
+        "    \"required\": [\"city\"]\n",
+        "}\n",
+        "\n",
+        "# Create the weather tool\n",
+        "weather_tool = Tool(\n",
+        "    name=\"weather\",\n",
+        "    description=\"Useful for getting the weather in a specific city\",\n",
+        "    parameters=tool_parameters,\n",
+        "    function=weather,\n",
+        ")\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Setting Up the Agent\n",
+        "\n",
+        "Now let's create a `LlamaStackChatGenerator` and pass it to the `Agent`.\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 4,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "from haystack.components.agents import Agent\n",
+        "from haystack_integrations.components.generators.llama_stack import LlamaStackChatGenerator\n",
+        "from haystack.components.generators.utils import print_streaming_chunk\n",
+        "\n",
+        "# Create the LlamaStackChatGenerator\n",
+        "chat_generator = LlamaStackChatGenerator(\n",
+        "    model=\"ollama/llama3.2:3b\",  # model name varies depending on the inference provider used for the Llama Stack Server\n",
+        "    api_base_url=\"http://localhost:8321/v1/openai/v1\",\n",
+        ")\n",
+        "# Agent Setup\n",
+        "agent = Agent(\n",
+        "    chat_generator=chat_generator,\n",
+        "    tools=[weather_tool],\n",
+        ")\n",
+        "\n",
+        "# Warm up the Agent before running it\n",
+        "agent.warm_up()\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Using Tools with the Agent\n",
+        "\n",
+        "Now, when we ask questions, the `Agent` will use both the provided tool and the `LlamaStackChatGenerator` to generate answers. We enable streaming in the `Agent` so that you can observe the tool calls and the tool results in real time.\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 7,
+      "metadata": {},
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "[TOOL CALL]\n",
+            "Tool: weather \n",
+            "Arguments: {\"city\":\"Tokyo\"}\n",
+            "\n",
+            "[TOOL RESULT]\n",
+            "The weather in Tokyo is sunny and 32°C\n",
+            "\n",
+            "[ASSISTANT]\n",
+            "In Tokyo, the current weather conditions are mostly sunny with a temperature of 32°C. Would you like to know more about Tokyo's climate or weather forecast for a specific date?\n",
+            "\n"
+          ]
+        }
+      ],
+      "source": [
+        "# Create a message asking about the weather\n",
+        "messages = [ChatMessage.from_user(\"What's the weather in Tokyo?\")]\n",
+        "\n",
+        "# Generate a response from the model with access to tools\n",
+        "response = agent.run(\n",
+        "    messages=messages, tools=[weather_tool], streaming_callback=print_streaming_chunk\n",
+        ")\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Simple Chat with ChatGenerator\n",
+        "For a simpler use case, you can also create a lightweight mechanism to chat directly with `LlamaStackChatGenerator`."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 15,
+      "metadata": {},
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "🤖 The main character in The Witcher series, also known as the eponymous figure, is Geralt of Rivia, a monster hunter with supernatural abilities and mutations that allow him to control the elements. He was created by Polish author_and_polish_video_game_development_company_(CD Projekt).\n",
+            "🤖 One of the most fascinating aspects of dolphin behavior is their ability to produce complex, context-dependent vocalizations that are unique to each individual, similar to human language. They also exhibit advanced social behaviors, such as cooperation, empathy, and self-awareness.\n"
+          ]
+        }
+      ],
+      "source": [
+        "messages = []\n",
+        "\n",
+        "while True:\n",
+        "    msg = input(\"Enter your message or Q to exit\\n🧑 \")\n",
+        "    if msg == \"Q\":\n",
+        "        break\n",
+        "    messages.append(ChatMessage.from_user(msg))\n",
+        "    response = chat_generator.run(messages=messages)\n",
+        "    assistant_resp = response['replies'][0]\n",
+        "    print(\"🤖 \" + assistant_resp.text)\n",
+        "    messages.append(assistant_resp)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "If you want to switch your model provider, you can reuse the same `LlamaStackChatGenerator` code with different providers. Simply run the desired inference provider on the Llama Stack Server and update the model name during the initialization of `LlamaStackChatGenerator`.\n",
+        "\n",
+        "For more details on available inference providers, see the [Llama Stack docs](https://llama-stack.readthedocs.io/en/latest/providers/inference/index.html)."
+      ]
+    }
+  ],
+  "metadata": {
+    "kernelspec": {
+      "display_name": ".venv",
+      "language": "python",
+      "name": "python3"
+    },
+    "language_info": {
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "file_extension": ".py",
+      "mimetype": "text/x-python",
+      "name": "python",
+      "nbconvert_exporter": "python",
+      "pygments_lexer": "ipython3",
+      "version": "3.13.5"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 2
+}