Merge pull request #23 from IntelSoftware/praveenkk123-patch-3 #24

Closed · wants to merge 14 commits
2 changes: 1 addition & 1 deletion ipex_llm_gpu.ipynb → 1_native_gpu.ipynb
@@ -178,7 +178,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
"version": "3.11.9"
}
},
"nbformat": 4,
File renamed without changes.
File renamed without changes.
File renamed without changes.
599 changes: 599 additions & 0 deletions 5_llm_quantization_sycl.ipynb

Large diffs are not rendered by default.

248 changes: 248 additions & 0 deletions 6_llm_sycl_gpu.ipynb
@@ -0,0 +1,248 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "652ea6c8-8d13-4228-853e-fad46db470f5",
"metadata": {},
"source": [
"# Inference using SYCL backend on AI PC"
]
},
{
"cell_type": "markdown",
"id": "71e0aeac-58b1-4114-95f1-7d3a7a4c34f2",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"This notebook demonstrates how to install LLamacpp for SYCL on Windows with Intel GPUs. It applies to Intel Core Ultra and Core 11 - 14 gen integrated GPUs (iGPUs), as well as Intel Arc Series GPU."
]
},
{
"cell_type": "markdown",
"id": "97cf7db8-9529-47dd-b41d-81b22c8d5848",
"metadata": {},
"source": [
"## What is an AIPC\n",
"\n",
"What is an AI PC you ask?\n",
"\n",
"Here is an [explanation](https://www.intel.com/content/www/us/en/newsroom/news/what-is-an-ai-pc.htm#gs.a55so1) from Intel:\n",
"\n",
"”An AI PC has a CPU, a GPU and an NPU, each with specific AI acceleration capabilities. An NPU, or neural processing unit, is a specialized accelerator that handles artificial intelligence (AI) and machine learning (ML) tasks right on your PC instead of sending data to be processed in the cloud. The GPU and CPU can also process these workloads, but the NPU is especially good at low-power AI calculations. The AI PC represents a fundamental shift in how our computers operate. It is not a solution for a problem that didn’t exist before. Instead, it promises to be a huge improvement for everyday PC usages.”"
]
},
{
"cell_type": "markdown",
"id": "4682eb3e-540b-4814-8142-c54efc32f31b",
"metadata": {},
"source": [
"## Install Prerequisites"
]
},
{
"cell_type": "markdown",
"id": "37f8b6d2-34af-44ad-8363-dea57660bc00",
"metadata": {},
"source": [
"### Step 1: System Preparation\n",
"\n",
"To set up your AIPC for running with Intel iGPUs, follow these essential steps:\n",
"\n",
"1. Update Intel GPU Drivers: Ensure your system has the latest Intel GPU drivers, which are crucial for optimal performance and compatibility. You can download these directly from Intel's [official website](https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html) . Once you have installed the official drivers, you could also install Intel ARC Control to monitor the gpu:\n",
"\n",
" <img src=\"Assets/gpu_arc_control.png\">\n",
"\n",
"\n",
"2. Install Visual Studio 2022 Community edition with C++: Visual Studio 2022, along with the “Desktop Development with C++” workload, is required. This prepares your environment for C++ based extensions used by the intel SYCL backend that powers accelerated Ollama. You can download VS 2022 Community edition from the official site, [here](https://visualstudio.microsoft.com/downloads/).\n",
"\n",
"3. Install conda-forge: conda-forge will manage your Python environments and dependencies efficiently, providing a clean, minimal base for your Python setup. Visit conda-forge's [installation site](https://conda-forge.org/download/) to install for windows.\n",
"\n",
"4. Install [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html)\n",
"\n",
" "
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "8040fd21-7782-4b97-a0eb-327816328f17",
"metadata": {
"jp-MarkdownHeadingCollapsed": true
},
"source": [
"## Step 2: Install Llamacpp for SYCL\n",
"The llama.cpp SYCL backend is designed to support Intel GPU firstly. Based on the cross-platform feature of SYCL.\n",
"\n",
"### After installation of conda-forge, open the Miniforge Prompt, and create a new python environment:\n",
" ```\n",
" conda create -n llm-sycl python=3.11\n",
"\n",
" ```\n",
"\n",
"### Activate the new environment\n",
"```\n",
"conda activate llm-sycl\n",
"\n",
"```\n",
"\n",
"<img src=\"Assets/llm4.png\">\n",
"\n",
"### With the llm-sycl environment active, enable oneAPI environment. \n",
"Type oneapi in the windows search and then open the Intel oneAPI command prompt for Intel 64 for Visual Studio 2022 App.\n",
"\n",
"<img src=\"Assets/oneapi1.png\">\n",
"\n",
"#### Run the below command in the VS command prompt and you should see the below sycl devices displayed in the console\n",
"There should be one or more level-zero GPU devices displayed as ext_oneapi_level_zero:gpu.\n",
"\n",
"```\n",
"sycl-ls\n",
"\n",
"```\n",
"\n",
"<img src=\"Assets/oneapi2.png\">\n",
"\n",
"### Install build tools\n",
"\n",
"* Download & install [cmake for Windows](https://cmake.org/download/):\n",
"* The new Visual Studio will install Ninja as default. (If not, please install it manually: https://ninja-build.org/)\n",
"\n",
"### Install llama.cpp\n",
"\n",
"* git clone the llama.cpp repo\n",
" \n",
" ```\n",
" git clone https://github.com/ggerganov/llama.cpp.git\n",
"\n",
" ```\n",
" \n",
"* On the oneAPI command line window, step into the llama.cpp main directory and run the following:\n",
" \n",
" ```\n",
" @call \"C:\\Program Files (x86)\\Intel\\oneAPI\\setvars.bat\" intel64 --force\n",
"\n",
" # Option 1: Use FP32 (recommended for better performance in most cases)\n",
" cmake -B build -G \"Ninja\" -DGGML_SYCL=ON -DCMAKE_C_COMPILER=cl -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release\n",
" \n",
" # Option 2: Or FP16\n",
" cmake -B build -G \"Ninja\" -DGGML_SYCL=ON -DCMAKE_C_COMPILER=cl -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release -DGGML_SYCL_F16=ON\n",
" \n",
" cmake --build build --config Release -j\n",
"\n",
" ```\n",
"\n",
"### Below shows a simple example to show how to run a community GGUF model with llama.cpp for SYCL\n",
"* Download the model from huggingface and prepare the model for inference\n",
"* Run the model for example as below\n",
"* Open the mini-forge prompt, activate the llm-sycl environment and enable oneAPI enviroment as below\n",
"\n",
" ```\n",
" \"C:\\Program Files (x86)\\Intel\\oneAPI\\setvars.bat\" intel64 \n",
" ```\n",
"* List the sycl devices as below\n",
"\n",
" ```\n",
" build\\bin\\ls-sycl-device.exe\n",
"\n",
" ```\n",
"* Run inference\n",
"```\n",
"build\\bin\\llama-cli.exe -m models\\llama-2-7b.Q4_0.gguf -p \"Building a website can be done in 10 simple steps:\\nStep 1:\" -n 400 -e -ngl 33 -s 0 -sm none -mg 0\n",
"```\n",
"\n",
"<img src=\"Assets/cmd1.png\">\n",
"\n",
"### Below is an example output\n",
"\n",
"<img src=\"Assets/out1.png\">\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "01173f7a-0725-4b34-aabc-7e6582b87da4",
"metadata": {},
"source": [
"## Run the inference"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3ad64c2-2432-4cb0-8a3d-856daad1dc18",
"metadata": {},
"outputs": [],
"source": [
"! ..\\git_llamacpp\\llama.cpp\\build\\bin\\llama-cli.exe -m Qwen1.5-4B.Q4_K_M.gguf -p \"Building a website can be done in 10 simple steps:\\nStep 1:\" -n 400 -e -ngl 25 -s 0 -sm none -mg 0"
]
},
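{
"cell_type": "markdown",
"id": "3f9d2a71-5b6c-4e8d-9f0a-1b2c3d4e5f60",
"metadata": {},
"source": [
"#### Download a GGUF model for the next step\n",
"\n",
"The next inference cell expects a `model_gguf` variable naming a GGUF file under `download_models`. A minimal sketch follows; the `repo_id` and `filename` are illustrative assumptions, not part of the original notebook (any community GGUF model works)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7a1b3c5d-9e8f-4a2b-8c7d-6e5f4a3b2c1d",
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch: fetch a GGUF model into ./download_models.\n",
"# repo_id and filename are illustrative assumptions; substitute any GGUF model.\n",
"from huggingface_hub import hf_hub_download\n",
"\n",
"model_gguf = \"qwen1_5-4b-chat-q4_k_m.gguf\"\n",
"hf_hub_download(\n",
"    repo_id=\"Qwen/Qwen1.5-4B-Chat-GGUF\",\n",
"    filename=model_gguf,\n",
"    local_dir=\"./download_models\",\n",
")"
]
},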
{
"cell_type": "markdown",
"id": "fd47d1d9-414b-4cd2-b20e-4c36871f1145",
"metadata": {},
"source": [
"#### Run the inference locally on AI PC"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "65e5cd95-18a4-4879-9d3d-05e302448ff6",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"! ..\\git_llamacpp\\llama.cpp\\build\\bin\\llama-cli.exe -m ./download_models/{model_gguf} -p \"Building a website can be done in 10 simple steps:\\nStep 1:\" -n 100 -e -ngl 33 -s 0 -sm none -mg 0"
]
},
{
"cell_type": "markdown",
"id": "ec180ac3-e74a-41d9-a9b9-65478dcea556",
"metadata": {},
"source": [
"## Example output\n",
"\n",
"<img src=\"Assets/output_latest.png\">"
]
},
{
"cell_type": "markdown",
"id": "92387fa9-2376-49a7-a94b-a29f254a0471",
"metadata": {},
"source": [
"* Reference:https://github.com/ggerganov/llama.cpp/blob/master/docs/backend/SYCL.md"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8ac73234-1851-42ad-9b6c-67ba9562db32",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}