Skip to content

anthony-firn/CPU-FPGA-Tutorial

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 

Repository files navigation

Overview

  • Platform: Alibaba Cloud
  • FPGA Instance: f3 Instance Family with Xilinx Virtex UltraScale+ VU9P FPGA
  • Communication Interface: PCIe using Open-Source Xilinx Runtime (XRT) Drivers
  • FPGA Code: Verilog
  • CPU Code: C++
  • Driver Interface: Fully Open-Source (Xilinx Runtime - XRT)
  • Goal: Implement a CPU-FPGA application where data is sent from the CPU to the FPGA, processed on the FPGA, and sent back to the CPU.

Table of Contents

  1. Setting Up Your Alibaba Cloud Account
  2. Launching an FPGA Instance
  3. Preparing the Development Environment
  4. Understanding the Xilinx FPGA Framework
  5. Creating the FPGA Design (Verilog)
  6. Compiling the FPGA Design
  7. Programming the FPGA
  8. Creating the Host Application (C++)
  9. Compiling the Host Application
  10. Running the Host Application
  11. Verifying the Results
  12. Cleaning Up
  13. Conclusion
  14. References

1. Setting Up Your Alibaba Cloud Account

Steps:

  1. Visit the Alibaba Cloud Registration Page:

  2. Click on "Business Account" or "Individual Account":

    • Select account type and click next. You'll be redirected to a registration form.
  3. Fill Out the Registration Form:

    • Provide an email address, and set a password.
  4. Verify Your Mobile Number:

    • Enter your mobile phone number. You'll receive a verification code via email SMS. Enter it to proceed.
  5. Complete Account Information:

    • Provide personal details, billing information, and verify your identity if required.
  6. Select a Payment Method:

    • Add a valid payment method (credit card or PayPal) for resource usage billing.
  7. Accept Terms and Conditions:

    • Agree to Alibaba Cloud's terms of service and privacy policy.
  8. Submit the Form:

    • Your account should now be set up and ready to use.

2. Launching an FPGA Instance

Steps:

  1. Log In to the Alibaba Cloud Console:

  2. Navigate to Elastic Compute Service (ECS):

    • From the console dashboard, select Elastic Compute Service under Products & Services.
  3. Create an Instance:

    • Click on Instances in the left menu, then click Create Instance.
  4. Select Region and Zone:

    • Choose a region that supports FPGA instances (e.g., China East 1 or China North 2). Note that some regions may have restrictions.
  5. Select Instance Type:

    • Choose the f3 instance family, which includes FPGA capabilities.

    • For example, select f3.4xlarge which includes:

      • 16 vCPUs
      • 64 GB RAM
      • Xilinx Virtex UltraScale+ VU9P FPGA
  6. Select an Image (Operating System):

    • Choose an FPGA Development AMI provided by Alibaba Cloud.

    • If available, select an image like FPGA Developer AMI which includes necessary tools.

    • Alternatively, choose a standard Linux distribution (e.g., Ubuntu 18.04) and manually install the tools.

  7. Configure Storage:

    • Allocate sufficient storage for development tools and project files (e.g., 100 GB).
  8. Set Up Security Group (Firewall Rules):

    • Configure inbound rules to allow SSH access (port 22) from your IP address.
  9. Set Instance Details:

    • Assign a key pair for SSH access.

    • Configure instance name, tags, and other details as needed.

  10. Review and Launch:

    • Confirm the configuration and launch the instance.
  11. Wait for the Instance to Start:

    • It may take a few minutes for the instance to be ready.
  12. Note the Public IP Address:

    • You'll need this to SSH into the instance.

3. Preparing the Development Environment

Steps:

  1. SSH into the FPGA Instance:

    • Open a terminal on your local machine.

    • Connect to the instance using the key pair you assigned:

      ssh -i /path/to/your/private/key.pem ubuntu@your_instance_public_ip
  2. Update the System Packages:

    sudo apt update && sudo apt upgrade -y
  3. Install Required Dependencies:

    • Install essential build tools and libraries:

      sudo apt install -y build-essential git wget libssl-dev
  4. Install Xilinx Tools:

    • Option 1: Using Pre-installed Tools

      • If you selected an FPGA Developer AMI, the Xilinx tools might already be installed.

      • Verify by checking for Vivado and XRT:

        vivado -version
        xbutil --version
    • Option 2: Manual Installation

      • Download Vivado and XRT:

      • Install Vivado:

        • Transfer the installer to the instance (use scp or wget if available).

        • Run the installer:

          sudo ./Xilinx_Vivado_SDK_Web_2020.1_0602_1208_Lin64.bin
        • Follow the on-screen instructions to install Vivado WebPACK Edition.

      • Install XRT:

        • Install required dependencies:

          sudo apt install -y libboost-all-dev
        • Download the appropriate .deb package for Ubuntu 18.04.

        • Install XRT:

          sudo dpkg -i xrt_2020.1.8.621_18.04-amd64-xrt.deb
      • Set Up Environment Variables:

        echo "source /opt/Xilinx/Vivado/2020.1/settings64.sh" >> ~/.bashrc
        source ~/.bashrc
  5. Verify Installation:

    • Check that Vivado and XRT are installed:

      vivado -version
      xbutil --version
  6. Clone Xilinx Runtime (XRT) Source Code (Optional):

    • If you need to build XRT from source for the latest updates or custom modifications:

      git clone https://github.com/Xilinx/XRT.git
      cd XRT
    • Follow the build instructions in the repository.

  7. Install OpenCL Headers (Optional):

    • If your application uses OpenCL:

      sudo apt install -y opencl-headers ocl-icd-opencl-dev

4. Understanding the Xilinx FPGA Framework

The Xilinx FPGA framework allows communication between the host CPU and FPGA over PCIe using the Xilinx Runtime (XRT). XRT is an open-source driver and runtime library that provides a standardized API for FPGA applications.

  • Key Components:

    • XRT (Xilinx Runtime): Open-source driver and runtime library for communication between CPU and FPGA.
    • FPGA Shell (Platform): The base FPGA image that provides PCIe connectivity and standard interfaces.
    • FPGA User Logic (Kernel): Your custom logic implemented in Verilog.
    • Host Application: Software application running on the CPU that communicates with the FPGA via XRT APIs.
  • Development Flow:

    1. Create FPGA Design:
      • Develop the FPGA kernel in Verilog.
      • Use AXI interfaces for communication.
    2. Create Host Application:
      • Use XRT APIs to communicate with the FPGA over PCIe.
    3. Compile FPGA Design:
      • Use Vivado to compile the FPGA design and generate a bitstream (.xclbin).
    4. Program the FPGA:
      • Use XRT tools to program the FPGA with the bitstream.
    5. Run Host Application:
      • Execute the host application to send and receive data.

5. Creating the FPGA Design (Verilog)

We'll create a simple Verilog kernel that increments an input value and sends it back to the host.

Steps:

  1. Set Up the Directory Structure:

    mkdir -p ~/fpga_project/kernel
    cd ~/fpga_project
  2. Create the Verilog Kernel Code:

    • Navigate to the kernel directory:

      cd kernel
    • Create a file named increment_kernel.v:

      nano increment_kernel.v
    • Add the following Verilog code:

      module increment_kernel (
          input wire ap_clk,
          input wire ap_rst_n,
          input wire [31:0] s_axi_control_AWADDR,
          input wire s_axi_control_AWVALID,
          output wire s_axi_control_AWREADY,
          input wire [31:0] s_axi_control_WDATA,
          input wire [3:0] s_axi_control_WSTRB,
          input wire s_axi_control_WVALID,
          output wire s_axi_control_WREADY,
          output wire [1:0] s_axi_control_BRESP,
          output wire s_axi_control_BVALID,
          input wire s_axi_control_BREADY,
          input wire [31:0] s_axi_control_ARADDR,
          input wire s_axi_control_ARVALID,
          output wire s_axi_control_ARREADY,
          output wire [31:0] s_axi_control_RDATA,
          output wire [1:0] s_axi_control_RRESP,
          output wire s_axi_control_RVALID,
          input wire s_axi_control_RREADY,
          output wire interrupt,
      
          // AXI4 Master Interface
          output wire [31:0] m_axi_gmem_AWADDR,
          output wire [7:0] m_axi_gmem_AWLEN,
          output wire [2:0] m_axi_gmem_AWSIZE,
          output wire [1:0] m_axi_gmem_AWBURST,
          output wire m_axi_gmem_AWLOCK,
          output wire [3:0] m_axi_gmem_AWCACHE,
          output wire [2:0] m_axi_gmem_AWPROT,
          output wire [3:0] m_axi_gmem_AWQOS,
          output wire m_axi_gmem_AWVALID,
          input wire m_axi_gmem_AWREADY,
          output wire [511:0] m_axi_gmem_WDATA,
          output wire [63:0] m_axi_gmem_WSTRB,
          output wire m_axi_gmem_WLAST,
          output wire m_axi_gmem_WVALID,
          input wire m_axi_gmem_WREADY,
          input wire [1:0] m_axi_gmem_BRESP,
          input wire m_axi_gmem_BVALID,
          output wire m_axi_gmem_BREADY,
          output wire [31:0] m_axi_gmem_ARADDR,
          output wire [7:0] m_axi_gmem_ARLEN,
          output wire [2:0] m_axi_gmem_ARSIZE,
          output wire [1:0] m_axi_gmem_ARBURST,
          output wire m_axi_gmem_ARLOCK,
          output wire [3:0] m_axi_gmem_ARCACHE,
          output wire [2:0] m_axi_gmem_ARPROT,
          output wire [3:0] m_axi_gmem_ARQOS,
          output wire m_axi_gmem_ARVALID,
          input wire m_axi_gmem_ARREADY,
          input wire [511:0] m_axi_gmem_RDATA,
          input wire [1:0] m_axi_gmem_RRESP,
          input wire m_axi_gmem_RLAST,
          input wire m_axi_gmem_RVALID,
          output wire m_axi_gmem_RREADY
      );
      
      // Your logic goes here
      
      // For simplicity, this example does not implement full AXI transactions.
      // Implement the necessary AXI4 protocol to read and write data from host memory.
      
      // This is a placeholder to show where the increment operation would be implemented.
      
      endmodule
    • Note: Implementing full AXI4 master interfaces requires significant code to handle all protocol signals. For simplicity, we'll use the High-Level Synthesis (HLS) tool to generate the AXI interfaces.

  3. Use HLS to Simplify Kernel Development (Optional):

    • If you're comfortable with HLS, you can write the kernel in C/C++ and let HLS generate the Verilog with AXI interfaces.

    • Here's an example of an HLS kernel:

      #include <ap_int.h>
      extern "C" {
      void increment_kernel(ap_uint<512>* in, ap_uint<512>* out, int size) {
          for (int i = 0; i < size; i++) {
              out[i] = in[i] + 1;
          }
      }
      }
    • Save this code as increment_kernel.cpp in the kernel directory.

  4. Create a Kernel Description File (Kernel Definition):

    • Create a file named kernel.xml:

      <?xml version="1.0" encoding="UTF-8"?>
      <root version="1.0" xilinx_version="2020.1">
        <kernel name="increment_kernel" language="ip_c" vlnv="xilinx.com:hls:increment_kernel:1.0" attributes="" preferredWorkGroupSizeMultiple="0" workGroupSize="0,0,0" runtime="OpenCL">
          <ports>
            <port name="in" mode="read_only" range="0xFFFFFFFFFFFFFFFF" port="m_axi_gmem" arg_index="0" host_offset="0" size="0x0"/>
            <port name="out" mode="write_only" range="0xFFFFFFFFFFFFFFFF" port="m_axi_gmem" arg_index="1" host_offset="0" size="0x0"/>
            <port name="size" mode="read_only" range="0xFFFFFFFFFFFFFFFF" port="" arg_index="2" host_offset="0" size="0x0"/>
          </ports>
          <args>
            <arg name="in" addressQualifier="1" id="0" port="in" size="8" offset="0x10"/>
            <arg name="out" addressQualifier="2" id="1" port="out" size="8" offset="0x18"/>
            <arg name="size" addressQualifier="0" id="2" port="" size="4" offset="0x20"/>
          </args>
        </kernel>
      </root>
  5. Create a Makefile for the Kernel:

    • Create a file named Makefile in the fpga_project directory:

      TARGET=hw
      DEVICE=xilinx_u200_xdma_201830_2
      
      all: build
      
      build:
          v++ -c -t $(TARGET) --platform $(DEVICE) -k increment_kernel -o kernel.xo kernel/increment_kernel.cpp
          v++ -l -t $(TARGET) --platform $(DEVICE) -o binary_container.xclbin kernel.xo
      
      clean:
          rm -f kernel.xo binary_container.xclbin
    • Note: Replace DEVICE with the actual platform name of your FPGA instance (e.g., as reported by xbutil scan).

  6. Explanation:

    • We use Vitis (v++) to compile the kernel.

    • The kernel code is written in HLS C++, which is easier for creating AXI interfaces.

    • The v++ compiler will generate the necessary RTL and interfaces.


6. Compiling the FPGA Design

Steps:

  1. Set Up the Environment Variables:

    • Ensure that Vitis and XRT are properly set up:

      source /tools/Xilinx/Vitis/2020.1/settings64.sh
      source /opt/xilinx/xrt/setup.sh
    • Note: Adjust the paths according to your installation directories.

  2. Build the Kernel:

    • From the fpga_project directory, run:

      make
    • This will:

      • Compile the kernel code to an object file (.xo).

      • Link the object file to create an FPGA binary (.xclbin).

    • Note: The build process may take some time.

  3. Verify the Generated Files:

    • After the build completes, you should have:

      • kernel.xo - Kernel object file.

      • binary_container.xclbin - FPGA binary file.


7. Programming the FPGA

Steps:

  1. Check Available FPGA Devices:

    • Use xbutil to list FPGA devices:

      sudo xbutil scan
    • You should see information about the FPGA device, including its platform name.

  2. Program the FPGA:

    • Use xbutil to program the FPGA with the generated .xclbin file:

      sudo xbutil program -d 0 -p binary_container.xclbin
    • Explanation:

      • -d 0 specifies the device ID (use 0 if you have only one FPGA).

      • -p specifies the path to the FPGA binary.

  3. Verify Programming:

    • Check the FPGA status:

      sudo xbutil validate -d 0
    • Ensure that the FPGA is programmed correctly and passes validation tests.


8. Creating the Host Application (C++)

We'll create a host application that uses XRT APIs to communicate with the FPGA.

Steps:

  1. Create the Host Application Directory:

    mkdir ~/fpga_project/host
    cd ~/fpga_project/host
  2. Write the Host Application Code:

    • Create a file named host.cpp:

      nano host.cpp
    • Add the following code:

      #include <xrt/xrt.h>
      #include <xrt/xrt_kernel.h>
      #include <xrt/xrt_bo.h>
      #include <iostream>
      #include <fstream>
      #include <vector>
      
      int main(int argc, char** argv) {
          // Load the FPGA binary
          std::string binaryFile = "../binary_container.xclbin";
          if (argc == 2) {
              binaryFile = argv[1];
          }
      
          // Open the device
          auto device = xrt::device(0);
      
          // Load the xclbin
          std::ifstream bin_file(binaryFile, std::ifstream::binary);
          bin_file.seekg(0, bin_file.end);
          size_t size = bin_file.tellg();
          bin_file.seekg(0, bin_file.beg);
          std::vector<char> buffer(size);
          bin_file.read(buffer.data(), size);
          auto uuid = device.load_xclbin(buffer);
      
          // Open the kernel
          auto krnl = xrt::kernel(device, uuid, "increment_kernel");
      
          // Allocate buffer on FPGA
          size_t vector_size = 1024;
          size_t vector_size_in_bytes = vector_size * sizeof(uint64_t);
      
          auto in_bo = xrt::bo(device, vector_size_in_bytes, krnl.group_id(0));
          auto out_bo = xrt::bo(device, vector_size_in_bytes, krnl.group_id(1));
      
          // Map the buffers
          auto in_ptr = in_bo.map<uint64_t*>();
          auto out_ptr = out_bo.map<uint64_t*>();
      
          // Initialize input data
          for (size_t i = 0; i < vector_size; i++) {
              in_ptr[i] = i;
          }
      
          // Synchronize input buffer data to device global memory
          in_bo.sync(XCL_BO_SYNC_BO_TO_DEVICE);
      
          // Run the kernel
          auto run = krnl(in_bo, out_bo, vector_size);
          run.wait();
      
          // Synchronize output buffer data from device global memory
          out_bo.sync(XCL_BO_SYNC_BO_FROM_DEVICE);
      
          // Verify the results
          int match = 0;
          for (size_t i = 0; i < vector_size; i++) {
              uint64_t expected = in_ptr[i] + 1;
              if (out_ptr[i] != expected) {
                  std::cout << "Error at index " << i << ": expected " << expected << ", got " << out_ptr[i] << std::endl;
                  match = 1;
                  break;
              }
          }
      
          if (match == 0) {
              std::cout << "SUCCESS: FPGA incremented the data correctly." << std::endl;
          } else {
              std::cout << "ERROR: FPGA did not increment the data correctly." << std::endl;
          }
      
          return match;
      }
    • Explanation:

      • Loads the FPGA binary.

      • Allocates buffers for input and output data.

      • Initializes input data.

      • Runs the kernel on the FPGA.

      • Reads back and verifies the output data.

  3. Create a Makefile for the Host Application:

    • Create a file named Makefile in the host directory:

      CXX= g++
      CXXFLAGS= -Wall -O0 -g -std=c++11
      LDFLAGS= -L/opt/xilinx/xrt/lib -lxrt_coreutil -pthread
      
      HOST_EXE=host_app
      
      all: $(HOST_EXE)
      
      $(HOST_EXE): host.cpp
          $(CXX) $(CXXFLAGS) -I/opt/xilinx/xrt/include $^ -o $@ $(LDFLAGS)
      
      clean:
          rm -f $(HOST_EXE)

9. Compiling the Host Application

Steps:

  1. Set Up the Build Environment:

    • Ensure that XRT is properly set up:

      source /opt/xilinx/xrt/setup.sh
  2. Compile the Host Application:

    • From the host directory, run:

      make
    • This will compile host.cpp into host_app.


10. Running the Host Application

Steps:

  1. Execute the Host Application:

    • From the host directory, run:

      ./host_app
    • Or specify the path to the FPGA binary if necessary:

      ./host_app ../binary_container.xclbin
  2. Expected Output:

    SUCCESS: FPGA incremented the data correctly.
    
    • The application sends a vector of data to the FPGA.

    • The FPGA increments each element by 1.

    • The host application reads back the data and verifies the result.


11. Verifying the Results

Steps:

  1. Check the Host Application Output:

    • Ensure that the output indicates success.
  2. Debugging If Needed:

    • If the output shows an error:

      • Check that the FPGA is programmed with the correct bitstream.

      • Ensure that the device ID used in programming and in the host code matches.

      • Verify that the kernel name in the host code matches the kernel name used in the FPGA design.

      • Check for any error messages during compilation or execution.

  3. Monitor FPGA Status:

    • Use xbutil to check FPGA status:

      sudo xbutil query -d 0
    • Look for any reported errors.


12. Cleaning Up

Steps:

  1. Release FPGA Resources:

    • Ensure that the host application has terminated and released all resources.
  2. Clean Up Build Files:

    • From the fpga_project directory, run:

      make clean
      cd host
      make clean
  3. Terminate the FPGA Instance (If No Longer Needed):

    • From the Alibaba Cloud console, stop or terminate the instance to avoid incurring charges.

13. Conclusion

By following these detailed steps, you've:

  • Set up an account and launched an FPGA instance on Alibaba Cloud.
  • Prepared the development environment for FPGA programming with Xilinx tools and open-source XRT drivers.
  • Created a Verilog-based FPGA kernel (using HLS for simplicity) that processes data sent from the CPU.
  • Compiled the FPGA design and programmed the FPGA with it.
  • Written a C++ host application that uses open-source XRT drivers to communicate with the FPGA.
  • Executed the host application, achieving direct, secure communication between the CPU and FPGA over PCIe.
  • Verified the correct operation of the system.

This setup provides a real-world example of CPU-FPGA communication in a cloud environment using Verilog and open-source drivers, satisfying your requirements.


14. References


Important Notes:

  • Permissions and Privileges:

    • Programming the FPGA may require sudo privileges.
    • Ensure you have the necessary permissions on the FPGA instance.
  • Resource Usage:

    • FPGA compilation can be resource-intensive and time-consuming.
    • Be mindful of instance usage and billing on Alibaba Cloud.
  • Security Considerations:

    • The communication between CPU and FPGA over PCIe is considered secure within the cloud environment.
    • XRT provides mechanisms for secure communication, but always ensure your code handles data securely.
  • Driver Interface:

    • XRT is fully open-source, satisfying your requirement for open-source drivers.
    • The XRT SDK is licensed under the Apache License 2.0.
  • Alternative Options:

    • If Alibaba Cloud does not meet your needs due to restrictions, consider other cloud providers like AWS EC2 F1 instances or local FPGA development boards.

About

Tutorial for CPU-FPGA communication using Xilinx FPGAs on Alibaba Cloud with PCIe and XRT.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published