GitHub - anthony-firn/CPU-FPGA-Tutorial: Tutorial for CPU-FPGA communication using Xilinx FPGAs on Alibaba Cloud with PCIe and XRT.

Overview

Platform: Alibaba Cloud
FPGA Instance: f3 Instance Family with Xilinx Virtex UltraScale+ VU9P FPGA
Communication Interface: PCIe using Open-Source Xilinx Runtime (XRT) Drivers
FPGA Code: Verilog
CPU Code: C++
Driver Interface: Fully Open-Source (Xilinx Runtime - XRT)
Goal: Implement a CPU-FPGA application where data is sent from the CPU to the FPGA, processed on the FPGA, and sent back to the CPU.

Table of Contents

1. Setting Up Your Alibaba Cloud Account
2. Launching an FPGA Instance
3. Preparing the Development Environment
4. Understanding the Xilinx FPGA Framework
5. Creating the FPGA Design (Verilog)
6. Compiling the FPGA Design
7. Programming the FPGA
8. Creating the Host Application (C++)
9. Compiling the Host Application
10. Running the Host Application
11. Verifying the Results
12. Cleaning Up
13. Conclusion
14. References

1. Setting Up Your Alibaba Cloud Account

Steps:

Visit the Alibaba Cloud Registration Page:
- Go to Alibaba Cloud Registration.
Click on "Business Account" or "Individual Account":
- Select account type and click next. You'll be redirected to a registration form.
Fill Out the Registration Form:
- Provide an email address, and set a password.
Verify Your Mobile Number:
- Enter your mobile phone number. You'll receive a verification code via SMS. Enter it to proceed.
Complete Account Information:
- Provide personal details, billing information, and verify your identity if required.
Select a Payment Method:
- Add a valid payment method (credit card or PayPal) for resource usage billing.
Accept Terms and Conditions:
- Agree to Alibaba Cloud's terms of service and privacy policy.
Submit the Form:
- Your account should now be set up and ready to use.

2. Launching an FPGA Instance

Steps:

Log In to the Alibaba Cloud Console:
- Go to Alibaba Cloud Console and sign in with your credentials.
Navigate to Elastic Compute Service (ECS):
- From the console dashboard, select Elastic Compute Service under Products & Services.
Create an Instance:
- Click on Instances in the left menu, then click Create Instance.
Select Region and Zone:
- Choose a region that supports FPGA instances (e.g., China East 1 or China North 2). Note that some regions may have restrictions.
Select Instance Type:
- Choose the f3 instance family, which includes FPGA capabilities.
- For example, select f3.4xlarge which includes:
  - 16 vCPUs
  - 64 GB RAM
  - Xilinx Virtex UltraScale+ VU9P FPGA
Select an Image (Operating System):
- If available, choose an FPGA development image provided by Alibaba Cloud, which includes the necessary tools.
- Alternatively, choose a standard Linux distribution (e.g., Ubuntu 18.04) and manually install the tools.
Configure Storage:
- Allocate sufficient storage for development tools and project files (e.g., 100 GB).
Set Up Security Group (Firewall Rules):
- Configure inbound rules to allow SSH access (port 22) from your IP address.
Set Instance Details:
- Assign a key pair for SSH access.
- Configure instance name, tags, and other details as needed.
Review and Launch:
- Confirm the configuration and launch the instance.
Wait for the Instance to Start:
- It may take a few minutes for the instance to be ready.
Note the Public IP Address:
- You'll need this to SSH into the instance.

3. Preparing the Development Environment

Steps:

SSH into the FPGA Instance:
- Open a terminal on your local machine.
- Connect to the instance using the key pair you assigned:
```
ssh -i /path/to/your/private/key.pem ubuntu@your_instance_public_ip
```
Update the System Packages:
```
sudo apt update && sudo apt upgrade -y
```
Install Required Dependencies:
- Install essential build tools and libraries:
```
sudo apt install -y build-essential git wget libssl-dev
```
Install Xilinx Tools:
- Option 1: Using Pre-installed Tools
  - If you selected an FPGA Developer AMI, the Xilinx tools might already be installed.
  - Verify by checking for Vivado and XRT:
```
vivado -version
xbutil --version
```
- Option 2: Manual Installation
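  - Download Vitis and XRT:
    - Go to Xilinx Download Center.
    - Download the Vitis 2020.1 unified installer, which also installs Vivado. The build in Section 5 uses v++, which is part of Vitis. The free WebPACK edition of Vivado does not support the VU9P device, so a license that covers it is required to build for hardware.
    - Download the XRT (Xilinx Runtime) from Xilinx GitHub Releases.
  - Install Vitis:
    - Transfer the installer to the instance (use scp or wget if available).
    - Make the installer executable and run it (substitute the file name you downloaded):
      chmod +x Xilinx_Unified_2020.1_0602_1208_Lin64.bin
      sudo ./Xilinx_Unified_2020.1_0602_1208_Lin64.bin
    - Follow the on-screen instructions and select Vitis as the product to install.
  - Install XRT:
    - Install required dependencies:
      sudo apt install -y libboost-all-dev
    - Download the appropriate .deb package for Ubuntu 18.04.
    - Install XRT (substitute the file name you downloaded):
      sudo dpkg -i xrt_2020.1.8.621_18.04-amd64-xrt.deb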
  - Set Up Environment Variables:
```
echo "source /opt/Xilinx/Vivado/2020.1/settings64.sh" >> ~/.bashrc
source ~/.bashrc
```
Verify Installation:
- Check that Vivado and XRT are installed:
```
vivado -version
xbutil --version
```
Clone Xilinx Runtime (XRT) Source Code (Optional):
- If you need to build XRT from source for the latest updates or custom modifications:
```
git clone https://github.com/Xilinx/XRT.git
cd XRT
```
- Follow the build instructions in the repository.
Install OpenCL Headers (Optional):
- If your application uses OpenCL:
```
sudo apt install -y opencl-headers ocl-icd-opencl-dev
```

4. Understanding the Xilinx FPGA Framework

The Xilinx FPGA framework allows communication between the host CPU and FPGA over PCIe using the Xilinx Runtime (XRT). XRT is an open-source driver and runtime library that provides a standardized API for FPGA applications.

Key Components:
- XRT (Xilinx Runtime): Open-source driver and runtime library for communication between CPU and FPGA.
- FPGA Shell (Platform): The base FPGA image that provides PCIe connectivity and standard interfaces.
- FPGA User Logic (Kernel): Your custom logic implemented in Verilog.
- Host Application: Software application running on the CPU that communicates with the FPGA via XRT APIs.
Development Flow:
1. Create FPGA Design:
  - Develop the FPGA kernel in Verilog.
  - Use AXI interfaces for communication.
2. Create Host Application:
  - Use XRT APIs to communicate with the FPGA over PCIe.
3. Compile FPGA Design:
  - Use Vitis (v++), which runs Vivado synthesis and implementation internally, to compile the FPGA design and generate an FPGA binary (.xclbin) containing the bitstream.
4. Program the FPGA:
  - Use XRT tools to program the FPGA with the bitstream.
5. Run Host Application:
  - Execute the host application to send and receive data.

5. Creating the FPGA Design (Verilog)

We'll create a simple Verilog kernel that increments an input value and sends it back to the host.
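Before writing any hardware, it can help to pin down the expected behavior in software. The following is a minimal sketch of a reference ("golden") model for the increment operation, assuming 64-bit elements as in the host application's `uint64_t` buffers. The function name `increment_reference` is hypothetical and not part of XRT or the tutorial's sources.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Software reference model of the increment kernel (hypothetical helper):
// every 64-bit element of the input is incremented by one.
// Unsigned arithmetic wraps around on overflow, just as a fixed-width
// hardware adder would.
std::vector<std::uint64_t> increment_reference(const std::vector<std::uint64_t>& in) {
    std::vector<std::uint64_t> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] + 1;
    }
    return out;
}
```

The FPGA output can later be compared element by element against this model, which separates "the kernel computes the wrong thing" from "the data never reached the kernel".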

Steps:

Set Up the Directory Structure:

```
mkdir -p ~/fpga_project/kernel
cd ~/fpga_project
```

Create the Verilog Kernel Code:

Navigate to the kernel directory:
```
cd kernel
```
Create a file named increment_kernel.v:
```
nano increment_kernel.v
```

Add the following Verilog code:

```
module increment_kernel (
    input wire ap_clk,
    input wire ap_rst_n,
    input wire [31:0] s_axi_control_AWADDR,
    input wire s_axi_control_AWVALID,
    output wire s_axi_control_AWREADY,
    input wire [31:0] s_axi_control_WDATA,
    input wire [3:0] s_axi_control_WSTRB,
    input wire s_axi_control_WVALID,
    output wire s_axi_control_WREADY,
    output wire [1:0] s_axi_control_BRESP,
    output wire s_axi_control_BVALID,
    input wire s_axi_control_BREADY,
    input wire [31:0] s_axi_control_ARADDR,
    input wire s_axi_control_ARVALID,
    output wire s_axi_control_ARREADY,
    output wire [31:0] s_axi_control_RDATA,
    output wire [1:0] s_axi_control_RRESP,
    output wire s_axi_control_RVALID,
    input wire s_axi_control_RREADY,
    output wire interrupt,

    // AXI4 Master Interface
    output wire [31:0] m_axi_gmem_AWADDR,
    output wire [7:0] m_axi_gmem_AWLEN,
    output wire [2:0] m_axi_gmem_AWSIZE,
    output wire [1:0] m_axi_gmem_AWBURST,
    output wire m_axi_gmem_AWLOCK,
    output wire [3:0] m_axi_gmem_AWCACHE,
    output wire [2:0] m_axi_gmem_AWPROT,
    output wire [3:0] m_axi_gmem_AWQOS,
    output wire m_axi_gmem_AWVALID,
    input wire m_axi_gmem_AWREADY,
    output wire [511:0] m_axi_gmem_WDATA,
    output wire [63:0] m_axi_gmem_WSTRB,
    output wire m_axi_gmem_WLAST,
    output wire m_axi_gmem_WVALID,
    input wire m_axi_gmem_WREADY,
    input wire [1:0] m_axi_gmem_BRESP,
    input wire m_axi_gmem_BVALID,
    output wire m_axi_gmem_BREADY,
    output wire [31:0] m_axi_gmem_ARADDR,
    output wire [7:0] m_axi_gmem_ARLEN,
    output wire [2:0] m_axi_gmem_ARSIZE,
    output wire [1:0] m_axi_gmem_ARBURST,
    output wire m_axi_gmem_ARLOCK,
    output wire [3:0] m_axi_gmem_ARCACHE,
    output wire [2:0] m_axi_gmem_ARPROT,
    output wire [3:0] m_axi_gmem_ARQOS,
    output wire m_axi_gmem_ARVALID,
    input wire m_axi_gmem_ARREADY,
    input wire [511:0] m_axi_gmem_RDATA,
    input wire [1:0] m_axi_gmem_RRESP,
    input wire m_axi_gmem_RLAST,
    input wire m_axi_gmem_RVALID,
    output wire m_axi_gmem_RREADY
);

// Your logic goes here

// For simplicity, this example does not implement full AXI transactions.
// Implement the necessary AXI4 protocol to read and write data from host memory.

// This is a placeholder to show where the increment operation would be implemented.

endmodule
```

Note: Implementing full AXI4 master interfaces requires significant code to handle all protocol signals. For simplicity, we'll use the High-Level Synthesis (HLS) tool to generate the AXI interfaces.
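To give a feel for what "handling all protocol signals" involves, here is a minimal C++ sketch of one piece of bookkeeping that an AXI4 master has to do: splitting a transfer into INCR bursts. It assumes the 512-bit (64-byte) data bus of the module above, and relies on two AXI4 rules: a burst carries at most 256 beats, and a burst must not cross a 4 KB address boundary. The names `AxiBurst` and `split_into_bursts` are hypothetical, and the sketch assumes the address and length are multiples of 64 bytes.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One AXI4 burst as seen on the address channel (hypothetical helper type).
struct AxiBurst {
    std::uint64_t addr;  // start address (AxADDR)
    std::uint8_t  len;   // AxLEN: number of beats minus one
    std::uint8_t  size;  // AxSIZE: log2(bytes per beat)
};

// Split a transfer into INCR bursts.
// Assumption: addr and bytes are both multiples of 64.
std::vector<AxiBurst> split_into_bursts(std::uint64_t addr, std::uint64_t bytes) {
    constexpr std::uint64_t kBeatBytes = 64;    // 512-bit data bus
    constexpr std::uint64_t kMaxBeats  = 256;   // AXI4 INCR burst limit
    constexpr std::uint64_t kBoundary  = 4096;  // bursts must not cross 4 KB

    std::vector<AxiBurst> bursts;
    while (bytes > 0) {
        // Bytes left before the next 4 KB boundary.
        const std::uint64_t to_boundary = kBoundary - (addr % kBoundary);
        const std::uint64_t chunk =
            std::min({bytes, to_boundary, kMaxBeats * kBeatBytes});
        // AxSIZE is 6 because 2^6 = 64 bytes per beat.
        bursts.push_back({addr, static_cast<std::uint8_t>(chunk / kBeatBytes - 1), 6});
        addr  += chunk;
        bytes -= chunk;
    }
    return bursts;
}
```

With a 64-byte beat, the 4 KB rule limits each burst to 64 beats, so an 8192-byte buffer starting at a 4 KB-aligned address needs two bursts. HLS generates this kind of logic automatically, which is the main reason the tutorial switches to it.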

Use HLS to Simplify Kernel Development (Optional):
- If you're comfortable with HLS, you can write the kernel in C/C++ and let HLS generate the Verilog with AXI interfaces.
- Here's an example of an HLS kernel:
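```
#include <stdint.h>
extern "C" {
// Each element is one 64-bit word, matching the uint64_t buffers
// that the host application allocates and verifies.
void increment_kernel(const uint64_t* in, uint64_t* out, int size) {
    for (int i = 0; i < size; i++) {
        out[i] = in[i] + 1;
    }
}
}
```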
- Save this code as increment_kernel.cpp in the kernel directory.

Create a Kernel Description File (Kernel Definition):

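Create a file named kernel.xml. It describes the kernel's ports and arguments and is needed when packaging a hand-written RTL kernel; when the kernel is compiled from HLS C++ with v++, the tool generates this description itself: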

```
<?xml version="1.0" encoding="UTF-8"?>
<root version="1.0" xilinx_version="2020.1">
  <kernel name="increment_kernel" language="ip_c" vlnv="xilinx.com:hls:increment_kernel:1.0" attributes="" preferredWorkGroupSizeMultiple="0" workGroupSize="0,0,0" runtime="OpenCL">
    <ports>
      <port name="in" mode="read_only" range="0xFFFFFFFFFFFFFFFF" port="m_axi_gmem" arg_index="0" host_offset="0" size="0x0"/>
      <port name="out" mode="write_only" range="0xFFFFFFFFFFFFFFFF" port="m_axi_gmem" arg_index="1" host_offset="0" size="0x0"/>
      <port name="size" mode="read_only" range="0xFFFFFFFFFFFFFFFF" port="" arg_index="2" host_offset="0" size="0x0"/>
    </ports>
    <args>
      <arg name="in" addressQualifier="1" id="0" port="in" size="8" offset="0x10"/>
      <arg name="out" addressQualifier="2" id="1" port="out" size="8" offset="0x18"/>
      <arg name="size" addressQualifier="0" id="2" port="" size="4" offset="0x20"/>
    </args>
  </kernel>
</root>
```
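The `offset` attributes in the `<args>` section define where each argument lives in the kernel's AXI4-Lite control register space. As an illustration only, the sketch below packs the three arguments into a byte image at the offsets listed above (0x10, 0x18, 0x20). XRT normally performs these register writes for you, so the function `pack_args` and the constants are hypothetical, and the byte order assumes a little-endian host such as x86-64.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Argument offsets taken from the <args> section of kernel.xml.
constexpr std::size_t kInOffset   = 0x10;  // 8-byte device address of "in"
constexpr std::size_t kOutOffset  = 0x18;  // 8-byte device address of "out"
constexpr std::size_t kSizeOffset = 0x20;  // 4-byte scalar "size"
constexpr std::size_t kRegSpace   = 0x24;  // bytes covered by this sketch

// Build an image of the argument registers (hypothetical helper).
// Assumption: the host is little-endian, so memcpy yields register byte order.
std::array<std::uint8_t, kRegSpace> pack_args(std::uint64_t in_addr,
                                              std::uint64_t out_addr,
                                              std::uint32_t size) {
    std::array<std::uint8_t, kRegSpace> regs{};  // zero-initialized
    std::memcpy(regs.data() + kInOffset,   &in_addr,  sizeof in_addr);
    std::memcpy(regs.data() + kOutOffset,  &out_addr, sizeof out_addr);
    std::memcpy(regs.data() + kSizeOffset, &size,     sizeof size);
    return regs;
}
```

Seeing the layout this way makes it clear why the argument sizes matter: an 8-byte value written where the kernel expects a 4-byte scalar would spill past its register.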

Create a Makefile for the Kernel:

Create a file named Makefile in the fpga_project directory:

```
TARGET=hw
DEVICE=xilinx_u200_xdma_201830_2

.PHONY: all build clean

all: build

# Recipe lines must start with a tab character, not spaces.
build:
	v++ -c -t $(TARGET) --platform $(DEVICE) -k increment_kernel -o kernel.xo kernel/increment_kernel.cpp
	v++ -l -t $(TARGET) --platform $(DEVICE) -o binary_container.xclbin kernel.xo

clean:
	rm -f kernel.xo binary_container.xclbin
```

Note: Replace DEVICE with the actual platform name of your FPGA instance (e.g., as reported by xbutil scan).

Explanation:
- We use Vitis (v++) to compile the kernel.
- The kernel code is written in HLS C++, which is easier for creating AXI interfaces.
- The v++ compiler will generate the necessary RTL and interfaces.

6. Compiling the FPGA Design

Steps:

Set Up the Environment Variables:
- Ensure that Vitis and XRT are properly set up:
```
source /tools/Xilinx/Vitis/2020.1/settings64.sh
source /opt/xilinx/xrt/setup.sh
```
- Note: Adjust the paths according to your installation directories.
Build the Kernel:
- From the fpga_project directory, run:
```
make
```
- This will:
  - Compile the kernel code to an object file (.xo).
  - Link the object file to create an FPGA binary (.xclbin).
- Note: The build process may take some time.
Verify the Generated Files:
- After the build completes, you should have:
  - kernel.xo - Kernel object file.
  - binary_container.xclbin - FPGA binary file.

7. Programming the FPGA

Steps:

Check Available FPGA Devices:
- Use xbutil to list FPGA devices:
```
sudo xbutil scan
```
- You should see information about the FPGA device, including its platform name.
Program the FPGA:
- Use xbutil to program the FPGA with the generated .xclbin file:
```
sudo xbutil program -d 0 -p binary_container.xclbin
```
- Explanation:
  - -d 0 specifies the device ID (use 0 if you have only one FPGA).
  - -p specifies the path to the FPGA binary.
Verify Programming:
- Check the FPGA status:
```
sudo xbutil validate -d 0
```
- Ensure that the FPGA is programmed correctly and passes validation tests.

8. Creating the Host Application (C++)

We'll create a host application that uses XRT APIs to communicate with the FPGA.

Steps:

Create the Host Application Directory:

```
mkdir -p ~/fpga_project/host
cd ~/fpga_project/host
```

Write the Host Application Code:

Create a file named host.cpp:
```
nano host.cpp
```

Add the following code:

```
// Note: these xrt/ header paths assume a recent XRT release; older
// releases may ship the native C++ API under experimental/ instead.
#include <xrt/xrt_device.h>
#include <xrt/xrt_kernel.h>
#include <xrt/xrt_bo.h>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
    // Path to the FPGA binary; can be overridden on the command line
    std::string binaryFile = "../binary_container.xclbin";
    if (argc == 2) {
        binaryFile = argv[1];
    }

    // Open the device
    auto device = xrt::device(0);

    // Load the xclbin onto the device (this also programs the FPGA)
    auto uuid = device.load_xclbin(binaryFile);

    // Open the kernel
    auto krnl = xrt::kernel(device, uuid, "increment_kernel");

    // Allocate buffers in the memory banks connected to kernel arguments 0 and 1
    const std::size_t vector_size = 1024;  // number of 64-bit elements
    const std::size_t vector_size_in_bytes = vector_size * sizeof(uint64_t);

    auto in_bo = xrt::bo(device, vector_size_in_bytes, krnl.group_id(0));
    auto out_bo = xrt::bo(device, vector_size_in_bytes, krnl.group_id(1));

    // Map the buffers into host address space
    auto in_ptr = in_bo.map<uint64_t*>();
    auto out_ptr = out_bo.map<uint64_t*>();

    // Initialize input data
    for (std::size_t i = 0; i < vector_size; i++) {
        in_ptr[i] = i;
    }

    // Synchronize input buffer data to device global memory
    in_bo.sync(XCL_BO_SYNC_BO_TO_DEVICE);

    // Run the kernel. The third kernel argument is declared as int,
    // so pass a 4-byte value rather than a size_t.
    auto run = krnl(in_bo, out_bo, static_cast<int>(vector_size));
    run.wait();

    // Synchronize output buffer data from device global memory
    out_bo.sync(XCL_BO_SYNC_BO_FROM_DEVICE);

    // Verify the results; status stays 0 if every element matches
    int status = 0;
    for (std::size_t i = 0; i < vector_size; i++) {
        uint64_t expected = in_ptr[i] + 1;
        if (out_ptr[i] != expected) {
            std::cout << "Error at index " << i << ": expected " << expected << ", got " << out_ptr[i] << std::endl;
            status = 1;
            break;
        }
    }

    if (status == 0) {
        std::cout << "SUCCESS: FPGA incremented the data correctly." << std::endl;
    } else {
        std::cout << "ERROR: FPGA did not increment the data correctly." << std::endl;
    }

    return status;
}
```

Explanation:
- Loads the FPGA binary.
- Allocates buffers for input and output data.
- Initializes input data.
- Runs the kernel on the FPGA.
- Reads back and verifies the output data.
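The verification step in the host code stops at the first wrong element. When debugging, it is often more informative to count every mismatch, since the pattern can point at the cause (for example, only every eighth element being correct suggests a data-width mismatch between host and kernel). The following is a minimal sketch of such a check; `count_mismatches` and its `max_reports` parameter are hypothetical and operate on plain pointers, so they work equally well on the mapped XRT buffers.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>

// Count the elements where out[i] != in[i] + 1 (hypothetical helper).
// Only the first max_reports mismatches are printed to keep the log short.
std::size_t count_mismatches(const std::uint64_t* in,
                             const std::uint64_t* out,
                             std::size_t n,
                             std::size_t max_reports = 8) {
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint64_t expected = in[i] + 1;
        if (out[i] != expected) {
            if (mismatches < max_reports) {
                std::cerr << "Mismatch at index " << i << ": expected "
                          << expected << ", got " << out[i] << '\n';
            }
            ++mismatches;
        }
    }
    return mismatches;
}
```

In the host application this could be called as `count_mismatches(in_ptr, out_ptr, vector_size)` after the output buffer has been synchronized from the device.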

Create a Makefile for the Host Application:

Create a file named Makefile in the host directory:

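```
CXX= g++
# The XRT native C++ headers need a newer standard than C++11
CXXFLAGS= -Wall -O0 -g -std=c++17
LDFLAGS= -L/opt/xilinx/xrt/lib -lxrt_coreutil -pthread

HOST_EXE=host_app

.PHONY: all clean

all: $(HOST_EXE)

# Recipe lines must start with a tab character, not spaces.
$(HOST_EXE): host.cpp
	$(CXX) $(CXXFLAGS) -I/opt/xilinx/xrt/include $^ -o $@ $(LDFLAGS)

clean:
	rm -f $(HOST_EXE)
```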

9. Compiling the Host Application

Steps:

Set Up the Build Environment:
- Ensure that XRT is properly set up:
```
source /opt/xilinx/xrt/setup.sh
```
Compile the Host Application:
- From the host directory, run:
```
make
```
- This will compile host.cpp into host_app.

10. Running the Host Application

Steps:

Execute the Host Application:
- From the host directory, run:
```
./host_app
```
- Or specify the path to the FPGA binary if necessary:
```
./host_app ../binary_container.xclbin
```
Expected Output:
```
SUCCESS: FPGA incremented the data correctly.
```
- The application sends a vector of data to the FPGA.
- The FPGA increments each element by 1.
- The host application reads back the data and verifies the result.

11. Verifying the Results

Steps:

Check the Host Application Output:
- Ensure that the output indicates success.
Debugging If Needed:
- If the output shows an error:
  - Check that the FPGA is programmed with the correct bitstream.
  - Ensure that the device ID used in programming and in the host code matches.
  - Verify that the kernel name in the host code matches the kernel name used in the FPGA design.
  - Check for any error messages during compilation or execution.
Monitor FPGA Status:
- Use xbutil to check FPGA status:
```
sudo xbutil query -d 0
```
- Look for any reported errors.
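Many of the failures listed above (no device found, wrong xclbin path, unknown kernel name) surface in the XRT native C++ API as thrown exceptions, so an unguarded `main` simply aborts. A minimal sketch of a guard that turns any `std::exception` into a readable message and a non-zero exit code is shown below. The helpers `run_guarded` and `require` are hypothetical and contain no XRT calls, so the pattern can be tried without an FPGA.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>

// Throw a descriptive error when a precondition fails (hypothetical helper).
void require(bool ok, const char* message) {
    if (!ok) {
        throw std::runtime_error(message);
    }
}

// Run the given body and convert exceptions into an exit code
// (hypothetical helper): 0 on success, 1 if an exception was caught.
int run_guarded(const std::function<void()>& body) {
    try {
        body();
        return 0;
    } catch (const std::exception& e) {
        std::cerr << "Host application failed: " << e.what() << std::endl;
        return 1;
    }
}
```

In the host application, the body of `main` could be moved into a lambda passed to `run_guarded`, with `main` returning its result.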

12. Cleaning Up

Steps:

Release FPGA Resources:
- Ensure that the host application has terminated and released all resources.
Clean Up Build Files:
- From the fpga_project directory, run:
```
make clean
cd host
make clean
```
Terminate the FPGA Instance (If No Longer Needed):
- From the Alibaba Cloud console, stop or terminate the instance to avoid incurring charges.

13. Conclusion

By following these steps, you have:

- Set up an account and launched an FPGA instance on Alibaba Cloud.
- Prepared the development environment for FPGA programming with Xilinx tools and open-source XRT drivers.
- Created an FPGA kernel that processes data sent from the CPU, using HLS to generate the Verilog and AXI interfaces.
- Compiled the FPGA design and programmed the FPGA with it.
- Written a C++ host application that uses open-source XRT drivers to communicate with the FPGA.
- Executed the host application, exchanging data between the CPU and FPGA over PCIe.
- Verified the correct operation of the system.

This setup provides a working example of CPU-FPGA communication in a cloud environment using Verilog generated by HLS and open-source drivers.

14. References

Alibaba Cloud Documentation:
- Alibaba Cloud ECS Documentation
- Alibaba Cloud FPGA Instances
Xilinx Documentation:
FPGA Programming Guides:
- Vitis Unified Software Platform Documentation
- XRT Native API Guide
Xilinx High-Level Synthesis (HLS):
- Vitis HLS User Guide

Important Notes:

Permissions and Privileges:
- Programming the FPGA may require sudo privileges.
- Ensure you have the necessary permissions on the FPGA instance.
Resource Usage:
- FPGA compilation can be resource-intensive and time-consuming.
- Be mindful of instance usage and billing on Alibaba Cloud.
Security Considerations:
- Data exchanged between the CPU and FPGA over PCIe stays inside the instance rather than crossing the network.
- Your host code is still responsible for handling sensitive data securely.
Driver Interface:
- XRT is open-source, so the driver and runtime source can be inspected and rebuilt.
- The XRT SDK is licensed under the Apache License 2.0.
Alternative Options:
- If Alibaba Cloud does not meet your needs due to restrictions, consider other cloud providers like AWS EC2 F1 instances or local FPGA development boards.
