- Platform: Alibaba Cloud
- FPGA Instance: f3 Instance Family with Xilinx Virtex UltraScale+ VU9P FPGA
- Communication Interface: PCIe using Open-Source Xilinx Runtime (XRT) Drivers
- FPGA Code: Verilog
- CPU Code: C++
- Driver Interface: Fully Open-Source (Xilinx Runtime - XRT)
- Goal: Implement a CPU-FPGA application where data is sent from the CPU to the FPGA, processed on the FPGA, and sent back to the CPU.
- Setting Up Your Alibaba Cloud Account
- Launching an FPGA Instance
- Preparing the Development Environment
- Understanding the Xilinx FPGA Framework
- Creating the FPGA Design (Verilog)
- Compiling the FPGA Design
- Programming the FPGA
- Creating the Host Application (C++)
- Compiling the Host Application
- Running the Host Application
- Verifying the Results
- Cleaning Up
- Conclusion
- References
-
Visit the Alibaba Cloud Registration Page:
- Go to Alibaba Cloud Registration.
-
Click on "Business Account" or "Individual Account":
- Select account type and click next. You'll be redirected to a registration form.
-
Fill Out the Registration Form:
- Provide an email address, and set a password.
-
Verify Your Mobile Number:
- Enter your mobile phone number. You'll receive a verification code via email SMS. Enter it to proceed.
-
Complete Account Information:
- Provide personal details, billing information, and verify your identity if required.
-
Select a Payment Method:
- Add a valid payment method (credit card or PayPal) for resource usage billing.
-
Accept Terms and Conditions:
- Agree to Alibaba Cloud's terms of service and privacy policy.
-
Submit the Form:
- Your account should now be set up and ready to use.
-
Log In to the Alibaba Cloud Console:
- Go to Alibaba Cloud Console and sign in with your credentials.
-
Navigate to Elastic Compute Service (ECS):
- From the console dashboard, select Elastic Compute Service under Products & Services.
-
Create an Instance:
- Click on Instances in the left menu, then click Create Instance.
-
Select Region and Zone:
- Choose a region that supports FPGA instances (e.g., China East 1 or China North 2). Note that some regions may have restrictions.
-
Select Instance Type:
-
Choose the f3 instance family, which includes FPGA capabilities.
-
For example, select f3.4xlarge which includes:
- 16 vCPUs
- 64 GB RAM
- Xilinx Virtex UltraScale+ VU9P FPGA
-
-
Select an Image (Operating System):
-
Choose an FPGA Development AMI provided by Alibaba Cloud.
-
If available, select an image like FPGA Developer AMI which includes necessary tools.
-
Alternatively, choose a standard Linux distribution (e.g., Ubuntu 18.04) and manually install the tools.
-
-
Configure Storage:
- Allocate sufficient storage for development tools and project files (e.g., 100 GB).
-
Set Up Security Group (Firewall Rules):
- Configure inbound rules to allow SSH access (port 22) from your IP address.
-
Set Instance Details:
-
Assign a key pair for SSH access.
-
Configure instance name, tags, and other details as needed.
-
-
Review and Launch:
- Confirm the configuration and launch the instance.
-
Wait for the Instance to Start:
- It may take a few minutes for the instance to be ready.
-
Note the Public IP Address:
- You'll need this to SSH into the instance.
-
SSH into the FPGA Instance:
-
Open a terminal on your local machine.
-
Connect to the instance using the key pair you assigned:
ssh -i /path/to/your/private/key.pem ubuntu@your_instance_public_ip
-
-
Update the System Packages:
sudo apt update && sudo apt upgrade -y
-
Install Required Dependencies:
-
Install essential build tools and libraries:
sudo apt install -y build-essential git wget libssl-dev
-
-
Install Xilinx Tools:
-
Option 1: Using Pre-installed Tools
-
If you selected an FPGA Developer AMI, the Xilinx tools might already be installed.
-
Verify by checking for Vivado and XRT:
vivado -version xbutil --version
-
-
Option 2: Manual Installation
-
Download Vivado and XRT:
-
Go to Xilinx Download Center.
-
Download the Vivado HLx 2020.1 WebPACK Edition (free version).
-
Download the XRT (Xilinx Runtime) from Xilinx GitHub Releases.
-
-
Install Vivado:
-
Transfer the installer to the instance (use
scp
orwget
if available). -
Run the installer:
sudo ./Xilinx_Vivado_SDK_Web_2020.1_0602_1208_Lin64.bin
-
Follow the on-screen instructions to install Vivado WebPACK Edition.
-
-
Install XRT:
-
Install required dependencies:
sudo apt install -y libboost-all-dev
-
Download the appropriate
.deb
package for Ubuntu 18.04. -
Install XRT:
sudo dpkg -i xrt_2020.1.8.621_18.04-amd64-xrt.deb
-
-
Set Up Environment Variables:
echo "source /opt/Xilinx/Vivado/2020.1/settings64.sh" >> ~/.bashrc source ~/.bashrc
-
-
-
Verify Installation:
-
Check that Vivado and XRT are installed:
vivado -version xbutil --version
-
-
Clone Xilinx Runtime (XRT) Source Code (Optional):
-
If you need to build XRT from source for the latest updates or custom modifications:
git clone https://github.com/Xilinx/XRT.git cd XRT
-
Follow the build instructions in the repository.
-
-
Install OpenCL Headers (Optional):
-
If your application uses OpenCL:
sudo apt install -y opencl-headers ocl-icd-opencl-dev
-
The Xilinx FPGA framework allows communication between the host CPU and FPGA over PCIe using the Xilinx Runtime (XRT). XRT is an open-source driver and runtime library that provides a standardized API for FPGA applications.
-
Key Components:
- XRT (Xilinx Runtime): Open-source driver and runtime library for communication between CPU and FPGA.
- FPGA Shell (Platform): The base FPGA image that provides PCIe connectivity and standard interfaces.
- FPGA User Logic (Kernel): Your custom logic implemented in Verilog.
- Host Application: Software application running on the CPU that communicates with the FPGA via XRT APIs.
-
Development Flow:
- Create FPGA Design:
- Develop the FPGA kernel in Verilog.
- Use AXI interfaces for communication.
- Create Host Application:
- Use XRT APIs to communicate with the FPGA over PCIe.
- Compile FPGA Design:
- Use Vivado to compile the FPGA design and generate a bitstream (.xclbin).
- Program the FPGA:
- Use XRT tools to program the FPGA with the bitstream.
- Run Host Application:
- Execute the host application to send and receive data.
- Create FPGA Design:
We'll create a simple Verilog kernel that increments an input value and sends it back to the host.
-
Set Up the Directory Structure:
mkdir -p ~/fpga_project/kernel cd ~/fpga_project
-
Create the Verilog Kernel Code:
-
Navigate to the kernel directory:
cd kernel
-
Create a file named
increment_kernel.v
:nano increment_kernel.v
-
Add the following Verilog code:
module increment_kernel ( input wire ap_clk, input wire ap_rst_n, input wire [31:0] s_axi_control_AWADDR, input wire s_axi_control_AWVALID, output wire s_axi_control_AWREADY, input wire [31:0] s_axi_control_WDATA, input wire [3:0] s_axi_control_WSTRB, input wire s_axi_control_WVALID, output wire s_axi_control_WREADY, output wire [1:0] s_axi_control_BRESP, output wire s_axi_control_BVALID, input wire s_axi_control_BREADY, input wire [31:0] s_axi_control_ARADDR, input wire s_axi_control_ARVALID, output wire s_axi_control_ARREADY, output wire [31:0] s_axi_control_RDATA, output wire [1:0] s_axi_control_RRESP, output wire s_axi_control_RVALID, input wire s_axi_control_RREADY, output wire interrupt, // AXI4 Master Interface output wire [31:0] m_axi_gmem_AWADDR, output wire [7:0] m_axi_gmem_AWLEN, output wire [2:0] m_axi_gmem_AWSIZE, output wire [1:0] m_axi_gmem_AWBURST, output wire m_axi_gmem_AWLOCK, output wire [3:0] m_axi_gmem_AWCACHE, output wire [2:0] m_axi_gmem_AWPROT, output wire [3:0] m_axi_gmem_AWQOS, output wire m_axi_gmem_AWVALID, input wire m_axi_gmem_AWREADY, output wire [511:0] m_axi_gmem_WDATA, output wire [63:0] m_axi_gmem_WSTRB, output wire m_axi_gmem_WLAST, output wire m_axi_gmem_WVALID, input wire m_axi_gmem_WREADY, input wire [1:0] m_axi_gmem_BRESP, input wire m_axi_gmem_BVALID, output wire m_axi_gmem_BREADY, output wire [31:0] m_axi_gmem_ARADDR, output wire [7:0] m_axi_gmem_ARLEN, output wire [2:0] m_axi_gmem_ARSIZE, output wire [1:0] m_axi_gmem_ARBURST, output wire m_axi_gmem_ARLOCK, output wire [3:0] m_axi_gmem_ARCACHE, output wire [2:0] m_axi_gmem_ARPROT, output wire [3:0] m_axi_gmem_ARQOS, output wire m_axi_gmem_ARVALID, input wire m_axi_gmem_ARREADY, input wire [511:0] m_axi_gmem_RDATA, input wire [1:0] m_axi_gmem_RRESP, input wire m_axi_gmem_RLAST, input wire m_axi_gmem_RVALID, output wire m_axi_gmem_RREADY ); // Your logic goes here // For simplicity, this example does not implement full AXI transactions. // Implement the necessary AXI4 protocol to read and write data from host memory. // This is a placeholder to show where the increment operation would be implemented. endmodule
-
Note: Implementing full AXI4 master interfaces requires significant code to handle all protocol signals. For simplicity, we'll use the High-Level Synthesis (HLS) tool to generate the AXI interfaces.
-
-
Use HLS to Simplify Kernel Development (Optional):
-
If you're comfortable with HLS, you can write the kernel in C/C++ and let HLS generate the Verilog with AXI interfaces.
-
Here's an example of an HLS kernel:
#include <ap_int.h> extern "C" { void increment_kernel(ap_uint<512>* in, ap_uint<512>* out, int size) { for (int i = 0; i < size; i++) { out[i] = in[i] + 1; } } }
-
Save this code as
increment_kernel.cpp
in thekernel
directory.
-
-
Create a Kernel Description File (Kernel Definition):
-
Create a file named
kernel.xml
:<?xml version="1.0" encoding="UTF-8"?> <root version="1.0" xilinx_version="2020.1"> <kernel name="increment_kernel" language="ip_c" vlnv="xilinx.com:hls:increment_kernel:1.0" attributes="" preferredWorkGroupSizeMultiple="0" workGroupSize="0,0,0" runtime="OpenCL"> <ports> <port name="in" mode="read_only" range="0xFFFFFFFFFFFFFFFF" port="m_axi_gmem" arg_index="0" host_offset="0" size="0x0"/> <port name="out" mode="write_only" range="0xFFFFFFFFFFFFFFFF" port="m_axi_gmem" arg_index="1" host_offset="0" size="0x0"/> <port name="size" mode="read_only" range="0xFFFFFFFFFFFFFFFF" port="" arg_index="2" host_offset="0" size="0x0"/> </ports> <args> <arg name="in" addressQualifier="1" id="0" port="in" size="8" offset="0x10"/> <arg name="out" addressQualifier="2" id="1" port="out" size="8" offset="0x18"/> <arg name="size" addressQualifier="0" id="2" port="" size="4" offset="0x20"/> </args> </kernel> </root>
-
-
Create a Makefile for the Kernel:
-
Create a file named
Makefile
in thefpga_project
directory:TARGET=hw DEVICE=xilinx_u200_xdma_201830_2 all: build build: v++ -c -t $(TARGET) --platform $(DEVICE) -k increment_kernel -o kernel.xo kernel/increment_kernel.cpp v++ -l -t $(TARGET) --platform $(DEVICE) -o binary_container.xclbin kernel.xo clean: rm -f kernel.xo binary_container.xclbin
-
Note: Replace
DEVICE
with the actual platform name of your FPGA instance (e.g., as reported byxbutil scan
).
-
-
Explanation:
-
We use Vitis (
v++
) to compile the kernel. -
The kernel code is written in HLS C++, which is easier for creating AXI interfaces.
-
The
v++
compiler will generate the necessary RTL and interfaces.
-
-
Set Up the Environment Variables:
-
Ensure that Vitis and XRT are properly set up:
source /tools/Xilinx/Vitis/2020.1/settings64.sh source /opt/xilinx/xrt/setup.sh
-
Note: Adjust the paths according to your installation directories.
-
-
Build the Kernel:
-
From the
fpga_project
directory, run:make
-
This will:
-
Compile the kernel code to an object file (
.xo
). -
Link the object file to create an FPGA binary (
.xclbin
).
-
-
Note: The build process may take some time.
-
-
Verify the Generated Files:
-
After the build completes, you should have:
-
kernel.xo
- Kernel object file. -
binary_container.xclbin
- FPGA binary file.
-
-
-
Check Available FPGA Devices:
-
Use
xbutil
to list FPGA devices:sudo xbutil scan
-
You should see information about the FPGA device, including its platform name.
-
-
Program the FPGA:
-
Use
xbutil
to program the FPGA with the generated.xclbin
file:sudo xbutil program -d 0 -p binary_container.xclbin
-
Explanation:
-
-d 0
specifies the device ID (use0
if you have only one FPGA). -
-p
specifies the path to the FPGA binary.
-
-
-
Verify Programming:
-
Check the FPGA status:
sudo xbutil validate -d 0
-
Ensure that the FPGA is programmed correctly and passes validation tests.
-
We'll create a host application that uses XRT APIs to communicate with the FPGA.
-
Create the Host Application Directory:
mkdir ~/fpga_project/host cd ~/fpga_project/host
-
Write the Host Application Code:
-
Create a file named
host.cpp
:nano host.cpp
-
Add the following code:
#include <xrt/xrt.h> #include <xrt/xrt_kernel.h> #include <xrt/xrt_bo.h> #include <iostream> #include <fstream> #include <vector> int main(int argc, char** argv) { // Load the FPGA binary std::string binaryFile = "../binary_container.xclbin"; if (argc == 2) { binaryFile = argv[1]; } // Open the device auto device = xrt::device(0); // Load the xclbin std::ifstream bin_file(binaryFile, std::ifstream::binary); bin_file.seekg(0, bin_file.end); size_t size = bin_file.tellg(); bin_file.seekg(0, bin_file.beg); std::vector<char> buffer(size); bin_file.read(buffer.data(), size); auto uuid = device.load_xclbin(buffer); // Open the kernel auto krnl = xrt::kernel(device, uuid, "increment_kernel"); // Allocate buffer on FPGA size_t vector_size = 1024; size_t vector_size_in_bytes = vector_size * sizeof(uint64_t); auto in_bo = xrt::bo(device, vector_size_in_bytes, krnl.group_id(0)); auto out_bo = xrt::bo(device, vector_size_in_bytes, krnl.group_id(1)); // Map the buffers auto in_ptr = in_bo.map<uint64_t*>(); auto out_ptr = out_bo.map<uint64_t*>(); // Initialize input data for (size_t i = 0; i < vector_size; i++) { in_ptr[i] = i; } // Synchronize input buffer data to device global memory in_bo.sync(XCL_BO_SYNC_BO_TO_DEVICE); // Run the kernel auto run = krnl(in_bo, out_bo, vector_size); run.wait(); // Synchronize output buffer data from device global memory out_bo.sync(XCL_BO_SYNC_BO_FROM_DEVICE); // Verify the results int match = 0; for (size_t i = 0; i < vector_size; i++) { uint64_t expected = in_ptr[i] + 1; if (out_ptr[i] != expected) { std::cout << "Error at index " << i << ": expected " << expected << ", got " << out_ptr[i] << std::endl; match = 1; break; } } if (match == 0) { std::cout << "SUCCESS: FPGA incremented the data correctly." << std::endl; } else { std::cout << "ERROR: FPGA did not increment the data correctly." << std::endl; } return match; }
-
Explanation:
-
Loads the FPGA binary.
-
Allocates buffers for input and output data.
-
Initializes input data.
-
Runs the kernel on the FPGA.
-
Reads back and verifies the output data.
-
-
-
Create a Makefile for the Host Application:
-
Create a file named
Makefile
in thehost
directory:CXX= g++ CXXFLAGS= -Wall -O0 -g -std=c++11 LDFLAGS= -L/opt/xilinx/xrt/lib -lxrt_coreutil -pthread HOST_EXE=host_app all: $(HOST_EXE) $(HOST_EXE): host.cpp $(CXX) $(CXXFLAGS) -I/opt/xilinx/xrt/include $^ -o $@ $(LDFLAGS) clean: rm -f $(HOST_EXE)
-
-
Set Up the Build Environment:
-
Ensure that XRT is properly set up:
source /opt/xilinx/xrt/setup.sh
-
-
Compile the Host Application:
-
From the
host
directory, run:make
-
This will compile
host.cpp
intohost_app
.
-
-
Execute the Host Application:
-
From the
host
directory, run:./host_app
-
Or specify the path to the FPGA binary if necessary:
./host_app ../binary_container.xclbin
-
-
Expected Output:
SUCCESS: FPGA incremented the data correctly.
-
The application sends a vector of data to the FPGA.
-
The FPGA increments each element by 1.
-
The host application reads back the data and verifies the result.
-
-
Check the Host Application Output:
- Ensure that the output indicates success.
-
Debugging If Needed:
-
If the output shows an error:
-
Check that the FPGA is programmed with the correct bitstream.
-
Ensure that the device ID used in programming and in the host code matches.
-
Verify that the kernel name in the host code matches the kernel name used in the FPGA design.
-
Check for any error messages during compilation or execution.
-
-
-
Monitor FPGA Status:
-
Use
xbutil
to check FPGA status:sudo xbutil query -d 0
-
Look for any reported errors.
-
-
Release FPGA Resources:
- Ensure that the host application has terminated and released all resources.
-
Clean Up Build Files:
-
From the
fpga_project
directory, run:make clean cd host make clean
-
-
Terminate the FPGA Instance (If No Longer Needed):
- From the Alibaba Cloud console, stop or terminate the instance to avoid incurring charges.
By following these detailed steps, you've:
- Set up an account and launched an FPGA instance on Alibaba Cloud.
- Prepared the development environment for FPGA programming with Xilinx tools and open-source XRT drivers.
- Created a Verilog-based FPGA kernel (using HLS for simplicity) that processes data sent from the CPU.
- Compiled the FPGA design and programmed the FPGA with it.
- Written a C++ host application that uses open-source XRT drivers to communicate with the FPGA.
- Executed the host application, achieving direct, secure communication between the CPU and FPGA over PCIe.
- Verified the correct operation of the system.
This setup provides a real-world example of CPU-FPGA communication in a cloud environment using Verilog and open-source drivers, satisfying your requirements.
-
Alibaba Cloud Documentation:
-
Xilinx Documentation:
-
FPGA Programming Guides:
-
Xilinx High-Level Synthesis (HLS):
-
Permissions and Privileges:
- Programming the FPGA may require
sudo
privileges. - Ensure you have the necessary permissions on the FPGA instance.
- Programming the FPGA may require
-
Resource Usage:
- FPGA compilation can be resource-intensive and time-consuming.
- Be mindful of instance usage and billing on Alibaba Cloud.
-
Security Considerations:
- The communication between CPU and FPGA over PCIe is considered secure within the cloud environment.
- XRT provides mechanisms for secure communication, but always ensure your code handles data securely.
-
Driver Interface:
- XRT is fully open-source, satisfying your requirement for open-source drivers.
- The XRT SDK is licensed under the Apache License 2.0.
-
Alternative Options:
- If Alibaba Cloud does not meet your needs due to restrictions, consider other cloud providers like AWS EC2 F1 instances or local FPGA development boards.