# AWS F1 DMA Example
This example uses the PipelineC Amazon Machine Image (AMI). It is based on Amazon's FPGA Developer AMI. This is the fastest way to run both the PipelineC tool and the AWS F1 build process. However, the PipelineC tool can also be run locally and its outputs copied to the AWS instance.
Start an F1 instance of the PipelineC AMI. Instructions for starting an F1 instance are provided by Amazon.
This FPGA design is based on AWS's own DMA example. Please refer to their documentation for non-PipelineC questions.
In your AWS PipelineC AMI instance (it doesn't need to be an F1 instance yet), use these steps to run the example:
- Update to the latest PipelineC code

```
cd ~/src/project_data/PipelineC/
git pull
```
- Run the AWS environment setup scripts

```
cd $AWS_FPGA_REPO_DIR
source hdk_setup.sh
source sdk_setup.sh
cd $HDK_DIR/cl/examples/cl_dram_dma
export CL_DIR=$(pwd)
```
- Run the PipelineC tool (~minutes to several hours)

```
cd ~/src/project_data/PipelineC/
rm -r /home/centos/pipelinec_syn_output
python -u ./src/main.py 2>&1 | tee out.log
```
- Build the Vivado checkpoint that will be turned into an Amazon FPGA Image (AFI) (~several hours)

```
cd $CL_DIR/build/scripts
./aws_build_dcp_from_cl.sh
```
- Wait for Vivado to finish and put the checkpoint file in $CL_DIR/build/checkpoints/to_aws/

```
ls -lt $CL_DIR/build/checkpoints/to_aws/ | grep .tar | head -n 1
# Set these environment variables based on your output
export TARTIMESTAMP=20_01_13-193024
export TARFILENAME=$TARTIMESTAMP.Developer_CL.tar
```
- Copy the checkpoint to Amazon S3 for Amazon to do their magic. (Note: `aws s3 mb` can only create buckets, not folders; S3 'folders' are created implicitly when an object is uploaded under that prefix.)

```
# Set the needed environment variables
export REGION=us-east-1
export S3BUCKET=pipelinec
export S3DCPDIRNAME=dcps
export S3LOGSDIRNAME=logs
aws s3 mb s3://$S3BUCKET --region $REGION # Create an S3 bucket (choose a unique bucket name)
aws s3 cp $CL_DIR/build/checkpoints/to_aws/$TARFILENAME s3://$S3BUCKET/$S3DCPDIRNAME/ # Upload the tarball to S3 (this also creates the folder)
# Make room for Amazon's log file on S3
touch LOGS_FILES_GO_HERE.txt # Create a temp file
aws s3 cp LOGS_FILES_GO_HERE.txt s3://$S3BUCKET/$S3LOGSDIRNAME/ # Uploading it creates the logs folder on S3
```
- Tell Amazon to generate an AFI from those S3 files

```
export AFI_NAME=pipelinec
export AFI_DESC=fpmult16
aws ec2 create-fpga-image --region $REGION --name $AFI_NAME --description $AFI_DESC --input-storage-location Bucket=$S3BUCKET,Key=$S3DCPDIRNAME/$TARFILENAME --logs-storage-location Bucket=$S3BUCKET,Key=$S3LOGSDIRNAME
# Set these environment variables based on your output
export AFIID=afi-075ea4945e985aa7c
export AGFIID=agfi-0eee2e70791e149b7
```
- Wait for Amazon to say your AFI is 'available' (~a few hours)

```
aws ec2 describe-fpga-images --fpga-image-ids $AFIID | grep "Code"
```
- Start working with the real FPGA hardware (you must be on an F1 instance now)

```
# Clear the FPGA (~30s)
sudo fpga-clear-local-image -S 0
# Load the FPGA image (~30s)
sudo fpga-load-local-image -S 0 -I $AGFIID
# Reset (PCIe reset)
sudo fpga-describe-local-image -S 0 -R -H
```
- Run the test (rebuild, reset the FPGA again, run ./test)

```
cd /home/centos/src/project_data/PipelineC/examples/aws-fpga-dma
reset; make clean; make && sudo fpga-describe-local-image -S 0 -R -H && sudo ./test
```
The original AWS DMA example allows you to write and read sections of memory on the FPGA. This example works by using a narrow portion of that functionality (a sketch of the resulting write-then-read cycle follows the list):
- Write a small input buffer 'message' of fixed size N bytes to address 0
  - This acts as the input to the FPGA hardware
- Read those same N bytes of 'message' back from address 0
  - This is the output from the FPGA hardware
- No other addresses or buffer sizes are supported
- A single write must be followed by a single read
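
A minimal sketch of that cycle from the software point of view. The helper names and message size here are assumptions for illustration; the real helpers live in dma_msg_sw.c, described below:

```c
#include <stdint.h>

#define DMA_MSG_SIZE 256 /* assumed fixed message size N - see dma_msg.h for the real value */

/* Hypothetical stand-ins for the real dma_msg_sw.c helpers */
void msg_write(const uint8_t msg[DMA_MSG_SIZE]); /* write N bytes to FPGA address 0 */
void msg_read(uint8_t msg[DMA_MSG_SIZE]);        /* read N bytes back from address 0 */

/* One full cycle: a single write must be followed by a single read */
void do_one_message(const uint8_t in_msg[DMA_MSG_SIZE], uint8_t out_msg[DMA_MSG_SIZE])
{
  msg_write(in_msg); /* the input to the FPGA hardware */
  msg_read(out_msg); /* the output from the FPGA hardware */
}
```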
The code describing the conversion between the AWS DMA interfaces and this simple 'message' abstraction is in the files dma_msg.h, dma_msg_hw.c, and dma_msg_sw.c.
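
The message itself is just a fixed-size byte array. A minimal sketch of what dma_msg.h defines, with the size being an assumption here (check the file for the real value):

```c
#include <stdint.h>

#define DMA_MSG_SIZE 256 /* assumed; the real size is defined in dma_msg.h */

/* The 'message' passed between software and hardware is a plain byte array */
typedef struct dma_msg_t
{
  uint8_t data[DMA_MSG_SIZE];
} dma_msg_t;
```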
DMA data is just bytes that need to be interpreted further, specific to your application. The files work_sw.c and work_hw.c describe the conversion of the DMA message struct to/from the work input/output types.
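
For example, those conversions might look like the plain-C sketch below. The struct names, sizes, and memcpy approach are assumptions for illustration; the hardware-side version in work_hw.c must be written in synthesizable PipelineC style rather than with memcpy:

```c
#include <stdint.h>
#include <string.h>

#define DMA_MSG_SIZE 256            /* assumed message size */
#define N_FLOATS (DMA_MSG_SIZE / 4) /* 4 bytes per 32-bit float */

typedef struct dma_msg_t { uint8_t data[DMA_MSG_SIZE]; } dma_msg_t;
typedef struct work_inputs_t { float values[N_FLOATS]; } work_inputs_t;
typedef struct work_outputs_t { float sum; } work_outputs_t;

/* Reinterpret the raw message bytes as the work() input floats */
work_inputs_t bytes_to_inputs(dma_msg_t msg)
{
  work_inputs_t inputs;
  memcpy(inputs.values, msg.data, sizeof(inputs.values));
  return inputs;
}

/* Pack the work() output back into a message (remaining bytes unused) */
dma_msg_t outputs_to_bytes(work_outputs_t outputs)
{
  dma_msg_t msg;
  memset(msg.data, 0, sizeof(msg.data));
  memcpy(msg.data, &outputs.sum, sizeof(outputs.sum));
  return msg;
}
```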
This example sums N floating point values using a binary tree of adders. The work.h file contains the definition of output = work(input): the function, its inputs (N floating point values), and its output (a single floating point value).
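
A plain-C sketch of such a binary adder tree, assuming N is a power of two (the value of N and the exact form of the real work.h are assumptions here):

```c
#define N_SUM 64 /* assumed number of input floats; a power of two */

typedef struct work_inputs_t  { float values[N_SUM]; } work_inputs_t;
typedef struct work_outputs_t { float sum; } work_outputs_t;

/* Sum N floats as a binary tree: each pass adds adjacent pairs of
   partial sums, halving the count, so the tree is log2(N) adder
   levels deep - a structure hardware can pipeline level by level. */
work_outputs_t work(work_inputs_t inputs)
{
  float sums[N_SUM];
  for (int i = 0; i < N_SUM; i++) sums[i] = inputs.values[i];
  for (int n = N_SUM; n > 1; n = n / 2)
  {
    for (int i = 0; i < n / 2; i++)
      sums[i] = sums[2 * i] + sums[2 * i + 1];
  }
  work_outputs_t out;
  out.sum = sums[0];
  return out;
}
```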
test.c describes the standard test: do the work on the CPU, do the work on the FPGA, and see if there was a speedup. It includes helper functions to easily swap out what the input values are and how the output values are compared. The CPU and FPGA both use the same work() function source code, so this isn't the best possible CPU implementation to compare against for this particular example.
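
Roughly, the shape of that test is as follows; the helper names, N, and the use of clock() are assumptions for illustration, not the actual test.c contents:

```c
#include <stdio.h>
#include <time.h>

#define N_SUM 64 /* assumed number of input floats */
typedef struct work_inputs_t  { float values[N_SUM]; } work_inputs_t;
typedef struct work_outputs_t { float sum; } work_outputs_t;

/* Stand-ins: work() is the shared CPU/FPGA source; fpga_work()
   does the DMA message round trip to the FPGA */
work_outputs_t work(work_inputs_t inputs);
work_outputs_t fpga_work(work_inputs_t inputs);

/* Swappable helper that picks the input values */
work_inputs_t gen_inputs(void)
{
  work_inputs_t inputs;
  for (int i = 0; i < N_SUM; i++) inputs.values[i] = 1.0f;
  return inputs;
}

int main(void)
{
  work_inputs_t inputs = gen_inputs();

  clock_t t0 = clock();
  work_outputs_t cpu_out = work(inputs);       /* do the work on the CPU */
  clock_t t1 = clock();
  work_outputs_t fpga_out = fpga_work(inputs); /* do the work on the FPGA */
  clock_t t2 = clock();

  printf("CPU %ld ticks, FPGA %ld ticks\n", (long)(t1 - t0), (long)(t2 - t1));
  /* Swappable comparison - exact float equality may need a tolerance */
  return (cpu_out.sum == fpga_out.sum) ? 0 : 1;
}
```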
In this example, Amazon's hardware and software interfaces are wrapped behind a common byte-array 'message' type that is passed between hardware and software.
Amazon provides a simple read+write interface to the FPGA through user-space file I/O and a kernel driver. This example writes and reads 'message' byte arrays to/from a file - relatively simple code.
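
A sketch of that software side, assuming the XDMA driver's usual host-to-card and card-to-host device file names (the real open/write/read code is in dma_msg_sw.c and may differ):

```c
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

#define DMA_MSG_SIZE 256 /* assumed message size */

/* Write the input message to FPGA address 0 via the DMA device files,
   then read the output message back from the same address.
   The device names below are the typical XDMA ones - an assumption here. */
int message_round_trip(const uint8_t in_msg[DMA_MSG_SIZE], uint8_t out_msg[DMA_MSG_SIZE])
{
  int wr_fd = open("/dev/xdma0_h2c_0", O_WRONLY);
  int rd_fd = open("/dev/xdma0_c2h_0", O_RDONLY);
  if (wr_fd < 0 || rd_fd < 0) return -1;

  pwrite(wr_fd, in_msg, DMA_MSG_SIZE, 0); /* write N bytes to address 0 */
  pread(rd_fd, out_msg, DMA_MSG_SIZE, 0); /* read N bytes from address 0 */

  close(wr_fd);
  close(rd_fd);
  return 0;
}
```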
Amazon uses an AXI4 bus in their DMA example. The hardware in this example serializes and deserializes 64-byte chunks of AXI4 data to form 'message' byte arrays.
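
Conceptually, the deserializing side accumulates 64-byte AXI4 beats until a full message is present, as in this plain-C sketch (the synthesizable version lives in dma_msg_hw.c; the message size and function name are assumptions):

```c
#include <stdint.h>
#include <string.h>

#define AXI_WORD_SIZE 64 /* bytes per AXI4 data chunk in the AWS example */
#define DMA_MSG_SIZE 256 /* assumed message size; a multiple of 64 */
#define WORDS_PER_MSG (DMA_MSG_SIZE / AXI_WORD_SIZE)

typedef struct axi_word_t { uint8_t data[AXI_WORD_SIZE]; } axi_word_t;
typedef struct dma_msg_t { uint8_t data[DMA_MSG_SIZE]; } dma_msg_t;

/* Deserializer state: the partially assembled message and a word counter
   (in the real hardware this state is held in registers, not C statics) */
static dma_msg_t msg_buf;
static uint32_t word_count = 0;

/* Accept one 64-byte AXI4 write beat; return 1 when a full message is ready */
int deserialize_axi_word(axi_word_t word, dma_msg_t* msg_out)
{
  memcpy(&msg_buf.data[word_count * AXI_WORD_SIZE], word.data, AXI_WORD_SIZE);
  word_count++;
  if (word_count == WORDS_PER_MSG)
  {
    word_count = 0;
    *msg_out = msg_buf;
    return 1;
  }
  return 0;
}
```

Serialization back to AXI4 for the read path is the mirror image: slice the output message into 64-byte words and emit them one beat at a time.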