Terafly: A Multi-Node FPGA Based Accelerator Design for Efficient Cooperative Inference in LLMs

Demo

Prerequisites

Before you start, match your setup to our experiment environment as closely as you can. For the Alveo U50lv card, our environment is:

System: Ubuntu 18.04
Shell: xilinx-u50lv-gen3x4-xdma-base_2
XRT: 2023.2
Vitis HLS & Vivado: 2023.2
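
As a quick sanity check before building, the versions above can be compared against what you find on your own machine. The following is only a minimal sketch: the dictionary keys and the `find_mismatches` helper are hypothetical names introduced here, and the values in `found` are assumed to be collected by hand (this script does not query the card or the tools).

```python
from typing import Dict, List

# Versions listed above for the Alveo U50lv setup.
# The key names are hypothetical labels chosen for this sketch.
EXPECTED_ENV = {
    "system": "Ubuntu 18.04",
    "shell": "xilinx-u50lv-gen3x4-xdma-base_2",
    "xrt": "2023.2",
    "vitis_hls_vivado": "2023.2",
}


def find_mismatches(found: Dict[str, str],
                    expected: Dict[str, str] = EXPECTED_ENV) -> List[str]:
    """Return one message per expected key that is missing or differs."""
    problems = []
    for key, want in expected.items():
        got = found.get(key)
        if got != want:
            problems.append(f"{key}: expected {want!r}, found {got!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical values written down by hand from your own machine.
    mine = dict(EXPECTED_ENV, xrt="2022.2")
    for line in find_mismatches(mine):
        print(line)
```

An empty result means every listed component matches. A mismatch does not necessarily mean the build will fail, only that your setup differs from the one we tested.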

Code Structure

template/                            # template HLS code for the generation framework
OPT-1.3b_optimize/                   # generated code for the Vitis development flow
LLM-demo-gui/                        # web UI for interaction
OPT-1.3b_optimize/connectivity.cfg   # configuration file that specifies the topology
codegen.py                           # script that modifies the template according to the configuration
OPT-1.3b.json                        # configuration file that specifies performance
weight_packer.py                     # script that packs model weights into the Terafly memory layout
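
To give a feel for how a configuration file can drive template-based code generation, here is a minimal sketch of the general idea. It is not the actual logic of codegen.py: the placeholder syntax, the `HLS_TEMPLATE` fragment, the `render` helper, and the JSON keys `num_layers` and `hidden_dim` are all assumptions made for illustration, and the real OPT-1.3b.json may use different fields.

```python
import json
from string import Template

# Hypothetical template fragment; the real files under template/ may look different.
HLS_TEMPLATE = Template(
    "#define NUM_LAYERS $num_layers\n"
    "#define HIDDEN_DIM $hidden_dim\n"
)


def render(config_text: str) -> str:
    """Fill the template placeholders from a JSON configuration string."""
    config = json.loads(config_text)
    # substitute() raises KeyError if the config lacks a placeholder's key,
    # which surfaces an incomplete configuration early.
    return HLS_TEMPLATE.substitute(config)


if __name__ == "__main__":
    # Hypothetical configuration keys and values, for illustration only.
    print(render('{"num_layers": 24, "hidden_dim": 2048}'))
```

Using `substitute` rather than `safe_substitute` is a deliberate choice in this sketch: a missing key stops generation instead of leaving an unresolved placeholder in the generated HLS code.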

Quick Start

You can download our packed data (model weights) from here.

cd OPT-1.3b_optimize/
make run

This will automatically generate the xclbin file and program your Alveo card.

cd tokenizer/
sh ./command.sh

This will compile the host-side application that runs the LAMBADA benchmark. Check tokenizer_predict_eigen.cpp to make sure the code loads the packed data correctly.

You can also run the GUI demo:
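
For context, LAMBADA asks a model to predict the final word of a passage, so results are usually summarized as last-word accuracy. The sketch below shows that calculation under the assumption that you have the predicted and reference words as plain strings. The `last_word_accuracy` helper is a hypothetical name, and the metric that tokenizer_predict_eigen.cpp actually reports may be computed differently (for example, on token IDs).

```python
from typing import Sequence


def last_word_accuracy(predictions: Sequence[str],
                       targets: Sequence[str]) -> float:
    """Fraction of passages whose predicted last word equals the reference."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    # Compare after stripping surrounding whitespace; matching is case-sensitive.
    correct = sum(p.strip() == t.strip() for p, t in zip(predictions, targets))
    return correct / len(targets)


if __name__ == "__main__":
    # Hypothetical predictions and references, for illustration only.
    print(last_word_accuracy(["door", "cat"], ["door", "dog"]))
```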

cd LLM-demo-gui/alveo
python client-v3.py   # Python 3.6

Then open the HTML file LLM-demo-gui/llm-gui/web/index.html to chat with the LLM.
