This project is for the NYCU Edge AI final, focusing on LLM quantization and performance evaluation with the SGLang server.
Clone the repository:

```bash
git clone https://github.com/KeithLin724/NYCU_Edge_AI_SGLang.git
```

Make sure you have Conda installed. Create the environment with:

```bash
conda env create -f environment.yml
conda activate edge_ai_sglang_stable
```

To install the CUDA Toolkit (which includes nvcc) on Ubuntu 22.04, run:
```bash
# Download the CUDA repository pin file for package priority
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600

# Download and install the CUDA repository local installer
wget https://developer.download.nvidia.com/compute/cuda/12.9.0/local_installers/cuda-repo-ubuntu2204-12-9-local_12.9.0-575.51.03-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-9-local_12.9.0-575.51.03-1_amd64.deb

# Add the CUDA GPG key to your system keyring
sudo cp /var/cuda-repo-ubuntu2204-12-9-local/cuda-*-keyring.gpg /usr/share/keyrings/

# Update package lists
sudo apt-get update

# Install CUDA Toolkit 12.9 (includes nvcc)
sudo apt-get -y install cuda-toolkit-12-9
```

You can get started quickly by using the pre-built, quantized model.
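After installation, it is worth confirming that `nvcc` is visible on your `PATH`. The install prefix below is the Ubuntu default for this package and may differ on your machine:

```shell
# CUDA 12.9's default install prefix on Ubuntu; adjust if yours differs
export PATH=/usr/local/cuda-12.9/bin:$PATH
nvcc --version   # should report "release 12.9"
```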
Simply run the following command to automatically download and load the stable model for your experiments:

```bash
sh run_server.sh
```

If you prefer to build the model yourself (e.g., for custom training or quantization), please refer to the detailed instructions in BUILD_MODEL.md.
- Start the SGLang server:

  ```bash
  # Start the SGLang server with the default pre-built model (KYLiN724/llama-3.2-1b-KD-V1-W8A8-Dynamic-Per-Token)
  sh run_server.sh

  # Or, specify a custom model path or Hugging Face repo
  # sh run_server.sh <model_name_or_path>
  ```
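Once the server is running, you can sanity-check it from Python before launching the full benchmark. This is a minimal sketch assuming SGLang's native `/generate` endpoint on the default port 30000; adjust the URL if `run_server.sh` configures a different host or port.

```python
import json
import urllib.request

# Assumption: SGLang's native /generate API on the default port 30000.
SERVER_URL = "http://localhost:30000/generate"

def build_request(prompt: str, max_new_tokens: int = 64) -> dict:
    """Build a payload for SGLang's native /generate endpoint."""
    return {
        "text": prompt,
        "sampling_params": {
            "temperature": 0.0,  # greedy decoding for reproducible checks
            "max_new_tokens": max_new_tokens,
        },
    }

def generate(prompt: str) -> str:
    """Send the prompt to the running server and return the generated text."""
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]

# With the server up, try:
#   print(generate("What is quantization in one sentence?"))
```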
- Run the throughput test script:

  ```bash
  # Run the throughput test with the default pre-built model
  python result-quant-sglang.py

  # Or, specify a custom model path or Hugging Face repo
  # python result-quant-sglang.py --model_name <model_name_or_path>
  ```
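Conceptually, a throughput measurement boils down to generated tokens divided by wall-clock time. The sketch below illustrates that calculation with a hypothetical `generate_fn` callable; it is not the actual logic of `result-quant-sglang.py`, which may count tokens differently (e.g., including prompt tokens).

```python
import time

def measure_throughput(generate_fn, prompts):
    """Rough throughput estimate: total generated tokens / elapsed seconds.

    `generate_fn` is any callable returning (text, num_generated_tokens).
    """
    start = time.perf_counter()
    total_tokens = 0
    for prompt in prompts:
        _, n_tokens = generate_fn(prompt)
        total_tokens += n_tokens
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed

# Example with a stub generator (replace with a real SGLang client call):
stub = lambda prompt: ("ok", 32)
print(f"{measure_throughput(stub, ['a'] * 4):.1f} tokens/s")
```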
- Run the perplexity (PPL) test script. Note: please shut down the SGLang server before running this step.

  ```bash
  # Run the perplexity (PPL) test with the default pre-built model
  python result-quant.py

  # Or, specify a custom model path or Hugging Face repo
  # python result-quant.py --model_name <model_name_or_path>
  ```

- Experiment results will be saved as `result_tput.csv` (throughput) and `result_ppl.csv` (perplexity).
- You can adjust model or dataset parameters at the top of each script to suit your needs.
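For reference, perplexity is the exponential of the mean per-token negative log-likelihood. A minimal sketch of the computation, independent of the actual implementation in `result-quant.py`:

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token).

    `token_nlls` holds per-token negative log-likelihoods (natural log),
    as produced by a causal-LM cross-entropy loss.
    """
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model that assigns every token probability 1/2 has NLL ln(2) per token,
# so its perplexity is exactly 2:
assert abs(perplexity([math.log(2)] * 4) - 2.0) < 1e-9
```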
- If you encounter CUDA out-of-memory errors:
  - Try reducing the batch size or sequence length.
  - You can also tune the server flags in `run_server.sh` for better memory management; see the SGLang hyperparameter tuning guide for more details.
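As one starting point for memory tuning, the sketch below lowers the fraction of GPU memory SGLang reserves statically for its KV-cache pool. The flag name follows SGLang's `launch_server` options and the model path matches the default in `run_server.sh`; verify both against your installed version before relying on them.

```shell
# Verify available flags with: python -m sglang.launch_server --help
python -m sglang.launch_server \
    --model-path KYLiN724/llama-3.2-1b-KD-V1-W8A8-Dynamic-Per-Token \
    --mem-fraction-static 0.7  # lower this value if you still hit CUDA OOM
```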
For any questions, please open an issue or contact the project maintainer.
- `meta-llama/Llama-3.2-3B-Instruct`
- `meta-llama/Llama-3.2-1B-Instruct`