AI Engineer World's Fair, June 3, 2025 | Yineng Zhang and Philip Kiely
Welcome to the SGLang workshop at AI Engineer World's Fair! We are very excited to have you here and to spend some time this morning talking about model performance optimization with SGLang.
In this hands-on workshop, you'll have the ability to deploy models with SGLang yourself. To follow along, please complete the following steps:
- Fork and clone this repository.
- Create a Baseten account.
  - Baseten will provide compute credits for this workshop.
  - If you're stuck in a "waiting room" for more than a couple of minutes, please flag Philip.
- Install Truss with `pip install --upgrade truss`.
These workshop examples are based on Llama 3.1 8B.
- Accept the terms and conditions for Llama 3.1 8B.
- Create an access token with `READ` permissions on Hugging Face.
- Add it as a secret on your Baseten account with the name `hf_access_token`.
Create a file `~/.trussrc` and paste in the following (using your actual API key):

```
[baseten]
remote_provider = baseten
api_key = abcdefgh.1234567890ABCDEFGHIJKL1234567890
remote_url = https://app.baseten.co
```
Add your API key to your environment variables in your shell profile of choice:

```
export BASETEN_API_KEY=abcdefgh.1234567890ABCDEFGHIJKL1234567890
```
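To confirm the variable is visible to your tools before deploying, a tiny helper like this works (the function is illustrative, not part of the workshop repo):

```python
import os


def check_baseten_key(env=os.environ):
    """Return True if BASETEN_API_KEY is set to a non-empty value."""
    return bool(env.get("BASETEN_API_KEY"))


if __name__ == "__main__":
    print("BASETEN_API_KEY set:", check_baseten_key())
```

If this prints `False`, re-source your shell profile or re-run the `export` line above.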
You are now ready to complete the workshop.
Each folder has an example SGLang configuration along with instructions for deployment.
Use `call.ipynb` to call individual deployments. Models deployed with SGLang are compatible with the OpenAI SDK -- just pass your model ID and API key.
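Because the endpoint is OpenAI-compatible, any OpenAI SDK client can call it. As a minimal sketch (the model ID and the SDK parameters shown in comments are placeholder assumptions, not values from this workshop), the call and the request body it produces look like:

```python
import json

# With the OpenAI SDK (not imported here), the call would be roughly:
#   from openai import OpenAI
#   client = OpenAI(api_key=BASETEN_API_KEY, base_url=DEPLOYMENT_URL)  # placeholders
#   resp = client.chat.completions.create(model=MODEL_ID, messages=[...])
#
# Under the hood the SDK sends a standard chat-completions payload:
payload = {
    "model": "llama-3.1-8b-instruct",  # placeholder model ID (assumption)
    "messages": [
        {"role": "user", "content": "Hello from the SGLang workshop!"},
    ],
    "max_tokens": 64,
}
body = json.dumps(payload)
```

`call.ipynb` wraps this same pattern for each deployment in the repo.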
- SGLang GitHub
- SGLang documentation
- SGLang contribution guide
- SGLang code architecture (foundation)
- SGLang code walkthrough (advanced)