sglang
Here are 28 public repositories matching this topic...
Production-ready LLM model compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Updated Aug 6, 2025 - Python
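As a rough illustration of how a quantized checkpoint produced by a toolkit like the one above is typically consumed, here is a minimal sketch using the standard Hugging Face Transformers API. The model id is a placeholder, not a real repository, and the sketch assumes the checkpoint ships a quantization config that Transformers can dispatch on.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hub id; substitute any GPTQ/AWQ-quantized checkpoint.
model_id = "your-org/your-model-gptq-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Transformers picks the matching quantization backend from the
# checkpoint's quantization_config.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Quantized models trade a little accuracy for"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```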
High-quality Chinese speech synthesis and voice cloning service built on SparkTTS, OrpheusTTS, and other models.
Updated May 18, 2025 - Python
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
Updated Jul 28, 2025 - Go
OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs).
Updated Aug 9, 2025 - Go
kvcached: Elastic KV cache for dynamic GPU sharing and efficient multi-LLM inference.
Updated Aug 8, 2025 - Python
Arks is a cloud-native inference framework running on Kubernetes.
Updated Jul 29, 2025 - Go
A tool for benchmarking LLMs on Modal
Updated Jul 30, 2025 - Python
AI-based search done right
Updated Aug 8, 2025 - TypeScript
A guide to structured generation using constrained decoding
Updated Jun 9, 2024 - Jupyter Notebook
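For context on what constrained decoding looks like in practice (independent of the guide above), here is a minimal sketch using SGLang's frontend language to restrict a generated answer with a regex. The endpoint URL and question are placeholders, and the sketch assumes an SGLang server is already running locally.

```python
import sglang as sgl

@sgl.function
def yes_no(s, question):
    # The regex constraint forces the decoded answer to be "yes" or "no".
    s += "Question: " + question + "\n"
    s += "Answer: " + sgl.gen("answer", max_tokens=8, regex=r"(yes|no)")

# Placeholder endpoint; assumes a local SGLang server is listening here.
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

state = yes_no.run(question="Is the sky blue on a clear day?")
print(state["answer"])
```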
Throughput benchmarks for DeepSeek-V3 and R1 (671B) on 8xH100 GPUs
Updated Mar 13, 2025 - Python
The Private AI Setup Dream Guide for Demos automates the installation of the software needed for a local private AI setup, utilizing AI models (LLMs and diffusion models) for use cases such as general assistance, business ideas, coding, image generation, systems administration, marketing, planning, and more.
Updated Jul 17, 2025 - Shell
Bench360 is a modular benchmarking suite for local LLM deployments. It offers a full-stack, extensible pipeline to evaluate the latency, throughput, quality, and cost of LLM inference on consumer and enterprise GPUs. Bench360 supports flexible backends, tasks, and scenarios, enabling fair and reproducible comparisons for researchers and practitioners.
Updated Aug 3, 2025 - Python