Interested in getting models to run faster.
Pinned repositories
- vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs. (A minimal usage sketch follows this list.)
- triton-inference-server/server: The Triton Inference Server provides an optimized cloud and edge inferencing solution.
- NVIDIA/TensorRT-LLM: TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
- triton-inference-server/model_navigator: Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models, with a focus on NVIDIA GPUs.
- QwenLM/Qwen2-Audio: The official repo of Qwen2-Audio, the chat and pretrained large audio-language model proposed by Alibaba Cloud.
- deepseek-ai/DeepGEMM: Clean and efficient FP8 GEMM kernels with fine-grained scaling.
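
Since these pins all center on faster LLM inference, here is a minimal sketch of vLLM's offline batched-generation API. The model checkpoint and prompts are placeholder examples, not anything specific to this profile; vLLM fetches the model from the Hugging Face Hub by default, and a GPU is assumed.

```python
# Minimal sketch: offline batched generation with vLLM.
# Assumes `pip install vllm` and an available GPU; the model name
# below is just an example checkpoint, not a recommendation.
from vllm import LLM, SamplingParams

prompts = [
    "The key to fast LLM inference is",
    "PagedAttention works by",
]
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

# LLM() loads the model and allocates the KV cache up front;
# generate() then runs all prompts through continuous batching.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```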