
Commit b205e84 (1 parent: be0cfb2)

[Doc][TPU] Add models and features supporting matrix. (#20230)
Signed-off-by: Qiliang Cui <cuiq@google.com>

File tree: 3 files changed (+54, −17 lines)

docs/.nav.yml

Lines changed: 1 addition & 0 deletions

```diff
@@ -39,6 +39,7 @@ nav:
     - models/generative_models.md
     - models/pooling_models.md
     - models/extensions
+    - Hardware Supported Models: models/hardware_supported_models
   - Features:
     - features/compatibility_matrix.md
     - features/*
```

docs/features/compatibility_matrix.md

Lines changed: 17 additions & 17 deletions

The existing "Feature x Hardware" table is extended with a TPU column:

## Feature x Hardware

| Feature | Volta | Turing | Ampere | Ada | Hopper | CPU | AMD | TPU |
|-----------------------------------------------------------|---------------------|-----------|-----------|--------|------------|--------------------|--------|-----|
| [CP][chunked-prefill] | [](gh-issue:2729) ||||||||
| [APC][automatic-prefix-caching] | [](gh-issue:3687) ||||||||
| [LoRA][lora-adapter] |||||||||
| <abbr title="Prompt Adapter">prmpt adptr</abbr> |||||| [](gh-issue:8475) |||
| [SD][spec-decode] |||||||||
| CUDA graph |||||||||
| <abbr title="Pooling Models">pooling</abbr> |||||||||
| <abbr title="Encoder-Decoder Models">enc-dec</abbr> |||||||||
| <abbr title="Multimodal Inputs">mm</abbr> |||||||||
| <abbr title="Logprobs">logP</abbr> |||||||||
| <abbr title="Prompt Logprobs">prmpt logP</abbr> |||||||||
| <abbr title="Async Output Processing">async output</abbr> |||||||||
| multi-step |||||| [](gh-issue:8477) |||
| best-of |||||||||
| beam-search |||||||||

!!! note
    Please refer to [Feature support through NxD Inference backend][feature-support-through-nxd-inference-backend] for features supported on AWS Neuron hardware.
Lines changed: 36 additions & 0 deletions (new file)

---
title: TPU
---
[](){ #tpu-supported-models }

# TPU Supported Models

## Text-only Language Models

| Model | Architecture | Supported |
|-----------------------------------------------------|--------------------------------|-----------|
| mistralai/Mixtral-8x7B-Instruct-v0.1 | MixtralForCausalLM | 🟨 |
| mistralai/Mistral-Small-24B-Instruct-2501 | MistralForCausalLM ||
| mistralai/Codestral-22B-v0.1 | MistralForCausalLM ||
| mistralai/Mixtral-8x22B-Instruct-v0.1 | MixtralForCausalLM ||
| meta-llama/Llama-3.3-70B-Instruct | LlamaForCausalLM ||
| meta-llama/Llama-3.1-8B-Instruct | LlamaForCausalLM ||
| meta-llama/Llama-3.1-70B-Instruct | LlamaForCausalLM ||
| meta-llama/Llama-4-* | Llama4ForConditionalGeneration ||
| microsoft/Phi-3-mini-128k-instruct | Phi3ForCausalLM | 🟨 |
| microsoft/phi-4 | Phi3ForCausalLM ||
| google/gemma-3-27b-it | Gemma3ForConditionalGeneration | 🟨 |
| google/gemma-3-4b-it | Gemma3ForConditionalGeneration ||
| deepseek-ai/DeepSeek-R1 | DeepseekV3ForCausalLM ||
| deepseek-ai/DeepSeek-V3 | DeepseekV3ForCausalLM ||
| RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8 | LlamaForCausalLM ||
| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w8a8 | LlamaForCausalLM ||
| Qwen/Qwen3-8B | Qwen3ForCausalLM ||
| Qwen/Qwen3-32B | Qwen3ForCausalLM ||
| Qwen/Qwen2.5-7B-Instruct | Qwen2ForCausalLM ||
| Qwen/Qwen2.5-32B | Qwen2ForCausalLM ||
| Qwen/Qwen2.5-14B-Instruct | Qwen2ForCausalLM ||
| Qwen/Qwen2.5-1.5B-Instruct | Qwen2ForCausalLM | 🟨 |
✅ Runs and is optimized.
🟨 Runs and produces correct output, but is not yet fully optimized.
❌ Fails the accuracy test or does not run.
