@@ -59,23 +59,23 @@ th:not(:first-child) {
## Feature x Hardware
- | Feature | Volta | Turing | Ampere | Ada | Hopper | CPU | AMD |
- |-----------------------------------------------------------|--------------------|----------|----------|-------|----------|--------------------|-------|
- | [CP][chunked-prefill] | [❌](gh-issue:2729) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
- | [APC][automatic-prefix-caching] | [❌](gh-issue:3687) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
- | [LoRA][lora-adapter] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
- | <abbr title="Prompt Adapter">prmpt adptr</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](gh-issue:8475) | ✅ |
- | [SD][spec-decode] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
- | CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
- | <abbr title="Pooling Models">pooling</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ |
- | <abbr title="Encoder-Decoder Models">enc-dec</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
- | <abbr title="Multimodal Inputs">mm</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
- | <abbr title="Logprobs">logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
- | <abbr title="Prompt Logprobs">prmpt logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
- | <abbr title="Async Output Processing">async output</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
- | multi-step | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](gh-issue:8477) | ✅ |
- | best-of | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
- | beam-search | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+ | Feature | Volta | Turing | Ampere | Ada | Hopper | CPU | AMD | TPU |
+ |-----------------------------------------------------------|---------------------|-----------|-----------|--------|------------|--------------------|--------|-----|
+ | [CP][chunked-prefill] | [❌](gh-issue:2729) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+ | [APC][automatic-prefix-caching] | [❌](gh-issue:3687) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+ | [LoRA][lora-adapter] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+ | <abbr title="Prompt Adapter">prmpt adptr</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](gh-issue:8475) | ✅ | ❌ |
+ | [SD][spec-decode] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
+ | CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ |
+ | <abbr title="Pooling Models">pooling</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❌ |
+ | <abbr title="Encoder-Decoder Models">enc-dec</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
+ | <abbr title="Multimodal Inputs">mm</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
+ | <abbr title="Logprobs">logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
+ | <abbr title="Prompt Logprobs">prmpt logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
+ | <abbr title="Async Output Processing">async output</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
+ | multi-step | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](gh-issue:8477) | ✅ | ❌ |
+ | best-of | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
+ | beam-search | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
!!! note
    Please refer to [Feature support through NxD Inference backend][feature-support-through-nxd-inference-backend] for features supported on AWS Neuron hardware.