TensorRT-LLM released #3658
Dampfinchen started this conversation in Ideas
Replies: 1 comment, 1 reply
https://www.tomshardware.com/news/nvidia-boosts-ai-performance-with-tensorrt

Could TensorRT-LLM be useful for CUDA acceleration? @slaren @JohannesGaessler

There are a few areas that I think could still significantly improve the performance of the CUDA backend, especially in prompt or batch processing:

I don't think that TensorRT is likely to help with these issues. Additionally, we generally try to avoid adding large dependencies to llama.cpp.