This repo contains guidelines for deploying models on NVIDIA Triton Inference Server using TensorRT/TensorRT-LLM.
Notes:
- For Solar models, following the Llama2 guide works (Solar uses a Llama-based architecture).
- If a `rope_scaling` error pops up, upgrade `transformers` to >=4.43 (e.g., 4.45.1); see the command after this list.
- For multi-GPU, see https://github.com/DeekshithaDPrakash/Triton_server_TRT/blob/eb48ab9350c5d9f0440b359b8ffc7436ce9da391/Llama3/tritonserver_guide_llama3.1_multigpu.md. You must pass `tp_size` during `convert_checkpoint`, which creates one safetensors checkpoint per rank (e.g., 2 ranks for `tp_size=2`), and pass the matching `world_size` when launching the Triton server; a sketch follows this list.
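
Fixing the `rope_scaling` error is typically a one-line upgrade inside the TensorRT-LLM/Triton container (the exact minimum version may vary by model):

```bash
# Upgrade transformers to a release that understands the newer rope_scaling
# config format (>=4.43; 4.45.1 is known to work).
pip install --upgrade "transformers>=4.43"
```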
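
A minimal sketch of the multi-GPU (tensor-parallel) flow, assuming the Llama example's `convert_checkpoint.py` from TensorRT-LLM and the `launch_triton_server.py` script from tensorrtllm_backend; all paths and model names are placeholders, and exact flags may differ across releases:

```bash
# 1) Convert the HF checkpoint with tp_size=2 -> writes rank0/rank1 safetensors.
python3 examples/llama/convert_checkpoint.py \
    --model_dir ./llama-3.1-8b-hf \
    --output_dir ./ckpt_tp2 \
    --dtype float16 \
    --tp_size 2

# 2) Build the TensorRT engines from the 2-rank checkpoint.
trtllm-build --checkpoint_dir ./ckpt_tp2 \
    --output_dir ./engines_tp2 \
    --gemm_plugin float16

# 3) Launch Triton with world_size matching the parallelism used at conversion
#    (one MPI rank per GPU).
python3 scripts/launch_triton_server.py \
    --world_size 2 \
    --model_repo ./triton_model_repo
```

The key invariant is that `world_size` at launch must match the parallelism baked into the checkpoint (`tp_size` × `pp_size`); a mismatch will cause the engines to fail to load.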