Model Inference and Deployment

We mainly provide the following three ways to run inference and deploy the models locally.

llama.cpp

A tool for quantizing the model and deploying it on a local CPU.

Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/llama.cpp-Deployment
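
The linked page walks through quantization and deployment with the llama.cpp command-line tools. As a quick illustration only, the sketch below instead uses the community llama-cpp-python bindings to load an already-quantized model from Python; the model path is a placeholder, not a file shipped by this project.

```python
# Minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python).
# Assumes you have already produced a quantized model file with llama.cpp's
# conversion and quantization scripts; the path below is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="./zh-alpaca-7b-q4_0.bin", n_ctx=2048)

# Generate a completion; the result follows an OpenAI-style response layout.
result = llm("请介绍一下北京。", max_tokens=128)
print(result["choices"][0]["text"])
```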

🤗Transformers

The original transformers inference method; supports both CPU and GPU.

Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Inference-with-Transformers
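
The linked page uses the project's own inference script; the sketch below only shows the generic 🤗Transformers loading pattern it builds on. The model directory is a placeholder for a merged Chinese-Alpaca checkpoint, and the actual script additionally wraps the prompt in the Alpaca instruction template.

```python
# Minimal sketch of plain Transformers inference; assumes a merged
# Chinese-Alpaca checkpoint at a placeholder path.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_dir = "path/to/merged-chinese-alpaca"  # placeholder path
tokenizer = LlamaTokenizer.from_pretrained(model_dir)
model = LlamaForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.float16,  # fp16 for GPU; use torch.float32 on CPU
    device_map="auto",          # needs `accelerate`; remove for pure-CPU use
)
model.eval()

inputs = tokenizer("请介绍一下北京。", return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```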

text-generation-webui

A tool for deploying the model as a web UI.

Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/text-generation-webui
