
Model Inference and Deployment


We mainly provide the following methods for inference and local deployment.

llama.cpp

A tool for quantizing models and deploying them on a local CPU.

Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/llama.cpp-Deployment
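
As a rough illustration of what CPU inference looks like after following the tutorial above, here is a minimal sketch using the llama-cpp-python bindings (an alternative to the C++ command-line tools the tutorial covers); the model path is a hypothetical placeholder for a quantized GGML file you have already produced:

```python
# Sketch only: assumes llama-cpp-python is installed (pip install llama-cpp-python)
# and that you already have a quantized GGML model file from the llama.cpp tutorial.
from llama_cpp import Llama

# Hypothetical path to your quantized Chinese-Alpaca GGML model
llm = Llama(model_path="path/to/chinese-alpaca-ggml-q4_0.bin")

result = llm("请介绍一下中国的首都。", max_tokens=128)
print(result["choices"][0]["text"])
```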

🤗Transformers

The native Transformers inference method; supports CPU and GPU.

Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Inference-with-Transformers
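
For orientation, a minimal sketch of Transformers-based generation, assuming you have already merged the LoRA weights into a full model at a local path (the path below is a placeholder, not an official checkpoint name):

```python
# Sketch only: assumes transformers (>=4.28) and accelerate are installed,
# and that merged Chinese-Alpaca weights exist at the placeholder path.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "path/to/chinese-alpaca-merged"  # hypothetical local path
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # fp16 on GPU; omit for CPU-only inference
    device_map="auto",          # requires accelerate; places layers automatically
)

prompt = "请介绍一下中国的首都。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```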

text-generation-webui

A tool for deploying the model as a web UI.

Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/text-generation-webui

LlamaChat

A graphical chat interface for macOS; supports loading both GGML (.bin format) and PyTorch (.pth format) versions of the model.

Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/使用LlamaChat图形界面(macOS)
