
Model Inference and Deployment


We mainly provide the following methods for inference and local deployment.

llama.cpp

A tool for quantizing models and deploying them on a local CPU.

Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/llama.cpp-Deployment
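
As a rough illustration of what CPU inference looks like after following the tutorial above, here is a minimal sketch using the llama-cpp-python bindings (an alternative to the C++ command-line tools the tutorial covers); the model path is a hypothetical placeholder for a quantized GGML file you have already produced:

```python
# Sketch only: assumes llama-cpp-python is installed (pip install llama-cpp-python)
# and that you already have a quantized GGML model file from the llama.cpp tutorial.
from llama_cpp import Llama

# Hypothetical path to your quantized Chinese-Alpaca GGML model
llm = Llama(model_path="path/to/chinese-alpaca-ggml-q4_0.bin")

result = llm("请介绍一下中国的首都。", max_tokens=128)
print(result["choices"][0]["text"])
```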

🤗Transformers

The native Transformers inference method; supports CPU and GPU.

Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Inference-with-Transformers
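
For orientation, a minimal sketch of Transformers-based generation, assuming you have already merged the LoRA weights into a full model at a local path (the path below is a placeholder, not an official checkpoint name):

```python
# Sketch only: assumes transformers (>=4.28) and accelerate are installed,
# and that merged Chinese-Alpaca weights exist at the placeholder path.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "path/to/chinese-alpaca-merged"  # hypothetical local path
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # fp16 on GPU; omit for CPU-only inference
    device_map="auto",          # requires accelerate; places layers automatically
)

prompt = "请介绍一下中国的首都。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```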

text-generation-webui

A tool for deploying the model as a web UI.

Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/text-generation-webui

LlamaChat

A graphical chat interface for macOS; supports loading both GGML (.bin format) and PyTorch (.pth format) versions of the model.

Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/使用LlamaChat图形界面(macOS)
