Model Inference and Deployment

We mainly provide the following options for inference and local deployment.

llama.cpp

A tool for quantizing the model and deploying it on a local CPU.

Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/llama.cpp-Deployment
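
For reference, a minimal sketch of loading a quantized GGML model from Python via the llama-cpp-python bindings. This is an alternative to llama.cpp's own CLI, which the linked page covers; the model path and prompt are placeholders.

```python
# Minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python).
# The linked wiki page uses llama.cpp's CLI directly; this is an alternative route.
# The model path below is a placeholder for a quantized Chinese-Alpaca model.
from llama_cpp import Llama

llm = Llama(model_path="zh-models/7B/ggml-model-q4_0.bin")  # hypothetical path
output = llm("List three famous attractions in Beijing.", max_tokens=128)
print(output["choices"][0]["text"])
```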

🤗Transformers

The original Transformers inference method; supports both CPU and GPU.

Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Inference-with-Transformers
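
For reference, a minimal sketch of the Transformers route, assuming a merged Chinese-Alpaca model in Hugging Face format at a placeholder path. The linked page documents the project's own inference script and recommended generation settings.

```python
# Minimal sketch with 🤗Transformers; assumes a merged Chinese-Alpaca model
# in Hugging Face format. The path is a placeholder; device_map="auto"
# additionally requires the accelerate package.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "path/to/chinese-alpaca-merged"  # placeholder
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("List three famous attractions in Beijing.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```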

text-generation-webui

A tool for deploying the model as a web UI.

Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/text-generation-webui

LlamaChat

LlamaChat is a macOS app that allows you to chat with LLaMA, Alpaca, etc. It supports GGML (.bin) and PyTorch (.pth) formats.

Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Using-LlamaChat-Interface