Model Inference and Deployment
We mainly provide the following options for model inference and local deployment.
- llama.cpp: a tool for quantizing the model and deploying it on a local CPU (see the first sketch after this list).
  Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/llama.cpp-Deployment
- The original transformers inference method, supporting both CPU and GPU (see the second sketch after this list).
  Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Inference-with-Transformers
- text-generation-webui: a tool for deploying the model as a web UI.
  Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/text-generation-webui
- LlamaChat: a macOS app that lets you chat with LLaMA, Alpaca, etc. Supports GGML (.bin) and PyTorch (.pth) formats.
  Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Using-LlamaChat-Interface
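As an illustration of the llama.cpp route, here is a minimal sketch of running a quantized GGML model on CPU through the llama-cpp-python bindings. The bindings, the model path, and the prompt template are assumptions for illustration; the linked wiki page describes the actual quantization and deployment steps with the llama.cpp command-line tools.

```python
# A minimal sketch of CPU inference on a quantized GGML model,
# assuming the llama-cpp-python bindings (pip install llama-cpp-python).
# The model path is hypothetical; see the linked wiki page for how to
# produce a quantized .bin file with llama.cpp itself.
from llama_cpp import Llama

llm = Llama(
    model_path="zh-alpaca-models/7B/ggml-model-q4_0.bin",  # hypothetical path
    n_ctx=512,    # context window size
    n_threads=4,  # number of CPU threads to use
)

# Prompt template assumed from the standard Alpaca instruction format.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n\n请介绍一下中国的首都\n\n### Response:\n\n"
)
output = llm(prompt, max_tokens=128, stop=["### Instruction:"])
print(output["choices"][0]["text"])
```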
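For the transformers route, a minimal sketch of loading a merged model and generating a response is shown below. The model directory is hypothetical and the generation settings are illustrative; the linked wiki page contains the project's full inference instructions.

```python
# A minimal sketch of inference with Hugging Face transformers.
# The model directory is hypothetical (a merged Chinese-Alpaca checkpoint);
# device_map="auto" requires the accelerate package.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_dir = "path/to/chinese-alpaca-merged"  # hypothetical path
tokenizer = LlamaTokenizer.from_pretrained(model_dir)
model = LlamaForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.float16,  # fp16 on GPU; omit for CPU-only inference
    device_map="auto",          # place weights on available devices
)

prompt = "请介绍一下中国的首都"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,   # sampling settings here are illustrative
        temperature=0.7,
        top_p=0.9,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```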