Model Inference and Deployment
We mainly provide the following ways to run inference and deploy the models locally.
llama.cpp: a tool for quantizing the model and deploying it on a local CPU.
Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/llama.cpp-Deployment
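As a rough illustration, the sketch below drives a built llama.cpp checkout from Python. The paths, the `q4_0` preset, and the prompt are assumptions rather than part of this wiki, and the `quantize` argument format has changed across llama.cpp versions, so follow the linked tutorial for the exact commands.

```python
# Minimal sketch: call llama.cpp's quantize/main binaries from Python.
# All paths and the "q4_0" preset name are assumptions; see the linked tutorial.
import subprocess

LLAMA_DIR = "./llama.cpp"           # hypothetical path to a built llama.cpp tree
FP16_MODEL = "ggml-model-f16.bin"   # GGML file produced by the conversion step
Q4_MODEL = "ggml-model-q4_0.bin"    # 4-bit quantized output

# Quantize the fp16 GGML model to 4 bits (argument format varies by version).
subprocess.run([f"{LLAMA_DIR}/quantize", FP16_MODEL, Q4_MODEL, "q4_0"], check=True)

# Run CPU inference on the quantized model.
subprocess.run(
    [f"{LLAMA_DIR}/main", "-m", Q4_MODEL, "-n", "128",
     "-p", "请介绍一下中国的首都。"],  # "Please introduce the capital of China."
    check=True,
)
```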
Native inference with the original transformers library, supporting both CPU and GPU.
Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Inference-with-Transformers
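For reference, here is a minimal transformers sketch. The local model path is a hypothetical placeholder for the merged weights in Hugging Face format, and the generation settings are illustrative only; the linked page documents the project's own inference script.

```python
# Minimal sketch: sampling-based generation with merged weights in HF format.
# "./chinese-alpaca-7b" is a hypothetical local path, not a published model ID.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "./chinese-alpaca-7b"
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # fp16 for GPU; drop this for CPU-only inference
    device_map="auto",          # requires `accelerate`; places layers on available devices
)

inputs = tokenizer("请介绍一下中国的首都。", return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128,
                                do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```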
text-generation-webui: a tool for serving the model behind an interactive web UI.
Link: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/text-generation-webui
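As a hedged illustration, text-generation-webui is typically started through its server.py entry point after the merged model folder is placed under its models/ directory. The folder name below is hypothetical, and the available flags vary across webui versions, so check the linked guide before running.

```python
# Minimal launcher sketch for text-generation-webui (flags vary by version).
# Assumes the repo is checked out in ./text-generation-webui and the merged
# model was copied to text-generation-webui/models/chinese-alpaca-7b.
import subprocess

subprocess.run(
    ["python", "server.py", "--model", "chinese-alpaca-7b", "--chat"],
    cwd="./text-generation-webui",
    check=True,
)
```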
LlamaChat provides a macOS graphical interface and supports loading both GGML (.bin) and PyTorch (.pth) model formats.
Tutorial: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/使用LlamaChat图形界面(macOS)