5. If you have an NVIDIA GPU, please install the CUDA Toolkit from https://developer.nvidia.com/cuda-toolkit-archive, then install the matching PyTorch CUDA build in the virtual environment (choose the build that matches your CUDA Toolkit version at https://pytorch.org/):
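A minimal sketch of the PyTorch install, assuming your CUDA Toolkit is 12.1 (swap the `cu121` index URL for the one matching your toolkit version, as listed on https://pytorch.org/):

```bash
# Assumes CUDA Toolkit 12.1; use the index URL that matches your installed toolkit.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```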
6. Install the dependencies, choosing the requirements file that matches your platform. On Windows, if you encounter compilation errors for llama-cpp-python or tts during installation, remove these two packages from the requirements file (a filtering sketch follows the code block below):
```bash
# windows
pip install -r requirements-windows.txt
# macos
pip install -r requirements-macos.txt
```
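If llama-cpp-python or tts fail to compile on Windows, one possible workaround is to install everything except those two packages and retry them individually later. A sketch, assuming a POSIX-style shell such as Git Bash; the filtered filename is arbitrary:

```bash
# Write a copy of the requirements without the two problematic packages, then install it.
grep -viE "^(llama-cpp-python|tts)\b" requirements-windows.txt > requirements-windows.filtered.txt
pip install -r requirements-windows.filtered.txt
```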
7. If the speech feature is required, you also need to install the ffmpeg tool (a quick check that it is installed correctly follows the code block below):
```bash
# For Windows:
# Download the Windows binary package of ffmpeg from https://www.gyan.dev/ffmpeg/builds/.
# For macOS:
brew install ffmpeg
```
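Whichever platform you use, ffmpeg must be reachable from the shell that launches the WebUI (on Windows this usually means adding the extracted `bin` folder to `PATH`). A quick check:

```bash
# Prints the version banner if ffmpeg is installed and on PATH.
ffmpeg -version
```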
8. If you need models from Hugging Face for offline execution, please download them yourself and place them in the `models` directory (a command-line download sketch follows the example below). If the models have not been downloaded in advance, the WebUI will automatically download them from the Hugging Face website into the local system cache.
```bash
# such as the folder of the llama-2-7b-chat model:
models\llm\Llama-2-7b-chat-hf
models\voices\faster-whisper-large-v3
```
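To pre-fetch a model from the command line instead of through a browser, a sketch using the `huggingface-cli` tool from the `huggingface_hub` package (the repo ID and target folder below match the llama-2 example above; the Llama-2 repos are gated, so you may need `huggingface-cli login` first):

```bash
# Assumes huggingface_hub is installed: pip install -U huggingface_hub
huggingface-cli download meta-llama/Llama-2-7b-chat-hf --local-dir models/llm/Llama-2-7b-chat-hf
```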
9. When using the `OpenDalleV1.1` model to generate images, if using 16-bit precision, please download the `sdxl-vae-fp16-fix` model from Hugging Face and place it in the `models\imagegeneration` folder. If enabling the Refiner, please also download the `stable-diffusion-xl-refiner-1.0` model from Hugging Face and place it in the `models\imagegeneration` folder beforehand.
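A sketch for fetching these two optional models with `huggingface-cli`; the repo IDs `madebyollin/sdxl-vae-fp16-fix` and `stabilityai/stable-diffusion-xl-refiner-1.0` are the usual Hugging Face sources for these models, so double-check they are the ones this project expects, and adjust the path separators for your platform:

```bash
huggingface-cli download madebyollin/sdxl-vae-fp16-fix --local-dir models/imagegeneration/sdxl-vae-fp16-fix
huggingface-cli download stabilityai/stable-diffusion-xl-refiner-1.0 --local-dir models/imagegeneration/stable-diffusion-xl-refiner-1.0
```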
10. When using the models `stable-video-diffusion-img2vid` or `stable-video-diffusion-img2vid-xt`, it is necessary to install ffmpeg and the corresponding dependencies first:
```bash
# 1. Download generative-models from https://github.com/Stability-AI/generative-models into the project root folder, e.g.:
git clone https://github.com/Stability-AI/generative-models
# 2. Install it from the local checkout.
cd generative-models && pip install .
# 3. Install the remaining dependencies.
pip install pytorch-lightning
pip install kornia
pip install open_clip_torch
```
11. If running locally, start the Web UI with Python and open it at http://127.0.0.1:8818:
```bash
python __webgui_server__.py --webui
```
12. If deploying on a cloud server and accessing the Web UI from a local machine, use a reverse proxy and start the Web UI with HTTPS. Access it locally at https://127.0.0.1:4480, and remotely through the HTTPS interface at https://[server ip]:4480:
```bash
# By default, the batch file uses the virtual environment named keras-llm-robot.
# Modify the batch file if you use a different virtual environment name.
```
`Notes for Multimodal Models`
- The models `cogvlm-chat-hf`, `Qwen-VL-Chat`, and `Qwen-VL-Chat-Int4` support single-image file input together with text input; they can recognize image content and answer questions about the image in natural language.
- The models `stable-video-diffusion-img2vid` and `stable-video-diffusion-img2vid-xt` support single-image file input and generate a video based on the image.
When using these two models, it is necessary to install ffmpeg and the corresponding dependencies first; the required commands are the same as in step 10 of the installation steps above.
- The model `Qwen-Audio-Chat` supports single audio file input together with text input and answers questions about the audio content in natural language.
2. **`Quantization`**
| blip-image-captioning-large | Image Recognition Model | *B |
| OpenDalleV1.1 | Image Generation Model | *B |
When using the `OpenDalleV1.1` model to generate images, if using 16-bit precision, please download the `sdxl-vae-fp16-fix` model from Hugging Face and place it in the `models\imagegeneration` folder. If enabling the Refiner, please also download the `stable-diffusion-xl-refiner-1.0` model from Hugging Face and place it in the `models\imagegeneration` folder beforehand.