
Commit 0100df2

update readme.md
1 parent 1d4d9be commit 0100df2

3 files changed, +52 -26 lines

WebUI/configs/webuiconfig.json

Lines changed: 2 additions & 2 deletions
@@ -712,7 +712,7 @@
        "Huggingface": "Qwen/Qwen-VL-Chat-Int4"
    },
    "stable-video-diffusion-img2vid": {
-       "path": "models/imagegeneration/stable-video-diffusion-img2vid",
+       "path": "models/multimodal/image-chat/stable-video-diffusion-img2vid",
        "device": "auto",
        "maxmemory": 24,
        "cputhreads": 4,
@@ -722,7 +722,7 @@
        "Huggingface": "stabilityai/stable-video-diffusion-img2vid"
    },
    "stable-video-diffusion-img2vid-xt": {
-       "path": "models/imagegeneration/stable-video-diffusion-img2vid-xt",
+       "path": "models/multimodal/image-chat/stable-video-diffusion-img2vid-xt",
        "device": "auto",
        "maxmemory": 24,
        "cputhreads": 4,

readme-cn.md

Lines changed: 26 additions & 12 deletions
@@ -134,24 +134,24 @@
    conda create -n keras-llm-robot python==3.11.5
    ```

-1. Clone the repository
+3. Clone the repository
    ```bash
    git clone https://github.com/smalltong02/keras-llm-robot.git
    cd keras-llm-robot
    ```

-1. Activate the virtual environment
+4. Activate the virtual environment
    ```bash
    conda activate keras-llm-robot
    ```

-1. If you have an NVIDIA GPU, please install the CUDA Toolkit first (https://developer.nvidia.com/cuda-toolkit-archive), then install the PyTorch CUDA build in the virtual environment (matching the CUDA Toolkit version, https://pytorch.org/)
+5. If you have an NVIDIA GPU, please install the CUDA Toolkit first (https://developer.nvidia.com/cuda-toolkit-archive), then install the PyTorch CUDA build in the virtual environment (matching the CUDA Toolkit version, https://pytorch.org/)
    ```bash
    // for example, install version 12.1
    conda install pytorch=2.1.2 torchvision=0.16.2 torchaudio=2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
    ```

-1. Install the dependencies; choose the requirements file that matches your platform
+6. Install the dependencies; choose the requirements file that matches your platform
    ```bash
    // Windows: if you encounter compilation errors for llama-cpp-python or tts during installation, remove these two packages from the requirements file;
    // without them, the local XTTS-2 speech model and GGUF quantized models can no longer be loaded.
@@ -162,7 +162,7 @@
    pip install -r requirements-macos.txt
    ```

-1. If speech features are needed, the ffmpeg tool also has to be installed
+7. If speech features are needed, the ffmpeg tool also has to be installed

    // Windows

@@ -185,7 +185,7 @@
    brew install ffmpeg
    ```

-2. If you want to download models from Huggingface and run them offline, download the models yourself and place them in the "models" directory. If a model has not been downloaded in advance, the program will automatically download it from the Huggingface website into the local system cache.
+8. If you want to download models from Huggingface and run them offline, download the models yourself and place them in the "models" directory. If a model has not been downloaded in advance, the program will automatically download it from the Huggingface website into the local system cache.
    ```bash
    // for example, the folder of the llama-2-7b-chat language model is
    models\llm\Llama-2-7b-chat-hf
@@ -197,12 +197,26 @@
    models\voices\faster-whisper-large-v3
    ```

-8. For a local-only deployment, start the WebUI with python and open http://127.0.0.1:8818
+9. When using the `OpenDalleV1.1` model to generate images with 16-bit precision, please download the `sdxl-vae-fp16-fix` model from Huggingface and place it in the `models\imagegeneration` folder first. If enabling the Refiner, please download the `stable-diffusion-xl-refiner-1.0` model from Huggingface and place it in the `models\imagegeneration` folder first.
+
+10. When using the `stable-video-diffusion-img2vid` or `stable-video-diffusion-img2vid-xt` model to generate video:
+
+    ffmpeg and the corresponding dependency packages must be installed first:
+
+    ```bash
+    1. download generative-models from https://github.com/Stability-AI/generative-models in project root folder.
+    2. cd generative-models & pip install .
+    3. pip install pytorch-lightning
+       pip install kornia
+       pip install open_clip_torch
+    ```
+
+11. For a local-only deployment, start the WebUI with python and open http://127.0.0.1:8818
    ```bash
    python __webgui_server__.py --webui
    ```

-9. To deploy on a cloud server and access the WebUI locally, use a reverse proxy and start the WebUI over HTTPS. Open the WebUI at https://127.0.0.1:4480 locally and at https://[server ip]:4480 remotely.
+12. To deploy on a cloud server and access the WebUI locally, use a reverse proxy and start the WebUI over HTTPS. Open the WebUI at https://127.0.0.1:4480 locally and at https://[server ip]:4480 remotely.
    ```bash
    // The batch file uses the virtual environment keras-llm-robot by default; edit the batch file if you want a different environment name
    webui-startup-windows.bat
@@ -334,9 +348,9 @@

    `Special Notes for Multimodal Models`

-   - cogvlm-chat-hf, Qwen-VL-Chat, and Qwen-VL-Chat-Int4 support a single image file plus text input; they can recognize the image content and answer questions about the image in natural language.
+   - `cogvlm-chat-hf`, `Qwen-VL-Chat`, and `Qwen-VL-Chat-Int4` support a single image file plus text input; they can recognize the image content and answer questions about the image in natural language.

-   - stable-video-diffusion-img2vid and stable-video-diffusion-img2vid-xt support a single image file as input and generate a video from the image.
+   - `stable-video-diffusion-img2vid` and `stable-video-diffusion-img2vid-xt` support a single image file as input and generate a video from the image.

     When using these two models, ffmpeg and the corresponding dependency packages must be installed first:

@@ -348,7 +362,7 @@
     pip install open_clip_torch
     ```

-   - Qwen-Audio-Chat supports a single audio file plus text input and answers questions about the content of the audio file in natural language.
+   - `Qwen-Audio-Chat` supports a single audio file plus text input and answers questions about the content of the audio file in natural language.


2. **`Model Quantization`**
@@ -473,7 +487,7 @@
    | blip-image-captioning-large | Image Recognition Model | *B |
    | OpenDalleV1.1 | Image Generation Model | *B |

-   When using the OpenDalleV1.1 model to generate images with 16-bit precision, please download the sdxl-vae-fp16-fix model from Huggingface and place it in the models\imagegeneration folder first. If enabling the Refiner, please download the stable-diffusion-xl-refiner-1.0 model from Huggingface and place it in the models\imagegeneration folder first
+   When using the `OpenDalleV1.1` model to generate images with 16-bit precision, please download the `sdxl-vae-fp16-fix` model from Huggingface and place it in the `models\imagegeneration` folder first. If enabling the Refiner, please download the `stable-diffusion-xl-refiner-1.0` model from Huggingface and place it in the `models\imagegeneration` folder first

    Image recognition demo:

readme.md

Lines changed: 24 additions & 12 deletions
@@ -137,24 +137,24 @@
    conda create -n keras-llm-robot python==3.11.5
    ```

-1. Clone the repository:
+3. Clone the repository:
    ```bash
    git clone https://github.com/smalltong02/keras-llm-robot.git
    cd keras-llm-robot
    ```

-1. Activate the virtual environment:
+4. Activate the virtual environment:
    ```bash
    conda activate keras-llm-robot
    ```

-1. If you have an NVIDIA GPU, Please install the CUDA Toolkit from (https://developer.nvidia.com/cuda-toolkit-archive), and install the PyTorch CUDA version in the virtual environment (same to the CUDA Toolkit version https://pytorch.org/):
+5. If you have an NVIDIA GPU, please install the CUDA Toolkit first (https://developer.nvidia.com/cuda-toolkit-archive), then install the PyTorch CUDA build in the virtual environment (matching the CUDA Toolkit version, https://pytorch.org/):
    ```bash
    // such as install version 12.1
    conda install pytorch=2.1.2 torchvision=0.16.2 torchaudio=2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
    ```

-1. Install dependencies, Please choose the appropriate requirements file based on your platform, On the Windows, if encounter compilation errors for llama-cpp-python or tts during the installation, please remove these two packages from the requirements:
+6. Install dependencies. Please choose the appropriate requirements file for your platform; on Windows, if you encounter compilation errors for llama-cpp-python or tts during installation, remove these two packages from the requirements file:
    ```bash
    // windows
    pip install -r requirements-windows.txt
@@ -164,7 +164,7 @@
    pip install -r requirements-macos.txt
    ```

-1. If speech feature is required, you also need to install the ffmpeg tool.
+7. If the speech features are required, you also need to install the ffmpeg tool.

    // For Windows:
    Download the Windows binary package of ffmpeg from (https://www.gyan.dev/ffmpeg/builds/).
@@ -185,7 +185,7 @@
    brew install ffmpeg
    ```

-2. If you need to download models from Hugging Face for offline execution, please download the models yourself and place them in the "models" directory. If the models have not been downloaded in advance, the WebUI will automatically download them from the Hugging Face website to the local system cache.
+8. If you need to download models from Hugging Face for offline execution, please download the models yourself and place them in the "models" directory. If the models have not been downloaded in advance, the WebUI will automatically download them from the Hugging Face website into the local system cache.
    ```bash
    // such as the folder of llama-2-7b-chat model:
    models\llm\Llama-2-7b-chat-hf
@@ -197,12 +197,24 @@
    models\voices\faster-whisper-large-v3
    ```

-9. If run locally, start the Web UI using Python at http://127.0.0.1:8818:
+9. When using the `OpenDalleV1.1` model to generate images with 16-bit precision, please download the `sdxl-vae-fp16-fix` model from Huggingface and place it in the `models\imagegeneration` folder beforehand. If enabling the Refiner, please also download the `stable-diffusion-xl-refiner-1.0` model from Huggingface and place it in the `models\imagegeneration` folder.
+
+10. When using the `stable-video-diffusion-img2vid` or `stable-video-diffusion-img2vid-xt` model, it is necessary to install ffmpeg and the corresponding dependencies first:
+
+    ```bash
+    1. download generative-models from https://github.com/Stability-AI/generative-models in project root folder.
+    2. cd generative-models & pip install .
+    3. pip install pytorch-lightning
+       pip install kornia
+       pip install open_clip_torch
+    ```
+
+11. If running locally, start the Web UI with Python and open http://127.0.0.1:8818:
    ```bash
    python __webgui_server__.py --webui
    ```

-10. If deploying on a cloud server and accessing the Web UI locally, use reverse proxy and start the Web UI with HTTPS. Access using https://127.0.0.1:4480 on locally, and use the https interface at https://[server ip]:4480 on remotely:
+12. If deploying on a cloud server and accessing the Web UI locally, use a reverse proxy and start the Web UI with HTTPS. Open https://127.0.0.1:4480 locally and https://[server ip]:4480 remotely:
    ```bash
    // By default, the batch file uses the virtual environment named keras-llm-robot,
    // Modify the batch file if using a different virtual environment name.
@@ -335,9 +347,9 @@

    `Notes for Multimodal Models`

-   - The Model cogvlm-chat-hf, Qwen-VL-Chat, and Qwen-VL-Chat-Int4 support single-image file input with text input, capable of recognizing image content and answering questions about the image based on natural language.
+   - The models `cogvlm-chat-hf`, `Qwen-VL-Chat`, and `Qwen-VL-Chat-Int4` support single-image file input with text input, capable of recognizing image content and answering questions about the image in natural language.

-   - The Model stable-video-diffusion-img2vid and stable-video-diffusion-img2vid-xt support single-image file input and generate video based on the image.
+   - The models `stable-video-diffusion-img2vid` and `stable-video-diffusion-img2vid-xt` support single-image file input and generate a video from the image.

     When using these two models, it is necessary to install ffmpeg and the corresponding dependencies first:

@@ -349,7 +361,7 @@
     pip install open_clip_torch
     ```

-   - The Model Qwen-Audio-Chat supports single audio file input with text input and provides responses to the content of the audio file based on natural language.
+   - The model `Qwen-Audio-Chat` supports single audio file input with text input and answers questions about the content of the audio file in natural language.

2. **`Quantization`**

@@ -473,7 +485,7 @@
    | blip-image-captioning-large | Image Recognition Model | *B |
    | OpenDalleV1.1 | Image Generation Model | *B |

-   When using the OpenDalleV1.1 model to generate images, if using 16-bit precision, please download the sdxl-vae-fp16-fix model from Huggingface and place it in the models\imagegeneration folder. If enabling the Refiner, please download the stable-diffusion-xl-refiner-1.0 model from Huggingface and place it in the models\imagegeneration folder beforehand.
+   When using the `OpenDalleV1.1` model to generate images with 16-bit precision, please download the `sdxl-vae-fp16-fix` model from Huggingface and place it in the `models\imagegeneration` folder beforehand. If enabling the Refiner, please also download the `stable-diffusion-xl-refiner-1.0` model from Huggingface and place it in the `models\imagegeneration` folder.

    Image Recognition:

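On the `OpenDalleV1.1` note above: `sdxl-vae-fp16-fix` is needed because the stock SDXL VAE tends to produce NaNs in half precision. A minimal sketch of the idea using `diffusers` directly, loading both models from the local `models\imagegeneration` layout described in the readme (the WebUI's own loading code may differ, and the prompt and output filename are illustrative):

```python
import torch
from diffusers import AutoencoderKL, StableDiffusionXLPipeline

# Patched VAE that stays numerically stable in float16.
vae = AutoencoderKL.from_pretrained(
    "models/imagegeneration/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)

# OpenDalleV1.1 is SDXL-based, so the SDXL pipeline applies.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "models/imagegeneration/OpenDalleV1.1",
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a watercolor lighthouse at dusk").images[0]
image.save("opendalle-fp16.png")
```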
