
Commit 963da5f

taronaeo authored and qnixsynapse committed
docs: add s390x build documentation (ggml-org#14264)
* docs: add s390x-specific build docs
* docs: add s390x model conversion steps
* docs: s390x build indent
* docs: update hyperlinks for s390x docs
* docs: update llama.h docs
* docs: s390x add accelerator and perf optimizations
* docs: s390x indent blocks
* docs: revert block indentation
* docs: add support information for s390x
* docs: s390x reword
* docs: remove indentation for accelerator section s390x
* docs: remove redundant words s390x
* docs: reword for s390x
* docs: s390x reword simd
* docs: fix trailing whitespace for s390x

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
1 parent: e9e0fe2 · commit: 963da5f

File tree

1 file changed: +13 −102 lines changed


docs/build-s390x.md

Lines changed: 13 additions & 102 deletions
````diff
@@ -16,7 +16,7 @@ cd llama.cpp
 
 ## CPU Build with BLAS
 
-Building llama.cpp with BLAS support is highly recommended as it has shown to provide performance improvements. Make sure to have OpenBLAS installed in your environment.
+Building llama.cpp with BLAS support is highly recommended as it has shown to provide performance improvements.
 
 ```bash
 cmake -S . -B build \
````
````diff
@@ -28,9 +28,8 @@ cmake --build build --config Release -j $(nproc)
 ```
 
 **Notes**:
-
-- For faster repeated compilation, install [ccache](https://ccache.dev/)
-- By default, VXE/VXE2 is enabled. To disable it (not recommended):
+- For faster repeated compilation, install [ccache](https://ccache.dev/)
+- By default, VXE/VXE2 is enabled. To disable it (not recommended):
 
 ```bash
 cmake -S . -B build \
````
````diff
@@ -42,29 +41,18 @@ cmake --build build --config Release -j $(nproc)
 cmake --build build --config Release -j $(nproc)
 ```
 
-- By default, NNPA is enabled when available. To disable it (not recommended):
-
-  ```bash
-  cmake -S . -B build \
-    -DCMAKE_BUILD_TYPE=Release \
-    -DGGML_BLAS=ON \
-    -DGGML_BLAS_VENDOR=OpenBLAS \
-    -DGGML_NNPA=OFF
-
-  cmake --build build --config Release -j $(nproc)
-  ```
-
-- For debug builds:
+- For debug builds:
 
   ```bash
   cmake -S . -B build \
     -DCMAKE_BUILD_TYPE=Debug \
     -DGGML_BLAS=ON \
     -DGGML_BLAS_VENDOR=OpenBLAS
+
   cmake --build build --config Debug -j $(nproc)
   ```
 
-- For static builds, add `-DBUILD_SHARED_LIBS=OFF`:
+- For static builds, add `-DBUILD_SHARED_LIBS=OFF`:
 
   ```bash
   cmake -S . -B build \
````
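The hunks above split the recommended configuration across diff context. Assembled in one place, a minimal, self-contained sketch of the BLAS release build, using only the flags shown in this diff:

```bash
# Release build with OpenBLAS, assembled from the flags shown in the hunks above
# (VXE/VXE2 stays enabled by default on supported hardware).
cmake -S . -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_BLAS=ON \
    -DGGML_BLAS_VENDOR=OpenBLAS

# Compile with one job per available CPU.
cmake --build build --config Release -j $(nproc)
```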
````diff
@@ -82,18 +70,12 @@ All models need to be converted to Big-Endian. You can achieve this in three cas
 
 1. **Use pre-converted models verified for use on IBM Z & LinuxONE (easiest)**
 
-   ![File Type - gguf](https://img.shields.io/badge/File_Type-gguf-fff)
+   You can find popular models pre-converted and verified at [s390x Ready Models](hf.co/collections/taronaeo/s390x-ready-models-672765393af438d0ccb72a08).
 
-   You can find popular models pre-converted and verified at [s390x Ready Models](https://huggingface.co/collections/taronaeo/s390x-ready-models-672765393af438d0ccb72a08).
-
-   These models have already been converted from `safetensors` to `GGUF Big-Endian` and their respective tokenizers verified to run correctly on IBM z15 and later system.
+   These models and their respective tokenizers are verified to run correctly on IBM Z & LinuxONE.
 
 2. **Convert safetensors model to GGUF Big-Endian directly (recommended)**
 
-   ![File Type - safetensors](https://img.shields.io/badge/File_Type-safetensors-da1e28)
-
-   The model you are trying to convert must be in `safetensors` file format (for example [IBM Granite 3.3 2B](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct)). Make sure you have downloaded the model repository for this case.
-
   ```bash
   python3 convert_hf_to_gguf.py \
     --outfile model-name-be.f16.gguf \
````
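The hunk above truncates the `convert_hf_to_gguf.py` call at `--outfile`. A hedged sketch of a complete invocation, assuming the script's `--outtype` and `--bigendian` options (verify with `python3 convert_hf_to_gguf.py --help` on your checkout):

```bash
# Hedged sketch: convert a locally downloaded safetensors model to GGUF Big-Endian.
# --outtype and --bigendian are assumed from the current convert_hf_to_gguf.py.
python3 convert_hf_to_gguf.py \
    --outfile granite-3.3-2b-instruct-be.f16.gguf \
    --outtype f16 \
    --bigendian \
    ./granite-3.3-2b-instruct
```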
````diff
@@ -114,42 +96,32 @@ All models need to be converted to Big-Endian. You can achieve this in three cas
 
 3. **Convert existing GGUF Little-Endian model to Big-Endian**
 
-   ![File Type - gguf](https://img.shields.io/badge/File_Type-gguf-fff)
-
-   The model you are trying to convert must be in `gguf` file format (for example [IBM Granite 3.3 2B](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct-GGUF)). Make sure you have downloaded the model file for this case.
-
   ```bash
   python3 gguf-py/gguf/scripts/gguf_convert_endian.py model-name.f16.gguf BIG
   ```
 
   For example,
-
   ```bash
   python3 gguf-py/gguf/scripts/gguf_convert_endian.py granite-3.3-2b-instruct-le.f16.gguf BIG
   mv granite-3.3-2b-instruct-le.f16.gguf granite-3.3-2b-instruct-be.f16.gguf
   ```
 
   **Notes:**
-
   - The GGUF endian conversion script may not support all data types at the moment and may fail for some models/quantizations. When that happens, please try manually converting the safetensors model to GGUF Big-Endian via Step 2.
 
 ## IBM Accelerators
 
 ### 1. SIMD Acceleration
 
-Only available in IBM z15 or later system with the `-DGGML_VXE=ON` (turned on by default) compile flag. No hardware acceleration is possible with llama.cpp with older systems, such as IBM z14/arch12. In such systems, the APIs can still run but will use a scalar implementation.
+Only available in IBM z15 or later system with the `-DGGML_VXE=ON` (turned on by default) compile flag. No hardware acceleration is possible with llama.cpp with older systems, such as IBM z14 or EC13. In such systems, the APIs can still run but will use a scalar implementation.
 
-### 2. NNPA Vector Intrinsics Acceleration
+### 2. zDNN Accelerator
 
-Only available in IBM z16 or later system with the `-DGGML_NNPA=ON` (turned on when available) compile flag. No hardware acceleration is possible with llama.cpp with older systems, such as IBM z15/arch13. In such systems, the APIs can still run but will use a scalar implementation.
+*Only available in IBM z16 or later system. No direction at the moment.*
 
-### 3. zDNN Accelerator
+### 3. Spyre Accelerator
 
-_Only available in IBM z16 or later system. No direction at the moment._
-
-### 4. Spyre Accelerator
-
-_No direction at the moment._
+*No direction at the moment.*
 
 ## Performance Tuning
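After converting a model as in step 3 above, the byte order can be spot-checked from the file header: GGUF stores a 4-byte `GGUF` magic followed by a `uint32` version, which is why an endianness mismatch shows up as an absurdly large version number (see the removed FAQ in the next hunk). A minimal sketch using `xxd`:

```bash
# Dump the first 8 bytes: the "GGUF" magic, then the uint32 version field.
#   Big-Endian v3 file:    4747 5546 0000 0003
#   Little-Endian v3 file: 4747 5546 0300 0000
xxd -l 8 granite-3.3-2b-instruct-be.f16.gguf
```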

````diff
@@ -173,22 +145,6 @@ It is strongly recommended to disable SMT via the kernel boot parameters as it n
 
 IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongly recommended to use BLAS.
 
-## Frequently Asked Questions (FAQ)
-
-1. I'm getting the following error message while trying to load a model: `gguf_init_from_file_impl: failed to load model: this GGUF file version 50331648 is extremely large, is there a mismatch between the host and model endianness?`
-
-   Answer: Please ensure that the model you have downloaded/converted is GGUFv3 Big-Endian. These models are usually denoted with the `-be` suffix, i.e., `granite-3.3-2b-instruct-be.F16.gguf`.
-
-   You may refer to the [Getting GGUF Models](#getting-gguf-models) section to manually convert a `safetensors` model to `GGUF` Big Endian.
-
-2. I'm getting extremely poor performance when running inference on a model
-
-   Answer: Please refer to the [Appendix B: SIMD Support Matrix](#appendix-b-simd-support-matrix) to check if your model quantization is supported by SIMD acceleration.
-
-3. I'm building on IBM z17 and getting the following error messages: `invalid switch -march=z17`
-
-   Answer: Please ensure that your GCC compiler is of minimum GCC 15.1.0 version, and have `binutils` updated to the latest version. If this does not fix the problem, kindly open an issue.
-
 ## Getting Help on IBM Z & LinuxONE
 
````
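The hunk context above recommends disabling SMT via the kernel boot parameters. A hedged sketch of one way to do that, assuming a zipl-managed boot configuration (typical on s390x distributions); `nosmt` is the mainline Linux kernel parameter:

```bash
# Hedged sketch: disable SMT at boot on an s390x host.
# 1. Append `nosmt` to the kernel parameters line in /etc/zipl.conf
#    (path assumed; some distros manage boot entries with other tooling).
# 2. Rewrite the boot record and reboot:
sudo zipl
sudo reboot

# 3. After reboot, verify that SMT is off (expect "Thread(s) per core: 1"):
lscpu | grep 'Thread(s) per core'
```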
````diff
@@ -199,48 +155,3 @@ IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongl
 
 Please reach out directly to [aionz@us.ibm.com](mailto:aionz@us.ibm.com).
 
-## Appendix A: Hardware Support Matrix
-
-|         | Support | Minimum Compiler Version |
-| ------- | ------- | ------------------------ |
-| IBM z15 | ✅      |                          |
-| IBM z16 | ✅      |                          |
-| IBM z17 | ✅      | GCC 15.1.0               |
-
-- ✅ - supported and verified to run as intended
-- 🚫 - unsupported, we are unlikely able to provide support
-
-## Appendix B: SIMD Support Matrix
-
-|            | VX/VXE/VXE2 | NNPA | zDNN | Spyre |
-| ---------- | ----------- | ---- | ---- | ----- |
-| FP32       | ✅          | ✅   | ❓   | ❓    |
-| FP16       | ✅          | ✅   | ❓   | ❓    |
-| BF16       | 🚫          | 🚫   | ❓   | ❓    |
-| Q4_0       | ✅          | ✅   | ❓   | ❓    |
-| Q4_1       | ✅          | ✅   | ❓   | ❓    |
-| Q5_0       | 🚫          | 🚫   | ❓   | ❓    |
-| Q5_1       | 🚫          | 🚫   | ❓   | ❓    |
-| Q8_0       | ✅          | ✅   | ❓   | ❓    |
-| Q2_K       | 🚫          | 🚫   | ❓   | ❓    |
-| Q3_K       | ✅          | ✅   | ❓   | ❓    |
-| Q4_K       | ✅          | ✅   | ❓   | ❓    |
-| Q5_K       | ✅          | ✅   | ❓   | ❓    |
-| Q6_K       | ✅          | ✅   | ❓   | ❓    |
-| TQ1_0      | 🚫          | 🚫   | ❓   | ❓    |
-| TQ2_0      | 🚫          | 🚫   | ❓   | ❓    |
-| IQ2_XXS    | 🚫          | 🚫   | ❓   | ❓    |
-| IQ2_XS     | 🚫          | 🚫   | ❓   | ❓    |
-| IQ2_S      | 🚫          | 🚫   | ❓   | ❓    |
-| IQ3_XXS    | 🚫          | 🚫   | ❓   | ❓    |
-| IQ3_S      | 🚫          | 🚫   | ❓   | ❓    |
-| IQ1_S      | 🚫          | 🚫   | ❓   | ❓    |
-| IQ1_M      | 🚫          | 🚫   | ❓   | ❓    |
-| IQ4_NL     | ✅          | ✅   | ❓   | ❓    |
-| IQ4_XS     | ✅          | ✅   | ❓   | ❓    |
-| FP32->FP16 | 🚫          | ✅   | ❓   | ❓    |
-| FP16->FP32 | 🚫          | ✅   | ❓   | ❓    |
-
-- ✅ - acceleration available
-- 🚫 - acceleration unavailable, will still run using scalar implementation
-- ❓ - acceleration unknown, please contribute if you can test it yourself
````