Commit 99937ef

update readme-sycl.md, rm debug code
1 parent 8215a77 commit 99937ef

5 files changed: 124 additions & 59 deletions

README-sycl.md

Lines changed: 63 additions & 18 deletions
````diff
@@ -296,15 +296,25 @@ Similar to the native `sycl-ls`, available SYCL devices can be queried as follow
 A example of such log in a system with 1 *intel CPU* and 1 *intel GPU* can look like the following:
 ```
 found 6 SYCL devices:
-| | | |Compute |Max compute|Max work|Max sub| |
-|ID| Device Type| Name|capability|units |group |group |Global mem size|
-|--|------------------|---------------------------------------------|----------|-----------|--------|-------|---------------|
-| 0|[level_zero:gpu:0]| Intel(R) Arc(TM) A770 Graphics| 1.3| 512| 1024| 32| 16225243136|
-| 1|[level_zero:gpu:1]| Intel(R) UHD Graphics 770| 1.3| 32| 512| 32| 53651849216|
-| 2| [opencl:gpu:0]| Intel(R) Arc(TM) A770 Graphics| 3.0| 512| 1024| 32| 16225243136|
-| 3| [opencl:gpu:1]| Intel(R) UHD Graphics 770| 3.0| 32| 512| 32| 53651849216|
-| 4| [opencl:cpu:0]| 13th Gen Intel(R) Core(TM) i7-13700K| 3.0| 24| 8192| 64| 67064815616|
-| 5| [opencl:acc:0]| Intel(R) FPGA Emulation Device| 1.2| 24|67108864| 64| 67064815616|
+Part1:
+|ID| Ver| Device Type| Name|Global mem size|
+|--|----|-------------------|---------------------------------------|---------------|
+| 0| 1.3| [level_zero:gpu:0]| Intel Data Center GPU Flex 170| 16225M|
+| 1| 1.3| [level_zero:gpu:1]| Intel Data Center GPU Flex 170| 16225M|
+| 2| 3.0| [opencl:gpu:0]| Intel Data Center GPU Flex 170| 16225M|
+| 3| 3.0| [opencl:gpu:1]| Intel Data Center GPU Flex 170| 16225M|
+| 4| 3.0| [opencl:cpu:0]| Intel Xeon Gold 6346 CPU @ 3.10GHz| 540700M|
+| 5| 1.2| [opencl:acc:0]| Intel FPGA Emulation Device| 540700M|
+Part2:
+|ID|Max compute units|Max work group|Max subgroup| Driver version|
+|--|-----------------|--------------|------------|----------------------------------|
+| 0| 512| 1024| 32| 1.3.27642|
+| 1| 512| 1024| 32| 1.3.27642|
+| 2| 512| 1024| 32| 23.43.27642.40|
+| 3| 512| 1024| 32| 23.43.27642.40|
+| 4| 64| 8192| 64|2024.17.5.0.08_160000.xmain-hotfix|
+| 5| 64| 67108864| 64|2024.17.5.0.08_160000.xmain-hotfix|
+
 ```
 
 | Attribute | Note |
````
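In the new Part1 table, the Device Type column numbers logical devices separately for each backend and device type, which is why the same GPU model shows up as both `[level_zero:gpu:N]` and `[opencl:gpu:N]`. The following is a minimal C++ sketch of that numbering, modeled on the `DeviceNums` counter in `ggml_backend_sycl_print_sycl_devices()` from the `ggml-sycl/common.cpp` diff. It assumes the `backend:type` strings are already known; `make_device_labels` is a hypothetical helper, not part of ggml-sycl.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: turn each device's "backend:type" string (in SYCL
// enumeration order) into the "[backend:type:N]" label of the Device Type
// column. N counts devices per "backend:type" pair, like DeviceNums does.
std::vector<std::string> make_device_labels(const std::vector<std::string> &backend_types) {
    std::map<std::string, std::size_t> counts;  // next index for each backend:type
    std::vector<std::string> labels;
    labels.reserve(backend_types.size());
    for (const std::string &bt : backend_types) {
        std::size_t type_id = counts[bt]++;  // operator[] starts a new key at 0
        labels.push_back("[" + bt + ":" + std::to_string(type_id) + "]");
    }
    return labels;
}
```

Fed the six entries of the sample output in order (two `level_zero:gpu`, two `opencl:gpu`, one `opencl:cpu`, one `opencl:acc`), it yields `[level_zero:gpu:0]`, `[level_zero:gpu:1]`, `[opencl:gpu:0]`, `[opencl:gpu:1]`, `[opencl:cpu:0]` and `[opencl:acc:0]`.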
````diff
@@ -469,15 +479,24 @@ build\bin\ls-sycl-device.exe
 The output of this command in a system with 1 *intel CPU* and 1 *intel GPU* would look like the following:
 ```
 found 6 SYCL devices:
-| | | |Compute |Max compute|Max work|Max sub| |
-|ID| Device Type| Name|capability|units |group |group |Global mem size|
-|--|------------------|---------------------------------------------|----------|-----------|--------|-------|---------------|
-| 0|[level_zero:gpu:0]| Intel(R) Arc(TM) A770 Graphics| 1.3| 512| 1024| 32| 16225243136|
-| 1|[level_zero:gpu:1]| Intel(R) UHD Graphics 770| 1.3| 32| 512| 32| 53651849216|
-| 2| [opencl:gpu:0]| Intel(R) Arc(TM) A770 Graphics| 3.0| 512| 1024| 32| 16225243136|
-| 3| [opencl:gpu:1]| Intel(R) UHD Graphics 770| 3.0| 32| 512| 32| 53651849216|
-| 4| [opencl:cpu:0]| 13th Gen Intel(R) Core(TM) i7-13700K| 3.0| 24| 8192| 64| 67064815616|
-| 5| [opencl:acc:0]| Intel(R) FPGA Emulation Device| 1.2| 24|67108864| 64| 67064815616|
+Part1:
+|ID| Ver| Device Type| Name|Global mem size|
+|--|----|-------------------|---------------------------------------|---------------|
+| 0| 1.3| [level_zero:gpu:0]| Intel Data Center GPU Flex 170| 16225M|
+| 1| 1.3| [level_zero:gpu:1]| Intel Data Center GPU Flex 170| 16225M|
+| 2| 3.0| [opencl:gpu:0]| Intel Data Center GPU Flex 170| 16225M|
+| 3| 3.0| [opencl:gpu:1]| Intel Data Center GPU Flex 170| 16225M|
+| 4| 3.0| [opencl:cpu:0]| Intel Xeon Gold 6346 CPU @ 3.10GHz| 540700M|
+| 5| 1.2| [opencl:acc:0]| Intel FPGA Emulation Device| 540700M|
+Part2:
+|ID|Max compute units|Max work group|Max subgroup| Driver version|
+|--|-----------------|--------------|------------|----------------------------------|
+| 0| 512| 1024| 32| 1.3.27642|
+| 1| 512| 1024| 32| 1.3.27642|
+| 2| 512| 1024| 32| 23.43.27642.40|
+| 3| 512| 1024| 32| 23.43.27642.40|
+| 4| 64| 8192| 64|2024.17.5.0.08_160000.xmain-hotfix|
+| 5| 64| 67108864| 64|2024.17.5.0.08_160000.xmain-hotfix|
 
 ```
 
````
```diff
@@ -548,6 +567,32 @@ use 1 SYCL GPUs: [0] with Max compute units:512
 |-------------------|------------------|---------------------------------------------------------------------------------------------------------------------------|
 | GGML_SYCL_DEBUG | 0 (default) or 1 | Enable log function by macro: GGML_SYCL_DEBUG |
 | ZES_ENABLE_SYSMAN | 0 (default) or 1 | Support to get free memory of GPU by sycl::aspect::ext_intel_free_memory.<br>Recommended to use when --split-mode = layer |
+| GGML_SYCL_VISIBLE_DEVICES|id1,id2,...|Similar to `CUDA_VISIBLE_DEVICES`: defines the list of visible SYCL device IDs, e.g. "0", "0,2", "2,1" |
+| ONEAPI_DEVICE_SELECTOR|Refer to [oneapi-device-selector](https://intel.github.io/llvm-docs/EnvironmentVariables.html#oneapi-device-selector)|Limits the choice of devices available when the SYCL application is run|
+
+##### Choose SYCL Devices at Runtime
+
+In the SYCL runtime, one physical device can be mapped to two logical devices through different backends, Level-Zero and OpenCL, so more devices appear in the SYCL view than are physically installed. Avoid running code on both logical devices of the same physical device at the same time.
+
+The SYCL backend supports a dGPU, an iGPU, or both in the same machine.
+
+##### SYCL Backend Rules
+
+|Mode|Explanation|Example|Recommended Cases|Note|
+|-|-|-|-|-|
+|Normal|Use all of the most powerful devices. This is the default mode; no special setting is needed.<br>The SYCL backend detects and chooses the **Level-Zero** devices with the highest `Max compute units`.<br> ||Most cases for normal users.||
+|Advanced|Allows the user to choose one or more SYCL devices, which can be Level-Zero, OpenCL, or both.<br>Set the device list with the environment variable **GGML_SYCL_VISIBLE_DEVICES**, similar to `CUDA_VISIBLE_DEVICES`.<br>The SYCL backend uses all devices in that list.| `set/export GGML_SYCL_VISIBLE_DEVICES=1`<br>`set/export GGML_SYCL_VISIBLE_DEVICES=0,1`<br>`set/export GGML_SYCL_VISIBLE_DEVICES=2,1`|Use the iGPU, or both GPUs, in a dGPU + iGPU environment.<br>Use one dGPU in a multiple-dGPU environment.<br>Use one or more OpenCL devices.|There is a known issue with OpenCL devices (work in progress).|
+|Developer|Allows SYCL developers to choose one or more SYCL devices with the environment variable **ONEAPI_DEVICE_SELECTOR**, which has a flexible grammar.<br>Refer to [oneapi-device-selector](https://intel.github.io/llvm-docs/EnvironmentVariables.html#oneapi-device-selector).|`set/export ONEAPI_DEVICE_SELECTOR=level_zero:1`<br>`set/export ONEAPI_DEVICE_SELECTOR=opencl:*`<br>`set/export "ONEAPI_DEVICE_SELECTOR=opencl:gpu;level_zero:gpu"`<br>|Covers the Advanced mode. As a lower-level setting, it also affects the **Normal** and **Advanced** modes.<br>The flexible grammar supports more complex device environments.|There is a known issue with OpenCL devices (work in progress).|
+
+##### Parameters of Llama.cpp
+
+The device-selection parameters of llama.cpp work together with the SYCL backend rules to decide the final result: the user can use one or all of the devices chosen by those rules.
+
+|Device|Values|Note|
+|-|-|-|
+|Single Device|`--split-mode=none` and `--main-gpu=id`|The value of `main-gpu` must be in the chosen device list printed during llama.cpp startup, e.g.:<br>`detect 2 SYCL level-zero GPUs:[0,1]`.<br>In that case, `main-gpu` should be set to `0` or `1`.|
+|Multiple Device|`--split-mode=layer`|Default|
+
 
 ## Known Issues
 
```
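The **Normal** rule in the SYCL Backend Rules table can be read as a two-step filter: keep the Level-Zero devices, then keep those tied for the highest `Max compute units`. A minimal C++ sketch under that reading; `DeviceInfo` and `select_normal_mode` are hypothetical names, and the real backend queries these values through SYCL rather than taking a prepared list.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical device record; the real backend obtains these values from SYCL.
struct DeviceInfo {
    int id;                 // SYCL device ID from the "found N SYCL devices" table
    std::string backend;    // "level_zero" or "opencl"
    int max_compute_units;  // "Max compute units" column
};

// Sketch of the documented Normal-mode rule: among Level-Zero devices, keep
// every device whose Max compute units equals the highest value seen.
std::vector<int> select_normal_mode(const std::vector<DeviceInfo> &devices) {
    int best = 0;
    for (const DeviceInfo &d : devices) {
        if (d.backend == "level_zero") {
            best = std::max(best, d.max_compute_units);
        }
    }
    std::vector<int> ids;
    for (const DeviceInfo &d : devices) {
        if (d.backend == "level_zero" && d.max_compute_units == best) {
            ids.push_back(d.id);
        }
    }
    return ids;  // empty when there is no Level-Zero device
}
```

With the two Flex 170 cards from the new sample output (512 compute units each on Level-Zero), this keeps IDs 0 and 1, consistent with the `detect 2 SYCL level-zero GPUs:[0,1]` example; with the older A770 (512) + UHD 770 (32) sample, it keeps only ID 0.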

ggml-sycl.cpp

Lines changed: 2 additions & 3 deletions
```diff
@@ -2555,11 +2555,11 @@ static inline int get_work_group_size(const sycl::device& device) {
 
 inline void check_allow_device_id(const int device_id) {
     if (ggml_sycl_info().device_count<1) {
-        fprintf(stderr, "%s: not detect any SYCL devices, please check GPU driver or unset ONEAPI_DEVICE_SELECTOR!\n", __func__);
+        fprintf(stderr, "%s: not detect any SYCL devices, check GPU driver or unset GGML_SYCL_VISIBLE_DEVICES and ONEAPI_DEVICE_SELECTOR\n", __func__);
         exit(1);
     }
     if (!ggml_sycl_info().is_allowed_device(device_id)) {
-        fprintf(stderr, "%s: device_id:%d is out of range [%s]. To use any SYCL devices, please set/export ONEAPI_DEVICE_SELECTOR\n",
+        fprintf(stderr, "%s: device_id:%d is out of range [%s]. To use any SYCL devices, set/export GGML_SYCL_VISIBLE_DEVICES or ONEAPI_DEVICE_SELECTOR\n",
             __func__, device_id, ggml_sycl_info().devices_list());
         exit_with_stack_print();
     }
```
```diff
@@ -5893,7 +5893,6 @@ GGML_CALL static void ggml_backend_sycl_set_tensor_async(ggml_backend_t backend,
 
     GGML_ASSERT(buf->buft == ggml_backend_sycl_buffer_type(sycl_ctx->device) && "unsupported buffer type");
     const queue_ptr stream = sycl_ctx->stream(sycl_ctx->device, 0);
-    printf("zjy ggml_backend_sycl_set_tensor_async sycl_ctx->device=%d stream=%p\n", sycl_ctx->device, stream);
 
     SYCL_CHECK(CHECK_TRY_ERROR((stream)->memcpy(
         (char *)tensor->data + offset, data, size).wait()));
```
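The reworded messages in `check_allow_device_id()` now name both environment variables that can shrink the device list. A minimal C++ sketch of the same two checks as a non-exiting function, assuming the chosen device IDs are available as a `std::vector<int>`; `check_device_id` is hypothetical, and the comma-joined list only approximates whatever `devices_list()` actually prints.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, non-exiting version of the two checks in check_allow_device_id().
// 'chosen' stands in for the device list left after GGML_SYCL_VISIBLE_DEVICES /
// ONEAPI_DEVICE_SELECTOR are applied. Returns "" when device_id is usable.
std::string check_device_id(const std::vector<int> &chosen, int device_id) {
    if (chosen.empty()) {
        return "no SYCL devices detected: check the GPU driver or unset "
               "GGML_SYCL_VISIBLE_DEVICES and ONEAPI_DEVICE_SELECTOR";
    }
    if (std::find(chosen.begin(), chosen.end(), device_id) == chosen.end()) {
        std::string list;  // comma-joined IDs; the real devices_list() format may differ
        for (std::size_t i = 0; i < chosen.size(); ++i) {
            if (i > 0) list += ",";
            list += std::to_string(chosen[i]);
        }
        return "device_id:" + std::to_string(device_id) + " is out of range [" + list +
               "]; set GGML_SYCL_VISIBLE_DEVICES or ONEAPI_DEVICE_SELECTOR";
    }
    return "";
}
```

Returning a message instead of calling `exit()` keeps the sketch testable; the code in the commit still terminates the process in both cases.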

ggml-sycl.h

Lines changed: 0 additions & 3 deletions
```diff
@@ -36,9 +36,6 @@ GGML_API GGML_CALL int ggml_backend_sycl_get_device_index(int device_id);
 GGML_API GGML_CALL int ggml_backend_sycl_get_device_id(int index);
 GGML_API GGML_CALL void ggml_sycl_set_single_device(int main_gpu_id);
 
-// GGML_API GGML_CALL void ggml_backend_sycl_set_single_device_mode(int main_gpu_id);
-// GGML_API GGML_CALL void ggml_backend_sycl_set_mul_device_mode();
-
 // SYCL doesn't support registering host memory, keep here for reference
 // GGML_API GGML_CALL bool ggml_backend_sycl_register_host_buffer(void * buffer, size_t size);
 // GGML_API GGML_CALL void ggml_backend_sycl_unregister_host_buffer(void * buffer);
```

ggml-sycl/common.cpp

Lines changed: 34 additions & 11 deletions
```diff
@@ -87,7 +87,7 @@ static std::vector<int> get_sycl_visible_devices() {
     return device_ids;
 }
 
-void print_device_detail(int id, sycl::device &device, std::string device_type) {
+void print_device_detail_part1(int id, sycl::device &device, std::string device_type) {
 
   dpct::device_info prop;
   SYCL_CHECK(CHECK_TRY_ERROR(
```
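Only the tail of `get_sycl_visible_devices()` appears in this diff. As an illustration of the `id1,id2,...` format that the README documents for `GGML_SYCL_VISIBLE_DEVICES`, here is a minimal C++ sketch that turns such a value into an ordered ID list. `parse_visible_devices` and `visible_devices_from_env` are hypothetical names, and skipping malformed entries is an assumption, not necessarily what ggml-sycl does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parser for an "id1,id2,..." value such as "0", "0,2" or "2,1".
// IDs keep their order; empty, negative, or non-numeric entries are skipped.
std::vector<int> parse_visible_devices(const std::string &value) {
    std::vector<int> ids;
    std::stringstream ss(value);
    std::string item;
    while (std::getline(ss, item, ',')) {
        try {
            std::size_t pos = 0;
            int id = std::stoi(item, &pos);
            if (pos == item.size() && id >= 0) {  // reject trailing junk like "1a"
                ids.push_back(id);
            }
        } catch (const std::invalid_argument &) {
            // not a number, e.g. "" or "x": skip it
        } catch (const std::out_of_range &) {
            // does not fit in int: skip it
        }
    }
    return ids;
}

// Reads the variable from the environment; returns an empty list when it is unset.
std::vector<int> visible_devices_from_env() {
    const char *value = std::getenv("GGML_SYCL_VISIBLE_DEVICES");
    return value != nullptr ? parse_visible_devices(value) : std::vector<int>{};
}
```

Order is preserved, so `"2,1"` yields `{2, 1}`, matching the `GGML_SYCL_VISIBLE_DEVICES=2,1` example in the README.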
```diff
@@ -105,29 +105,52 @@ void print_device_detail(int id, sycl::device &device, std::string device_type)
 
   auto global_mem_size = prop.get_global_mem_size()/1000000;
 
-  fprintf(stderr, "|%2d|%19s|%39s|%7s|%7d|%8d|%5d|%6luM|%21s|\n", id, device_type.c_str(),
-          name.c_str(), version.c_str(), prop.get_max_compute_units(),
-          prop.get_max_work_group_size(), prop.get_max_sub_group_size(),
-          global_mem_size, device.get_info<sycl::info::device::driver_version>().c_str());
+  fprintf(stderr, "|%2d|%4s|%19s|%39s|%14luM|\n", id, version.c_str(), device_type.c_str(),
+          name.c_str(), global_mem_size);
+}
+
+void print_device_detail_part2(int id, sycl::device &device, std::string device_type) {
+
+  dpct::device_info prop;
+  SYCL_CHECK(CHECK_TRY_ERROR(
+      dpct::get_device_info(prop, device)));
+
+  fprintf(stderr, "|%2d|%17d|%14d|%12d|%34s|\n", id,
+          prop.get_max_compute_units(),
+          prop.get_max_work_group_size(), prop.get_max_sub_group_size(),
+          device.get_info<sycl::info::device::driver_version>().c_str());
 }
 
 void ggml_backend_sycl_print_sycl_devices() {
   GGML_SYCL_DEBUG("[SYCL] call ggml_backend_sycl_print_sycl_devices\n");
   int device_count = dpct::dev_mgr::instance().device_count();
   std::map<std::string, size_t> DeviceNums;
   fprintf(stderr, "found %d SYCL devices:\n", device_count);
-  fprintf(stderr, "| | | | |Max | |Max |Global | |\n");
-  fprintf(stderr, "| | | | |compute|Max work|sub |mem | |\n");
-  fprintf(stderr, "|ID| Device Type| Name|Version|units |group |group|size | Driver version|\n");
-  fprintf(stderr, "|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|\n");
+  fprintf(stderr, "Part1:\n");
+  fprintf(stderr, "|ID| Ver| Device Type| Name|Global mem size|\n");
+  fprintf(stderr, "|--|----|-------------------|---------------------------------------|---------------|\n");
   for (int id = 0; id < device_count; ++id) {
     sycl::device device = dpct::dev_mgr::instance().get_device(id);
     sycl::backend backend = device.get_backend();
     std::string backend_type = get_device_backend_and_type(device);
     int type_id=DeviceNums[backend_type]++;
     std::stringstream device_type;
     device_type << "[" << backend_type << ":" << std::to_string(type_id) << "]";
-    print_device_detail(id, device, device_type.str());
+    print_device_detail_part1(id, device, device_type.str());
+  }
+
+  std::map<std::string, size_t> DeviceNums2;
+  fprintf(stderr, "Part2:\n");
+  fprintf(stderr, "|ID|Max compute units|Max work group|Max subgroup| Driver version|\n");
+  fprintf(stderr, "|--|-----------------|--------------|------------|----------------------------------|\n");
+  for (int id = 0; id < device_count; ++id) {
+    sycl::device device = dpct::dev_mgr::instance().get_device(id);
+    sycl::backend backend = device.get_backend();
+    std::string backend_type = get_device_backend_and_type(device);
+    int type_id=DeviceNums2[backend_type]++;
+    std::stringstream device_type;
+    device_type << "[" << backend_type << ":" << std::to_string(type_id) << "]";
+    print_device_detail_part2(id, device, device_type.str());
   }
 }
 
```
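`print_device_detail_part1()` divides the byte count by 1000000, so the `M` suffix means 10^6 bytes, not MiB: 16225243136 bytes (the A770 figure in the old README table) prints as `16225M`, and the 14-digit field plus `M` fills the 15-character `Global mem size` column. The C++ sketch below reproduces one row with `snprintf`, assuming `get_global_mem_size()` returns `std::size_t`. It uses `%zu`, the standard specifier for `std::size_t`, where the commit uses `%lu`, which matches only on platforms where `size_t` is `unsigned long`. `format_part1_row` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper reproducing one Part1 row with the field widths of
// print_device_detail_part1(). The byte count is divided by 1000000, so "M"
// means 10^6 bytes; %zu is used because the value is a std::size_t.
std::string format_part1_row(int id, const std::string &version,
                             const std::string &device_type,
                             const std::string &name,
                             std::size_t global_mem_bytes) {
    std::size_t global_mem_mb = global_mem_bytes / 1000000;  // integer division truncates
    char buf[256];
    std::snprintf(buf, sizeof(buf), "|%2d|%4s|%19s|%39s|%14zuM|", id, version.c_str(),
                  device_type.c_str(), name.c_str(), global_mem_mb);
    return std::string(buf);
}
```

For device 0 of the new sample, the row starts with `| 0| 1.3| [level_zero:gpu:0]|` and ends with `16225M|`, 85 characters in total.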

```diff
@@ -174,7 +197,7 @@ static ggml_sycl_device_info ggml_sycl_init() try {
     info.refresh_device();
 
     if (info.device_count == 0) {
-        fprintf(stderr, "%s: failed to initialize " GGML_SYCL_NAME ": %s\n",
+        fprintf(stderr, "%s: failed to initialize " GGML_SYCL_NAME ": no available device found\n",
                 __func__);
         return info;
     }
```
