[OpenCL] cl_khr_integer_dot_product is much slower than dp4a emulation #18212

@ProjectPhysX

Hi all,

The new cl_khr_integer_dot_product support added in the 2025.1 release is broken.

  • On Windows (2025.19.3.0.17_230222), the __opencl_c_integer_dot_product_input_4x8bit and __opencl_c_integer_dot_product_input_4x8bit_packed feature macros are now defined, but kernels that call dot(char4, char4) or dot_acc_sat(char4, char4, int) fail to compile, with the error output: instructions in function CompilerException Failed to lookup symbol add_kernel JIT session error: Symbols not found: [ _Z3dotDv4_cS_ ] / [ _Z11dot_acc_satDv4_cS_i ].
  • On Linux (2025.19.3.0.17_230222), both
    int dp4a(const char4 a, const char4 b, const int c) {
    	return c+dot(a, b); // 0.020 TIOPs/s
    }
    and
    int dp4a(const char4 a, const char4 b, const int c) {
    	return dot_acc_sat(a, b, c); // 0.015 TIOPs/s
    }
    are roughly 3x to 4x slower than the emulation variant
    int dp4a(const char4 a, const char4 b, const int c) {
    	return c+a.x*b.x+a.y*b.y+a.z*b.z+a.w*b.w; // 0.064 TIOPs/s
    }
    as measured on my i7-8700K CPU with my OpenCL-Benchmark. The full dp4a function implementation is here. The performance behavior is the same on an AMD Ryzen 9 7950X.
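
For anyone who wants to check the three variants against each other without a working OpenCL runtime, here is a minimal host-side sketch in plain C. It is not the runtime's implementation. The names char4_ref, dot_ref, dot_acc_sat_ref, dp4a_emulated_ref and self_check are hypothetical stand-ins, and the semantics assumed are that dot(char4, char4) returns a 32-bit int and that dot_acc_sat clamps the accumulated sum to the int range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side stand-in for OpenCL's char4 (not an OpenCL type). */
typedef struct { int8_t x, y, z, w; } char4_ref;

/* Models dot(char4, char4): each product is widened to 32 bits, then summed. */
static int32_t dot_ref(char4_ref a, char4_ref b) {
    return (int32_t)a.x * b.x + (int32_t)a.y * b.y
         + (int32_t)a.z * b.z + (int32_t)a.w * b.w;
}

/* Models dot_acc_sat(char4, char4, int), assuming the sum is clamped
 * to the int32 range instead of overflowing. */
static int32_t dot_acc_sat_ref(char4_ref a, char4_ref b, int32_t c) {
    int64_t sum = (int64_t)dot_ref(a, b) + (int64_t)c;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Models the emulation variant from the issue. Does not saturate:
 * the caller must make sure c plus the dot product fits in int32. */
static int32_t dp4a_emulated_ref(char4_ref a, char4_ref b, int32_t c) {
    return c + a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* All three variants agree whenever the accumulation does not overflow. */
void self_check(void) {
    char4_ref a = { 1, 2, 3, 4 };
    char4_ref b = { 5, 6, 7, 8 };
    int32_t c = 10;
    assert(c + dot_ref(a, b) == dp4a_emulated_ref(a, b, c));
    assert(dot_acc_sat_ref(a, b, c) == dp4a_emulated_ref(a, b, c));
}
```

Under those assumptions the variants only differ when the accumulator is close to the int limits, so for a throughput benchmark with small values they should be interchangeable.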

Kind regards,
Moritz

Labels: OCL CPU Experimental RT (Issues in Experimental Intel(R) CPU Runtime for OpenCL(TM) Applications with SYCL support), bug (Something isn't working), confirmed
