Summary
This proposal adds support for explicitly expressing the priority of function versions in the target strings.
Example use case
In the case below, the default ordering gives the sve feature a higher priority than dotprod. On a target with both sve and dotprod, the sve version would therefore be selected. However, the dotprod version may be the better choice in this case.
typedef struct {
int x;
int y;
int z;
int w;
} Vec4;
__attribute__((target_clones("default", "sve")))
Vec4 dotproduct (Vec4 a, Vec4 b)
{
// do a dot product
}
__attribute__((target_version("dotprod")))
Vec4 dotproduct (Vec4 a, Vec4 b)
{
// Use dotprod intrinsics
}
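The selection rule can be sketched in a few lines of C. A resolver considers only the versions whose required features are all present, then takes the one with the highest priority. The `Version` type, the `FEAT_*` bits, `select_version`, and the numeric priorities are all hypothetical. Only the relative order (sve above dotprod, default lowest) comes from the text above, and this is not the actual ACLE resolver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
enum {
    FEAT_DOTPROD = 1u << 0,
    FEAT_SVE     = 1u << 1,
};

/* One function version: the features it needs and its priority.
   The priority values are illustrative, not the ACLE table. */
typedef struct {
    const char *name;
    uint32_t    required;  /* feature bits this version needs */
    int         priority;  /* higher wins; "default" is lowest */
} Version;

/* Return the index of the highest-priority version whose required
   features are all present in 'cpu'. "default" requires nothing,
   so a match always exists when it is in the table. */
static size_t select_version(const Version *v, size_t n, uint32_t cpu)
{
    assert(n > 0);
    size_t best = n; /* sentinel: no candidate yet */
    for (size_t i = 0; i < n; i++) {
        if ((v[i].required & ~cpu) != 0)
            continue; /* the CPU lacks a required feature */
        if (best == n || v[i].priority > v[best].priority)
            best = i;
    }
    assert(best < n);
    return best;
}

/* The versions of dotproduct from the example, with the default
   ordering ranking sve above dotprod. */
static const Version dotproduct_versions[] = {
    { "default", 0,            0 },
    { "sve",     FEAT_SVE,     2 },
    { "dotprod", FEAT_DOTPROD, 1 },
};
```

With both features present, `select_version(dotproduct_versions, 3, FEAT_SVE | FEAT_DOTPROD)` returns index 1 (the sve version). This is exactly the choice that the proposal aims to let users override.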
Motivation
This enhancement arose naturally from considering the version priority rules. While specifying the default ordering, cases such as the one above came up where the default priority rules would be a poor choice.
Possible solutions
There are several ways to solve this.
1. Add dummy features "priorityA", "priorityB", ...
These features would have no effect on the versioned function other than changing how it is ordered.
They would rank above every other feature, so they would override any default ordering.
The example above would then become:
typedef struct {
int x;
int y;
int z;
int w;
} Vec4;
__attribute__((target_clones("default", "sve")))
Vec4 dotproduct (Vec4 a, Vec4 b)
{
// do a dot product
}
__attribute__((target_version("dotprod+priorityA")))
Vec4 dotproduct (Vec4 a, Vec4 b)
{
// Use dotprod intrinsics
}
The features could equally be named "priority1", "priority2", ...
This is similar to what was done for other targets, such as RISC-V (https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85/files).
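The following minimal C sketch shows how the dummy features could be scored. It rests on two illustrative assumptions, neither of which is the specified rule: a version's priority is the highest priority among its features, and priorityA outranks priorityB. The feature table, its numeric values, `feature_priority`, and `version_score` are all hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-feature priorities (illustrative values only). */
typedef struct { const char *name; int priority; } Feature;

static const Feature features[] = {
    { "dotprod",   10 },
    { "sve",       20 },
    /* Dummy features: no codegen effect, ranked above every real
       feature. Assumption: priorityA outranks priorityB. */
    { "priorityB", 1000 },
    { "priorityA", 2000 },
};

/* Look up a feature name of length len; returns -1 if unknown. */
static int feature_priority(const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof features / sizeof features[0]; i++) {
        if (strlen(features[i].name) == len &&
            strncmp(features[i].name, name, len) == 0)
            return features[i].priority;
    }
    return -1;
}

/* Score a '+'-separated version string such as "dotprod+priorityA"
   as the highest priority among its features; "default" scores 0. */
static int version_score(const char *version)
{
    if (strcmp(version, "default") == 0)
        return 0;
    int best = 0;
    const char *p = version;
    for (;;) {
        const char *plus = strchr(p, '+');
        size_t len = plus ? (size_t)(plus - p) : strlen(p);
        int prio = feature_priority(p, len);
        assert(prio >= 0 && "unknown feature");
        if (prio > best)
            best = prio;
        if (!plus)
            break;
        p = plus + 1; /* move past the '+' to the next feature */
    }
    return best;
}
```

Here `version_score("dotprod+priorityA")` is 2000 while `version_score("sve")` is 20. The dotprod version would therefore win on a target with both features, and the default ranking of sve over dotprod is left untouched for every other function.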
2. Label all the versions of a function
Another option is to support explicitly stating the order of all versions.
Something like:
typedef struct {
int x;
int y;
int z;
int w;
} Vec4;
__attribute__((target_clones("P3:default", "P2:sve")))
Vec4 dotproduct (Vec4 a, Vec4 b)
{
// do a dot product
}
__attribute__((target_version("P1:dotprod")))
Vec4 dotproduct (Vec4 a, Vec4 b)
{
// Use dotprod intrinsics
}
This seems to introduce considerably more complexity and edge cases than option 1, with little gain beyond explicitness.
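The following minimal C sketch parses the hypothetical "P<n>:" prefix and gives a sense of the extra handling that option 2 needs. It assumes, based on the example, that a smaller number means a more preferred version, since P1:dotprod is meant to beat P2:sve. `RankedVersion`, `parse_ranked`, and `ranked_before` are illustrative names only.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Parsed form of a hypothetical "P<n>:<features>" target string. */
typedef struct {
    long        rank;      /* n from "P<n>:", or -1 if no prefix */
    const char *features;  /* points into the original string */
} RankedVersion;

/* Split "P2:sve" into rank 2 and features "sve". A string without a
   well-formed "P<n>:" prefix is returned unchanged with rank -1. */
static RankedVersion parse_ranked(const char *s)
{
    RankedVersion r = { -1, s };
    if (s[0] != 'P' || !isdigit((unsigned char)s[1]))
        return r;
    char *end;
    long n = strtol(s + 1, &end, 10);
    if (*end != ':')
        return r; /* digits not followed by ':' */
    r.rank = n;
    r.features = end + 1;
    return r;
}

/* Assumption from the example: P1 is most preferred, so a smaller
   rank wins. Returns nonzero if a should be chosen over b. */
static int ranked_before(RankedVersion a, RankedVersion b)
{
    assert(a.rank >= 0 && b.rank >= 0); /* unlabelled versions rejected */
    return a.rank < b.rank;
}
```

Even this small parser leaves several questions open. It has to handle a missing or malformed prefix (rank -1 here, which the assertion in `ranked_before` rejects). It also has to deal with two versions sharing a rank, and with ranks that conflict across target_clones and target_version declarations. Option 1 avoids these questions.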