@arduano did an overhaul of the SIMD traits. That was a big undertaking and has many nice improvements like operator overloading.
I am trying to port my code over to the v2.0.0-dev3 (current master) branch and I noticed that there are 3 problems:
- it does not build with no_std (`cargo build --features "no_std"`)
- it does not build with sleef (`cargo build --features "sleef"`)
- it seems that only `scalar` is exposed. A simple test program that uses `sse2`, `sse41` or `avx2` fails to compile with "failed to resolve: could not find `avx2` in `simdeez`".
Here is a simple test program (`src/main.rs`):
use simdeez::prelude::*;
use simdeez::avx2::*;
use simdeez::scalar::*;
use simdeez::sse2::*;
use simdeez::sse41::*;

// If you want your SIMD function to use runtime feature detection to call
// the fastest available version, use the simd_runtime_generate macro:
fn main() {
    simd_runtime_generate!(
        fn distance(x1: &[f32], y1: &[f32], x2: &[f32], y2: &[f32]) -> Vec<f32> {
            let mut result: Vec<f32> = Vec::with_capacity(x1.len());
            result.set_len(x1.len()); // for efficiency

            // Set each slice to the same length for iteration efficiency
            let mut x1 = &x1[..x1.len()];
            let mut y1 = &y1[..x1.len()];
            let mut x2 = &x2[..x1.len()];
            let mut y2 = &y2[..x1.len()];
            let mut res = &mut result[..x1.len()];

            // Operations have to be done in terms of the vector width
            // so that it will work with any size vector.
            // The width of a vector type is provided as a constant
            // so the compiler is free to optimize it more.
            // S::Vf32::WIDTH is a constant, 4 when using SSE, 8 when using AVX2, etc.
            while x1.len() >= S::Vf32::WIDTH {
                // Load data from your vec into a SIMD value
                let xv1 = S::Vf32::load_from_slice(x1);
                let yv1 = S::Vf32::load_from_slice(y1);
                let xv2 = S::Vf32::load_from_slice(x2);
                let yv2 = S::Vf32::load_from_slice(y2);

                let mut xdiff = xv1 - xv2;
                let mut ydiff = yv1 - yv2;
                xdiff *= xdiff;
                ydiff *= ydiff;
                let distance = (xdiff + ydiff).sqrt();

                // Store the SIMD value into the result vec
                distance.copy_to_slice(&mut res);

                // Move each slice to the next position
                x1 = &x1[S::Vf32::WIDTH..];
                y1 = &y1[S::Vf32::WIDTH..];
                x2 = &x2[S::Vf32::WIDTH..];
                y2 = &y2[S::Vf32::WIDTH..];
                res = &mut res[S::Vf32::WIDTH..];
            }

            // (Optional) Compute the remaining elements. Not necessary if you are sure the length
            // of your data is always a multiple of the maximum S::Vf32::WIDTH you compile for (4 for SSE, 8 for AVX2, etc).
            // This can be asserted by putting `assert_eq!(x1.len(), 0);` here
            for i in 0..x1.len() {
                let mut xdiff = x1[i] - x2[i];
                let mut ydiff = y1[i] - y2[i];
                xdiff *= xdiff;
                ydiff *= ydiff;
                let distance = (xdiff + ydiff).sqrt();
                res[i] = distance;
            }

            result
        }
    );

    let x1 = vec![0.0, 1.30, 2.3, 4.0];
    let y1 = vec![0.0, 1.30, 2.3, 4.0];
    let x2 = vec![0.0, 1.30, 2.3, 4.0];
    let y2 = vec![0.0, 1.30, 2.3, 4.0];

    // distance_scalar
    // `distance<S: Simd>`: the generic version of your function
    let got = distance_scalar(x1.as_slice(), y1.as_slice(), x2.as_slice(), y2.as_slice());
    // distance_runtime_select
    // distance_sse2
    // distance_sse41
    // distance_avx
    // distance_avx2
    // `distance_runtime_select` picks the fastest of the above at runtime
}
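For checking results once the SIMD variants build again, a dependency-free reference may be useful. The sketch below is not part of simdeez: `distance_reference` and `LANES` are hypothetical names, with `LANES` standing in for `S::Vf32::WIDTH`. It only mirrors the structure of the test program above (full-width chunks first, then the remainder) in plain scalar Rust.

```rust
// Hypothetical stand-in for S::Vf32::WIDTH (4 for SSE, 8 for AVX2).
const LANES: usize = 4;

// Scalar reference for the `distance` computation, not a simdeez API.
// Processes full LANES-wide chunks first, then the leftover elements,
// so the control flow matches the SIMD version.
fn distance_reference(x1: &[f32], y1: &[f32], x2: &[f32], y2: &[f32]) -> Vec<f32> {
    let n = x1.len();
    assert!(
        y1.len() == n && x2.len() == n && y2.len() == n,
        "all input slices must have the same length"
    );

    let mut result = vec![0.0f32; n];
    // Number of elements covered by full chunks.
    let full = n - n % LANES;

    // Full chunks: the part a SIMD implementation would vectorize.
    for start in (0..full).step_by(LANES) {
        for i in start..start + LANES {
            let dx = x1[i] - x2[i];
            let dy = y1[i] - y2[i];
            result[i] = (dx * dx + dy * dy).sqrt();
        }
    }

    // Remainder: fewer than LANES elements left.
    for i in full..n {
        let dx = x1[i] - x2[i];
        let dy = y1[i] - y2[i];
        result[i] = (dx * dx + dy * dy).sqrt();
    }

    result
}
```

With the inputs from the test program above (`x1 == x2` and `y1 == y2`), every expected distance is `0.0`, so inputs that differ between the two points make a more telling comparison against `distance_scalar` and the other generated variants.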
Please advise on how I can help with these problems.