Slow performance with LTO #634

Netzwerk2 · 2025-05-03T12:26:21Z

I'm working on a path tracer and wanted to switch to glam because it uses SIMD.
When I replaced my own scalar math types (f64) with glam's (Mat4, Vec3A), I found that, strangely, glam is slower.
I've made a small benchmark for the ray-sphere intersection routine I use, which can be found here. After a bit of experimenting, I found that with lto = "fat" my math implementation is significantly faster than glam, while with thin LTO or no LTO glam is much faster.

| LTO setting | main (own f64 math) | glam |
|---|---|---|
| lto = "fat" | 46.459 ns | 53.929 ns |
| lto = "thin" | 109.13 ns | 58.765 ns |
| no LTO | 161.61 ns | 58.785 ns |
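
The numbers above come from the author's own benchmark, which is not shown here. For anyone who wants to reproduce this kind of comparison without extra dependencies, a minimal std-only timing loop could look like the sketch below. The helper name time_per_call_ns is hypothetical, and a dedicated harness such as criterion will give more reliable statistics.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical helper: calls `f` `iters` times and returns the average
/// wall-clock time per call in nanoseconds.
fn time_per_call_ns<T>(iters: u32, mut f: impl FnMut() -> T) -> f64 {
    // Short warm-up before timing starts.
    for _ in 0..iters / 10 {
        black_box(f());
    }
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the unused result.
        black_box(f());
    }
    start.elapsed().as_nanos() as f64 / f64::from(iters)
}
```

For example, time_per_call_ns(1_000_000, || black_box(0.3_f32).atan2(black_box(0.7_f32))) times a single atan2 call; wrapping the inputs in black_box as well stops the compiler from constant-folding the work. Build with --release and with each lto setting being compared.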

Does anybody know why this happens, or how to fix it?

Netzwerk2 (Author) · 2025-05-03T22:30:11Z

I managed to narrow it down to one line that causes glam to be slower:

```rust
let uv = sphere.uv_map(intersection_point);
```

If I replace it with

```rust
let uv = Point2::ZERO; // `Vec2::ZERO` for glam
```

then glam takes only 15.194 ns, while my math implementation takes 24.109 ns.

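This is the function:

```rust
use std::f32::consts::FRAC_1_PI;

// Expects a point on the unit sphere (|y| <= 1, as required by asin).
fn uv_map(point: glam::Vec3A) -> glam::Vec2 {
    // Azimuth around the y axis, remapped from [-pi, pi] to [0, 1].
    let u = 0.5 + (-point.z).atan2(point.x) * 0.5 * FRAC_1_PI;
    // Polar angle from asin(y), remapped from [-pi/2, pi/2] to [0, 1].
    let v = 0.5 - point.y.asin() * FRAC_1_PI;

    glam::Vec2::new(u, v)
}
```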
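
As a sanity check on the formula, here is a scalar port of uv_map that compiles with std only. It takes a plain [f32; 3] instead of glam::Vec3A (an assumption made purely so the sketch has no dependencies; the name uv_map_scalar is hypothetical) and assumes the point lies on the unit sphere. For example, (1, 0, 0) maps to the center (0.5, 0.5), (0, 0, -1) maps to u = 0.75, and the +y pole maps to v = 0.

```rust
use std::f32::consts::FRAC_1_PI;

/// Hypothetical scalar port of `uv_map`, using `[f32; 3]` in place of
/// `glam::Vec3A`. Expects `point` on the unit sphere so `point[1]` is in [-1, 1].
fn uv_map_scalar(point: [f32; 3]) -> [f32; 2] {
    let [x, y, z] = point;
    // atan2 returns [-pi, pi]; scaling by 0.5/pi and adding 0.5 gives [0, 1].
    let u = 0.5 + (-z).atan2(x) * 0.5 * FRAC_1_PI;
    // asin returns [-pi/2, pi/2]; dividing by pi and subtracting from 0.5
    // gives [0, 1], with the +y pole at v = 0.
    let v = 0.5 - y.asin() * FRAC_1_PI;
    [u, v]
}
```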

3 replies

bitshifter May 3, 2025
Maintainer

That is a bit surprising :) Sometimes going from SIMD to non-SIMD, which is what happens here when accessing the x, y and z components, can trip the compiler up. I'd add #[inline] to it for starters and see if that helps.

I assume you are performing the same calculation in your own code somewhere? It's possible that fat LTO is also helping with the atan2 and asin calls, which is somewhat separate from glam since those are Rust standard library math calls.

Looking at the generated asm for both intersection methods is probably the best way to understand what is happening.

Netzwerk2 May 18, 2025
Author

Thanks for the answer! I tried using #[inline], but IIRC it didn't really make a difference.
Unfortunately, I don't really know how to read ASM. At the moment I also don't have the time to dig further into this.
Thanks for the help anyway, though :)

bitshifter May 18, 2025
Maintainer

Ah that's a shame.

The main thing to look out for in the asm is call instructions, which mean there's a function call and things aren't getting inlined. For small functions, not being inlined can add a lot of overhead. cargo-show-asm (https://github.com/pacak/cargo-show-asm) makes it pretty easy to view the asm. It can give some idea of what's going on, even if most of it doesn't make sense. You can also pass --rust so it interleaves the Rust source with the asm, which can help explain what you're looking at.
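
To see what a call looks like in practice, here is a hypothetical example (all function names are made up) to inspect with cargo-show-asm. In the asm for probe, dot_no_inline should show up as a call instruction, while dot_inline will usually have been folded into probe itself. Note that f32::atan2 and f32::asin generally compile to calls into the platform's libm (e.g. atan2f, asinf), so those particular calls are expected in uv_map's asm and are not an inlining problem by themselves.

```rust
/// Deliberately kept out of line: callers will contain a `call` to it.
#[inline(never)]
fn dot_no_inline(a: [f32; 3], b: [f32; 3]) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Inlining hint: in release builds this usually disappears into the caller.
#[inline]
fn dot_inline(a: [f32; 3], b: [f32; 3]) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Hypothetical probe to inspect with cargo-show-asm. `#[inline(never)]`
/// plus `pub` keep it as a standalone symbol that the tool can find.
#[inline(never)]
pub fn probe(a: [f32; 3], b: [f32; 3]) -> f32 {
    dot_no_inline(a, b) + dot_inline(a, b)
}
```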

It could of course be something else.

bitshifter (Maintainer) · 2025-05-03T22:39:38Z

This is usually because some "hot" function is not getting inlined. Thin LTO is not able to inline such functions (or is more limited in what it can inline), whereas fat LTO will. glam works around this by adding the #[inline] attribute to the majority of its methods.
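
As a rough illustration of that approach, here is a minimal vector type (hypothetical, not glam's actual source) whose small methods all carry #[inline]. The attribute makes a function's body available to other crates for inlining, so callers in a different crate can inline these methods even without LTO.

```rust
use std::ops::{Add, Mul};

/// Hypothetical minimal vector type; every tiny method is tagged #[inline].
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vec3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

impl Vec3 {
    #[inline]
    pub const fn new(x: f32, y: f32, z: f32) -> Self {
        Self { x, y, z }
    }

    #[inline]
    pub fn dot(self, rhs: Self) -> f32 {
        self.x * rhs.x + self.y * rhs.y + self.z * rhs.z
    }

    #[inline]
    pub fn length(self) -> f32 {
        self.dot(self).sqrt()
    }
}

impl Add for Vec3 {
    type Output = Self;
    #[inline]
    fn add(self, rhs: Self) -> Self {
        Self::new(self.x + rhs.x, self.y + rhs.y, self.z + rhs.z)
    }
}

impl Mul<f32> for Vec3 {
    type Output = Self;
    #[inline]
    fn mul(self, s: f32) -> Self {
        Self::new(self.x * s, self.y * s, self.z * s)
    }
}
```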

Looking at your code, it's hard to say for sure without looking at disassembly or profiling, but I'd suggest trying the following:

- Add #[inline] to transform_normal, transform_tangent and uv_map.

Also, your transform_tangent converts from Vec4, which is SIMD, to Vec3, which is not. Instead you could do something like:

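```rust
use glam::Vec3A;

fn transform_tangent(mat4: glam::Mat4, tangent: glam::Vec4) -> glam::Vec4 {
    // Stay in SIMD types: go from Vec4 to Vec3A rather than to Vec3.
    let tangent_xyz = mat4.transform_vector3a(Vec3A::from_vec4(tangent)).normalize();
    // Reattach the original w component.
    tangent_xyz.extend(tangent.w)
}
```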
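
For readers less familiar with glam, here is a scalar, std-only sketch of what this transform_tangent computes. It does not reproduce the SIMD benefit and only shows the math. The [[f32; 4]; 4] representation and the name transform_tangent_scalar are assumptions; the matrix is stored column-major, mirroring glam's Mat4 layout. Only the upper 3x3 is applied, so translation is ignored as in transform_vector3a, and the original w is carried over unchanged.

```rust
/// Hypothetical scalar sketch. `m[col][row]` is a column-major 4x4 matrix;
/// only the first three columns are used, so translation has no effect.
fn transform_tangent_scalar(m: [[f32; 4]; 4], tangent: [f32; 4]) -> [f32; 4] {
    let v = [tangent[0], tangent[1], tangent[2]];
    // Linear combination of the first three columns, weighted by x, y, z.
    let mut t = [0.0f32; 3];
    for (col, &s) in m.iter().take(3).zip(v.iter()) {
        for row in 0..3 {
            t[row] += col[row] * s;
        }
    }
    // Normalize the transformed xyz (assumes it is non-zero).
    let len = (t[0] * t[0] + t[1] * t[1] + t[2] * t[2]).sqrt();
    // Pass w through untouched.
    [t[0] / len, t[1] / len, t[2] / len, tangent[3]]
}
```

Keeping w matters because, in tangent conventions such as glTF's, it stores the handedness sign of the tangent basis.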

If you want to look at asm, I recommend trying https://crates.io/crates/cargo-asm. If things aren't getting inlined, it's usually fairly obvious in the asm.
