[FEA] CVT F32 -> TF32 PTX for sm80 #2254

osayamenja · 2025-04-19T17:48:13Z

Is your feature request related to a problem? Please describe.
Currently, converting from f32 to tf32 with round to nearest dispatches to a PTX cvt instruction only on sm90.

Describe the solution you'd like
If we allow rna rounding (round to nearest, ties away from zero), we can dispatch to cvt.rna.tf32.f32, which is available on sm80.
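
To make the rounding mode concrete, here is a minimal host-side sketch of what rna rounding to TF32's 10-bit mantissa amounts to. The helper names `f32_to_tf32_rna_bits` and `tf32_rna_as_float` are hypothetical and not part of CUTLASS. Clearing the 13 low bits and passing NaN/Inf through unchanged are assumptions of this model and have not been checked bit-for-bit against the hardware instruction.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not a CUTLASS API): models round-to-nearest,
// ties-away-from-zero from f32 down to a 10-bit mantissa.
inline std::uint32_t f32_to_tf32_rna_bits(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    // Assumption: NaN and Inf are passed through untouched in this model.
    if (!std::isfinite(x)) {
        return bits;
    }
    // f32 is sign-magnitude, so adding half of a TF32 ulp (bit 12) to the
    // raw bits rounds the magnitude to nearest, with ties going away from zero.
    bits += 0x1000u;
    // Assumption: the 13 mantissa bits that TF32 does not keep are cleared.
    bits &= ~0x1FFFu;
    return bits;
}

// Same conversion, returned as a float for easy printing and comparison.
inline float tf32_rna_as_float(float x) {
    const std::uint32_t bits = f32_to_tf32_rna_bits(x);
    float out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}
```

Under these assumptions, `tf32_rna_as_float(-0.45466f)` gives -0.45458984375, which prints as -0.454590 with `%f`.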

Describe alternatives you've considered
N/A

Additional context
A simple code sample is given below:

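#include <cstdint>
#include <cstdio>
#include <cute/tensor.hpp>               // cute::print
#include <cutlass/numeric_conversion.h>  // cutlass::NumericConverter
#include <cutlass/tfloat.h>              // cutlass::tfloat32_t

__global__ void f2tfK() {
    constexpr float x = -0.45466f;
    uint32_t d = 0;
    constexpr auto f2tf = cutlass::NumericConverter<cutlass::tfloat32_t, float>{};
    // Convert directly with the PTX instruction (needs sm80 or newer).
    asm volatile("cvt.rna.tf32.f32 %0, %1;" : "=r"(d) : "f"(x));
    const auto res = cutlass::tfloat32_t::bitcast(d);
    // Convert through the CUTLASS converter for comparison.
    const auto cRes = f2tf(x);
    printf("Intrinsic: "); cute::print(res); printf("\n");
    printf("Other: "); cute::print(cRes); printf("\n");
    printf("isEqual? %s\n", cRes == res ? "yes" : "no");
}
// Output: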
// Intrinsic: -0.454590
// Other: -0.454590
// isEqual? yes


osayamenja added the "? - Needs Triage" and "feature request" labels Apr 19, 2025

osayamenja changed the title ~~[FEA] CVT TF32 -> F32 PTX for sm80~~ [FEA] CVT F32 -> TF32 PTX for sm80 Apr 19, 2025
