Does Compute Capability for PTX implicitly impact performance? (conditional compilation aside) #372
Unanswered
polarathene asked this question in Q&A
If `nvcc` applies implicit optimizations when compiling PTX for a target CC, a small example that demonstrates that behaviour would be appreciated.

Otherwise, I'm seeking confirmation that CC only affects performance indirectly, through conditional compilation (like this FP16 example for CC 5.3, which includes a compatibility fallback for compiling with an earlier CC). In other words: does the CC version only matter when building code that uses features/data types requiring a minimum CC (with an explicit build failure when the CC is insufficient), while building for a higher CC offers no additional implicit optimizations?
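To make the conditional-compilation pattern concrete, here is a minimal sketch of an FP16 kernel guarded by `__CUDA_ARCH__`. It is not the linked example: the kernel name `add_half`, the array size, and the input values are hypothetical, and it assumes a toolkit where `__float2half` and `__half2float` are callable from host code.

```cuda
#include <cstdio>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical kernel: element-wise add of two half-precision arrays.
// The native FP16 add is only compiled when the device code targets
// CC 5.3 or newer; lower targets get a fallback that goes through float.
__global__ void add_half(const __half* a, const __half* b, __half* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 530
    out[i] = __hadd(a[i], b[i]);  // native half-precision add
#else
    out[i] = __float2half(__half2float(a[i]) + __half2float(b[i]));
#endif
}

int main() {
    const int n = 4;  // arbitrary size for illustration
    __half *a = nullptr, *b = nullptr, *out = nullptr;
    if (cudaMallocManaged(&a, n * sizeof(__half)) != cudaSuccess ||
        cudaMallocManaged(&b, n * sizeof(__half)) != cudaSuccess ||
        cudaMallocManaged(&out, n * sizeof(__half)) != cudaSuccess) {
        std::printf("allocation failed\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) {
        a[i] = __float2half(static_cast<float>(i));
        b[i] = __float2half(0.5f);
    }

    add_half<<<1, n>>>(a, b, out, n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < n; ++i) {
        std::printf("out[%d] = %.2f\n", i, __half2float(out[i]));
    }
    cudaFree(a);
    cudaFree(b);
    cudaFree(out);
    return 0;
}
```

Which branch ends up in the binary depends only on the architecture passed at compile time. That is the explicit, source-level effect of CC, as opposed to any implicit optimization the compiler might apply.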
Original message
Is the compute capability provided to `nvcc --gpu-architecture` similar to x86-64 micro-architecture levels like x86-64-v3, in that the compiler may optimize for more performant operations when the compute capability is higher?

Or is it only relevant to examples like CC 5.3 with FP16, where compilation would fail for a compute capability target below 5.3 (the minimum for FP16 support) unless the source code itself has conditional compilation (like the linked example shows)?
FP16 example
I can understand this when compiling third-party code/libraries that provide their own kernels to compile at build time, but beyond conditional compilation with macros, does the compute capability given have any other implicit impact on performance when built (via `nvcc`, or at runtime via JIT if PTX was embedded)?

This information was a little difficult to find confirmation on. Fallback macros for handling CC compatibility aside, am I right to assume that compute capability provides newer API methods and data types (as documented in the compute capability section of the CUDA Wikipedia article), where the minimum CC is the point below which compilation would fail due to using those newer features? And that there are no actual implicit optimizations at higher CC versions beyond that?
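One way to investigate this empirically is to check which target the device code was actually compiled for, then compare the generated PTX or SASS across targets. The sketch below only provides the inspection harness and does not settle whether the output differs. The kernel name `report_arch`, the file names, and the two architectures in the comments are hypothetical, so pick targets your toolkit version still supports.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Build variants to compare (file names and targets are placeholders):
//   nvcc --gpu-architecture=compute_53 --ptx report.cu -o report_cc53.ptx
//   nvcc --gpu-architecture=compute_70 --ptx report.cu -o report_cc70.ptx
//   diff report_cc53.ptx report_cc70.ptx
// Embed PTX only, so the driver JIT-compiles it for the GPU at runtime:
//   nvcc --generate-code arch=compute_53,code=compute_53 report.cu -o report
// Inspect machine code embedded in a binary:
//   cuobjdump --dump-sass report

// Hypothetical kernel: writes the architecture value the device code was
// compiled against. __CUDA_ARCH__ is fixed when nvcc generates the PTX,
// so it does not change when that PTX is later JIT-compiled on a newer GPU.
__global__ void report_arch(int* out) {
#ifdef __CUDA_ARCH__
    *out = __CUDA_ARCH__;
#else
    *out = 0;
#endif
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no usable CUDA device\n");
        return 1;
    }

    int* d_arch = nullptr;
    int h_arch = 0;
    if (cudaMalloc(&d_arch, sizeof(int)) != cudaSuccess) {
        std::printf("allocation failed\n");
        return 1;
    }

    report_arch<<<1, 1>>>(d_arch);
    // cudaMemcpy waits for the kernel and surfaces launch/execution errors.
    cudaError_t err = cudaMemcpy(&h_arch, d_arch, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_arch);
    if (err != cudaSuccess) {
        std::printf("kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("device compute capability: %d.%d\n", prop.major, prop.minor);
    std::printf("device code compiled for __CUDA_ARCH__ = %d\n", h_arch);
    return 0;
}
```

If the PTX or SASS for a kernel with no architecture-specific source differs between two targets, that difference would be an instance of the implicit behaviour asked about here. Identical output for one kernel would not rule it out for others.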
In practice I get that conditional compilation will be more prevalent in larger CUDA projects, or when going through higher-level abstractions for convenience 👍 (I'm just curious how the CC version affects compilation beyond that)