Description
An enhancement suggestion.
It would be convenient to be able to compile ELPA with Nvidia GPU support for multiple architectures, so that the library contains compiled cubin for each of them. This improves portability, especially in heterogeneous environments, and facilitates cross-compilation. It is a common option in other GPU-enabled libraries and would be nice to have here.
I attach an example patch for configure.ac. Would you consider taking this feature upstream?
The patch reflects the pattern we use, which is a common one: generate PTX for the highest architecture (this ensures forward compatibility with newer GPUs) and cubin for multiple architectures (best performance, plus binary compatibility with newer devices within the same major compute capability). Currently we only use ELPA1 on GPU, so the patch has not (yet) been tested with the options related to the ELPA2 kernel.
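To make the pattern concrete, here is a minimal shell sketch of how such a flag list could be assembled. It is not the attached patch: the helper name `build_gencode_flags` and its comma-separated numeric input (e.g. `70,80,90`) are assumptions made for illustration only. The `-gencode arch=...,code=...` syntax itself is standard nvcc.

```shell
#!/bin/sh
# Hypothetical helper (not part of ELPA or of the attached patch).
# Input:  comma-separated compute capabilities, plain numbers only, e.g. "70,80,90".
# Output: nvcc flags that embed cubin for every listed architecture
#         and PTX for the highest one.
build_gencode_flags() {
    # Split on commas, sort numerically, drop duplicates.
    archs=$(printf '%s\n' "$1" | tr ',' '\n' | sort -n | uniq)
    flags=""
    highest=""
    for cc in $archs; do
        # code=sm_XX embeds a cubin for that architecture.
        flags="$flags -gencode arch=compute_${cc},code=sm_${cc}"
        highest=$cc   # the list is sorted, so the last value is the highest
    done
    if [ -n "$highest" ]; then
        # code=compute_XX embeds PTX, which newer GPUs can JIT-compile.
        flags="$flags -gencode arch=compute_${highest},code=compute_${highest}"
    fi
    # Strip the leading space before printing.
    printf '%s\n' "${flags# }"
}
```

For example, `build_gencode_flags 80,70` would yield cubin flags for `sm_70` and `sm_80` followed by a PTX flag for `compute_80`. In a configure.ac the resulting string would presumably be appended to whatever variable holds the nvcc flags; which variable that is depends on the existing build system and is not assumed here.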