This is a bug-fix PR. It adds missing execution space annotations (via the XSF_HOST_DEVICE
macro) to functions that CuPy does not currently use directly, but that are required to compile the functions CuPy does use when building with CUDA toolkit 11.
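
For reviewers unfamiliar with the pattern, here is a minimal sketch of what such an annotation looks like. The macro name EXAMPLE_HOST_DEVICE and both functions are hypothetical and are not taken from this PR; the sketch assumes XSF_HOST_DEVICE behaves along the same lines, expanding to the CUDA execution space specifiers under nvcc and to nothing otherwise.

```cpp
// Hypothetical stand-in for an execution space macro such as XSF_HOST_DEVICE.
// When compiled by nvcc (__CUDACC__ is defined), it marks a function as
// callable from both host and device code. With a plain C++ compiler it
// expands to nothing, so the same header still builds on the host.
#ifdef __CUDACC__
#define EXAMPLE_HOST_DEVICE __host__ __device__
#else
#define EXAMPLE_HOST_DEVICE
#endif

// Hypothetical helper that downstream code never calls directly.
// It still needs the annotation because an annotated function calls it.
EXAMPLE_HOST_DEVICE inline double square_plus_one(double x) {
    return x * x + 1.0;
}

// Hypothetical function that downstream device code does call.
// Compiling it for the device pulls in square_plus_one as well.
EXAMPLE_HOST_DEVICE inline double shifted(double x) {
    return square_plus_one(x) - 0.5;
}
```

If the annotation on the helper were missing, a CUDA compiler could warn about or reject the call from shifted, since an unannotated function is host-only by default. That is the kind of gap this PR closes.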