Description
I was able to build and run the application offline on my own hardware. However, I got this message:
Torch reports CUDA not available
As a result, I cannot train anything with FP16 precision and have to train exclusively on the CPU.
About my system:
I installed the NVIDIA Container Toolkit following the official instructions, configured it, and restarted the Docker daemon:
[https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html]
Output for driver and CUDA from nvidia-smi:
**NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: 11.4**
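The "CUDA Version" field in the nvidia-smi banner is the highest CUDA runtime version the installed driver supports, so it can be compared against whatever CUDA version the container image was built for. Below is a minimal sketch of that comparison. The helper names `parse_nvidia_smi_header` and `driver_supports` are hypothetical, and the `(11, 8)` requirement is only an assumed example, since the CUDA version used by the image is not stated here.

```python
import re


def parse_nvidia_smi_header(line):
    """Extract the driver version and the maximum supported CUDA version
    from the nvidia-smi banner line. Returns None if the line does not match."""
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", line
    )
    if match is None:
        return None
    driver, cuda = match.groups()
    # Store the CUDA version as a tuple of ints so it compares numerically.
    return {"driver": driver, "cuda": tuple(int(p) for p in cuda.split("."))}


def driver_supports(info, required_cuda):
    """True if the driver's maximum CUDA version is at least required_cuda."""
    return info["cuda"] >= required_cuda


# Banner line copied from the nvidia-smi output above.
header = "NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: 11.4"
info = parse_nvidia_smi_header(header)

# (11, 8) is an assumed requirement, used purely for illustration.
print(info, driver_supports(info, (11, 8)))
```

If the image does turn out to need a CUDA runtime newer than the driver's maximum, that would point at the driver rather than at the CUDA toolkit installed on the host.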
System information:
Linux Mint 20 x86_64
5.15.0-83-generic
GPU information:
RTX 3070, registered as /dev/nvidia0
Any ideas what might be causing this?
Edit:
After the CPU training finished, I got another error that may shed light on the cause:
kohya-docker-kohya-1 | CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
kohya-docker-kohya-1 | CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
kohya-docker-kohya-1 | CUDA exception! Error code: forward compatibility was attempted on non supported HW
kohya-docker-kohya-1 | CUDA exception! Error code: initialization error
kohya-docker-kohya-1 | CUDA SETUP: Highest compute capability among GPUs detected: None
kohya-docker-kohya-1 | CUDA exception! Error code: forward compatibility was attempted on non supported HW
kohya-docker-kohya-1 | CUDA SETUP: CUDA version lower than 11 are currenlty not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!
kohya-docker-kohya-1 | CUDA SETUP: Detected CUDA version 00
kohya-docker-kohya-1 | CUDA SETUP: TODO: compile library for specific version: libbitsandbytes_cuda00_nocublaslt.so
kohya-docker-kohya-1 | CUDA SETUP: Defaulting to libbitsandbytes.so...
kohya-docker-kohya-1 | CUDA SETUP: CUDA detection failed. Either CUDA driver not installed, CUDA not installed, or you have multiple conflicting CUDA libraries!
kohya-docker-kohya-1 | CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
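To separate what PyTorch itself sees from what bitsandbytes reports, a small check can be run inside the container. This is a minimal sketch, not part of the application: the `cuda_report` helper is hypothetical, and it only uses `torch.version.cuda`, `torch.cuda.is_available()`, and `torch.cuda.get_device_name()`.

```python
def cuda_report():
    """Collect basic facts about the PyTorch build and GPU visibility."""
    report = {"torch_installed": False}
    try:
        import torch
    except ImportError:
        # PyTorch is missing from this environment; nothing more to check.
        return report

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    report["built_for_cuda"] = torch.version.cuda
    # False when the driver, runtime, or device passthrough is not usable.
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Judging by the log prefix, the Compose service is probably named `kohya`, but that is an assumption. If `built_for_cuda` comes back higher than the 11.4 reported by nvidia-smi, it would be consistent with the "forward compatibility" error in the log.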
My current CUDA toolkit version, as reported by nvcc:
Cuda compilation tools, release 10.1, V10.1.243
I'm guessing my CUDA version is too old and not configured properly, since it isn't found in the path. I'll try installing version 11 or higher and then retry the Docker image.