make LLAMA_CUBLAS=1 - nvcc fatal : Value 'native' is not defined for option 'gpu-architecture' #2142

FileDotZip · 2023-07-08T00:12:43Z

I was wondering if anyone else has come across this issue when trying to compile llama.cpp with cuBLAS.
I'm getting the error "Value 'native' is not defined for option 'gpu-architecture'".

I'm trying to compile llama.cpp on Ubuntu 22.04, and I have installed 5x Nvidia P40 (24 GB) and 2x Nvidia P100 (one with 12 GB and one with 16 GB).

I tried to follow what was being said in the discussion below, but it appears they're trying to figure it out for Windows and never came to a definite conclusion.
#1070

Here is my log.

root@devops:/ai/llm/llama.cpp.gpu# make clean
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS
I LDFLAGS:
I CC: cc (Ubuntu 11.3.0-1ubuntu122.04.1) 11.3.0
I CXX: g++ (Ubuntu 11.3.0-1ubuntu122.04.1) 11.3.0

rm -vf *.o *.so main quantize quantize-stats perplexity embedding benchmark-matmult save-load-state server simple vdot train-text-from-scratch embd-input-test build-info.h
removed 'common.o'
removed 'ggml.o'
removed 'k_quants.o'
removed 'llama.o'
removed 'build-info.h'
root@devops:/ai/llm/llama.cpp.gpu# make LLAMA_CUBLAS=1
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include
I LDFLAGS: -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
I CC: cc (Ubuntu 11.3.0-1ubuntu122.04.1) 11.3.0
I CXX: g++ (Ubuntu 11.3.0-1ubuntu122.04.1) 11.3.0

cc -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -c ggml.c -o ggml.o
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -c llama.cpp -o llama.o
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -c examples/common.cpp -o common.o
cc -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -c -o k_quants.o k_quants.c
nvcc --forward-unknown-to-host-compiler -arch=native -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -Wno-pedantic -c ggml-cuda.cu -o ggml-cuda.o
nvcc fatal : Value 'native' is not defined for option 'gpu-architecture'
make: *** [Makefile:191: ggml-cuda.o] Error 1

BarfingLemurs · 2023-07-08T02:45:13Z

I had the same error, so I followed the CMake instructions instead and it built fine.

mkdir build
cd build
cmake .. -DLLAMA_CUBLAS=ON
cmake --build . --config Release

5 replies

FileDotZip Jul 8, 2023
Author

Where did you go after that? It appears to have worked for me too.

FileDotZip Jul 8, 2023
Author

Okay, after that I got a bin/ folder with the executables inside build.

Then I ran ./main with:

./main -m /ai/llm/llama.cpp.cpu/models/30B/ggml-model-q4_0.bin -n 128 --n-gpu-layers 60

ghost Jul 8, 2023

CMake creates ./main inside llama.cpp/build/bin

FileDotZip Jul 8, 2023
Author

I can't seem to find the chat executable. Any ideas?

ghost Jul 8, 2023

> I can't seem to find the chat executable. Any ideas?

There is no chat executable. Without more information, I guess the issue is that ./main starts generating text right away instead of waiting for input, yes? There are many ways to interact with the model; perhaps add the -i -ins parameters to ./main. Here's more information:
https://github.com/ggerganov/llama.cpp/blob/061f5f8d2109bb7adcbd40f1b456d887c5a1df25/examples/main/README.md?plain=1#L84

Edit: ./server has a simple chat-like interface. Try running the model through ./server -m ~/path/to/model.bin

Then open a web browser and navigate to 127.0.0.1:8080

theflyingswami · 2023-08-13T16:10:26Z

I found a solution to this error and wanted to share it, though I'm not sure this is the right place.

In the Makefile, change the line:
NVCCFLAGS += -arch=native

to read:
NVCCFLAGS += -arch=all-major

There might be some warnings about deprecation, but it compiled for me.
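
If you would rather not edit the Makefile by hand, the same one-line change can be sketched as a small shell helper. The function name set_nvcc_arch is made up for illustration, and the sketch assumes the Makefile still contains the literal text -arch=native as shown above:

```shell
# set_nvcc_arch FILE VALUE (hypothetical helper, not part of llama.cpp):
# rewrite "-arch=native" to "-arch=VALUE" in FILE and keep the original
# as FILE.bak so the change is easy to undo.
set_nvcc_arch() {
    sed -i.bak "s/-arch=native/-arch=$2/" "$1"
}

# Usage, run from the llama.cpp source directory:
#   set_nvcc_arch Makefile all-major
#   make clean && make LLAMA_CUBLAS=1
```

If the Makefile in your checkout spells the flag differently, the substitution simply does nothing, so check the line afterwards.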

For reference, I found the solution here: ggml-org/whisper.cpp#876

++++++++++++

EDIT: After a little more tinkering, I've realized the 'all-major' is a bit hacky and can be narrowed down for the specific Nvidia card.

First, find the card in question on this page:
https://developer.nvidia.com/cuda-gpus

Once you've found the card, use the corresponding 'compute capability' with 'sm_' for your makefile (ignore the decimal).

For example, I have a GeForce 2070 max q. On the reference page, the GeForce 2070 compute capability is 7.5 so, in my makefile, I changed the line to:
NVCCFLAGS += -arch=sm_75

and it compiles! 'compute_75' may also work, but I didn't try it (you can get a list of available options by using nvcc --list-gpu-arch).
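
The "ignore the decimal" step can be sketched as a tiny shell helper. The name cap_to_arch is made up for illustration; the 6.1 and 6.0 values are what I believe NVIDIA's table lists for the Tesla P40 and P100 mentioned in the original post, so double-check them against the page linked above:

```shell
# cap_to_arch CAPABILITY (hypothetical helper): turn a compute capability
# such as "7.5" into the matching nvcc value such as "sm_75" by dropping
# the decimal point.
cap_to_arch() {
    printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

cap_to_arch 7.5   # GeForce 2070 from the example above -> sm_75
cap_to_arch 6.1   # Tesla P40 (assumed 6.1)  -> sm_61
cap_to_arch 6.0   # Tesla P100 (assumed 6.0) -> sm_60
```

The result is what goes after -arch= in the NVCCFLAGS line.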

Of course, 'all-major' should work just fine, I'm just going down the Makefile rabbit hole and thought I'd share in case it helps someone, somewhere, at some point.

0 replies

jkbh · 2023-12-13T16:30:54Z

The -arch=native option was introduced with nvcc in CUDA 11.6: https://docs.nvidia.com/cuda/archive/11.6.0/cuda-toolkit-release-notes/index.html#cuda-compiler-new-features

If someone installs the cuda-toolkit with apt on Ubuntu 22.04 LTS, they get a version from the default repository that is just a little bit too old.

So one has to either get a newer cuda-toolkit/nvcc or change the Makefile as proposed before.
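
To check which case applies before building, here is a minimal sketch that compares the installed nvcc release against 11.6. The function name supports_arch_native is made up, and the sed pattern assumes nvcc --version prints a line containing "release X.Y", which is the format I have seen but is not guaranteed:

```shell
# supports_arch_native RELEASE (hypothetical helper): succeeds if RELEASE,
# e.g. "11.5" or "12.2", is 11.6 or newer.
supports_arch_native() {
    major=${1%%.*}
    minor=${1#*.}
    minor=${minor%%.*}
    [ "$major" -gt 11 ] || { [ "$major" -eq 11 ] && [ "$minor" -ge 6 ]; }
}

# Only query nvcc if it is actually on the PATH.
if command -v nvcc >/dev/null 2>&1; then
    # Assumed output format: "... release 11.5, V11.5.119"
    release=$(nvcc --version | sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p')
    if [ -n "$release" ] && supports_arch_native "$release"; then
        echo "nvcc $release: -arch=native should be accepted"
    else
        echo "nvcc ${release:-unknown}: set -arch explicitly in the Makefile"
    fi
fi
```

If it reports a release older than 11.6, either install a newer toolkit or use one of the Makefile changes proposed above.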

Maybe there should be a hint somewhere in the installation guide, because this happens for the LTS version of Ubuntu, which is probably used quite often.

0 replies
