
Can ggllm.cpp run Falcon on Apple Silicon? #63

Answered by cmp-nct
RDearnaley asked this question in Q&A

Falcon is quite VRAM-friendly compared to LLaMA: it uses multi-query attention (MQA), which needs only a fraction of the KV-cache memory, and it appears to degrade less under heavy quantization than typical LLaMA models.
You can run a quite high-quality quantization of Falcon 40B in less than 24 GB of RAM. Of course 64 GB is amazing; around 36 GB is required to run it at very high quality, which is beyond most single GPUs.
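As a rough back-of-the-envelope check (my arithmetic, not figures from the discussion): in-memory model size is roughly parameter count times average bits per weight,

$$ 40 \times 10^{9}\ \text{params} \times \frac{4.5\ \text{bits/weight}}{8\ \text{bits/byte}} \approx 22.5\ \text{GB}, $$

so a ~4.5-bit quantization of Falcon 40B fits under 24 GB, while a ~7-bit quantization lands near 35 GB, consistent with the ~36 GB figure above (plus some headroom for the KV cache and activations).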

Regarding running on a Mac: I've been told it works. I don't have one here for testing, but others report it runs fast with Metal.
You should be able to build it much like llama.cpp; you may need to disable cuBLAS manually in addition to enabling Metal (a build sketch follows).
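A minimal build sketch for Apple Silicon, assuming ggllm.cpp keeps llama.cpp's build options (`LLAMA_METAL`, `LLAMA_CUBLAS`); check this fork's Makefile or CMakeLists.txt for the exact names:

```sh
# Assumes llama.cpp-style build flags; verify against ggllm.cpp's build files.
git clone https://github.com/cmp-nct/ggllm.cpp
cd ggllm.cpp

# Make-based build: enable Metal. cuBLAS is only compiled in when
# LLAMA_CUBLAS is defined, so simply leave it unset on a Mac.
LLAMA_METAL=1 make

# CMake alternative: enable Metal and explicitly switch cuBLAS off.
mkdir build && cd build
cmake .. -DLLAMA_METAL=ON -DLLAMA_CUBLAS=OFF
cmake --build . --config Release
```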
