Can ggllm.cpp run Falcon on Apple Silicon? #63
-
As I understand it, ggllm.cpp is a fork of llama.cpp intended to run Falcon models. llama.cpp can run LLaMA-derived models on Apple Silicon (M1/M2 Macs), and can even run them on the integrated GPU via Metal rather than only on the CPU cores. Can ggllm.cpp run Falcon models on Apple Silicon? Obviously Falcon 40B needs a lot of VRAM to run on a discrete GPU, so Apple's unified-memory approach of sharing system RAM rather than dedicated VRAM is appealing: a fair number of modern Macs have 64 GB or more of RAM. If this doesn't currently work, is it something that might be added, and if so on what timeframe? Or if it does work, could you add Apple Silicon build/setup instructions?
-
Falcon is quite VRAM-friendly compared to LLaMA: it uses multi-query attention (MQA), which needs only a fraction of the KV-cache memory, and it appears to degrade less under heavy quantization than typical LLaMA models.
You can run a high-quality quantized Falcon 40B in less than 24 GB of RAM. 64 GB is of course plenty; around 36 GB is needed to run it at very high quality, which is beyond most single GPUs.
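To make the MQA point concrete, here is a rough back-of-the-envelope KV-cache estimate. The shape numbers (60 layers, head dimension 64, 8 KV heads for Falcon-40B, a 2048-token context, fp16 cache) are assumptions taken from the commonly published Falcon config, not from this repo:

$$
\text{KV cache bytes} \approx 2 \cdot n_\text{layers} \cdot n_\text{ctx} \cdot n_\text{kv heads} \cdot d_\text{head} \cdot \text{bytes per element}
$$

Plugging in the assumed numbers: 2 · 60 · 2048 · 8 · 64 · 2 ≈ 0.25 GB. A standard multi-head cache, with all 128 query heads keeping their own K/V, would be roughly 16× larger (≈ 4 GB), which is where most of the memory saving at long contexts comes from.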
Regarding running on a Mac: I've been told it works. I don't have one here for testing, but others report it runs fast with Metal.
You should be able to build it much like llama.cpp; you might need to disable cuBLAS manually in addition to enabling Metal.
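For reference, here is a minimal build sketch for an Apple Silicon Mac. It assumes the fork keeps upstream llama.cpp's CMake options of that era (LLAMA_METAL, LLAMA_CUBLAS) as well as a falcon_main binary with llama.cpp-style flags; check this repo's CMakeLists.txt and README for the exact names before relying on it.

```sh
# Assumed to mirror upstream llama.cpp's build options; verify the flag
# names against this repo's CMakeLists.txt, as the fork may differ.
git clone https://github.com/cmp-nct/ggllm.cpp
cd ggllm.cpp
mkdir build && cd build

# Enable the Metal backend and explicitly turn cuBLAS off
# (there is no NVIDIA GPU on Apple Silicon).
cmake .. -DLLAMA_METAL=ON -DLLAMA_CUBLAS=OFF
cmake --build . --config Release

# Hypothetical run example: the binary name, model path, and flags are
# assumptions borrowed from llama.cpp-style usage (-m model file, -ngl to
# offload layers to the GPU); point -m at your actual quantized Falcon file.
./bin/falcon_main -m /path/to/falcon-40b-quantized.bin -p "Hello" -ngl 100
```

The key part is pairing -DLLAMA_METAL=ON with -DLLAMA_CUBLAS=OFF, as the reply above notes: the build may otherwise try to pick up CUDA, which is not available on Apple Silicon. If in doubt, configure in a clean build directory.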