-
How about SqueezeLLM? Also, is llama.cpp still the fastest compared with exllama on GPU (quantized)?
-
This seems super weird. I'm not sure what he's trying to do by comparing perplexity alone without accounting for file size, performance, etc. It seems like it's mostly between 4-bit-ish quantizations, but it doesn't actually say that.

Also, he didn't run perplexity against the same corpus as other perplexity measurements: it was run against a ".txt input file containing some technical blog posts and papers that I collected. It is a lot smaller and faster to evaluate than wikitext, but I find that it correlates perfectly with bigger evaluations." The good old source: trust me bro.

"The perplexity of llama-65b in llama.cpp is indeed lower than for llama-30b in all other backends." - You can take out the "other" there, right? The perplexity of llama-65b in llama.cpp will indeed be lower than the perplexity of llama-30b in llama.cpp. If there weren't an advantage to a model more than twice as large, why would we bother to use it?
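For reference, perplexity is just the exponentiated average negative log-likelihood over the evaluation tokens, so the absolute number is tied to whichever corpus it was computed on:

$$\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p\!\left(x_i \mid x_{<i}\right)\right)$$

That's why numbers measured on a private .txt of blog posts can't be compared directly against the usual wikitext figures, even if the rankings happen to correlate.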
-
@oobabooga is comparing the perplexity of different inference engines, and of course llama.cpp with K-quants seems to be in the lead.
https://oobabooga.github.io/blog/posts/perplexities/