LISA stands for Local Indicators of Spatial Association.
Spatial association, in the language-model setting, refers to semantic association: distances between questions in a high-dimensional embedding space play the role of geographic distances.
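As a minimal sketch of what "nearness" could mean here, each question's neighbors can be taken as its k nearest questions in embedding space. The function name, the choice of embedder, and the value of k are illustrative assumptions:

```python
# Sketch: treat questions as "locations" whose coordinates are embedding
# vectors, and define each question's neighborhood as its k nearest
# questions by cosine distance.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def embedding_neighbors(embeddings: np.ndarray, k: int = 10):
    """Return (indices, distances) of each question's k nearest neighbors.

    `embeddings` is an (n_questions, dim) array from any sentence
    embedder; which embedder to use is left open.
    """
    # Normalize rows so Euclidean distance is monotone in cosine distance.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(unit)  # +1: each point finds itself
    dist, idx = nn.kneighbors(unit)
    return idx[:, 1:], dist[:, 1:]  # drop the self-neighbor in column 0
```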
Examples of local concerns in LLM performance evaluation are as follows:
- Which LLM is least socially biased?
- Which LLM is best with medical answers?
- Which LLM is least likely to produce toxic content?
- Which LLM produces more nuanced answers and fewer overconfident, simplistic ones?
A LISA (a local indicator) might reveal substantial variation on high-priority local concerns that aggregate performance metrics hide.
The objective here is to detect local hot spots in LLM performance variation.
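As a hedged sketch of what this hot-spot detection could look like with the PySAL stack (`libpysal` + `esda`), assuming `emb` holds question embeddings and `y` holds one model's per-question win rates (both names are placeholders):

```python
# Sketch: Local Moran's I over embedding-space neighbors, flagging "hot spot"
# questions where a model's win rate is high and its neighbors' rates are too.
import numpy as np
from libpysal.weights import KNN
from esda.moran import Moran_Local

def local_hotspots(emb: np.ndarray, y: np.ndarray, k: int = 10, alpha: float = 0.05):
    w = KNN.from_array(emb, k=k)  # k-nearest-neighbor weights in embedding space
    w.transform = "r"             # row-standardize the weights
    lisa = Moran_Local(y, w)      # permutation-based local Moran statistics
    # Quadrant 1 = high value surrounded by high values ("high-high" cluster).
    hot = (lisa.q == 1) & (lisa.p_sim < alpha)
    return np.flatnonzero(hot), lisa
```

The same `lisa.q` codes also pick out "low-low" cold spots (quadrant 3), which are equally interesting for diagnosing where a model underperforms.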
Practically, this might be useful for two reasons:
- revealing local patterns of performance gives model users the information to make performance comparisons between models that are directly relevant to their specific priority concerns.
- revealing local patterns of performance gives model developers the information to make more strategic decisions about which high-impact examples to use when fine-tuning an LLM.
The approach I will take is to port work already done in exploratory spatial data analysis to the LLM domain -- more specifically, work on Local Indicators of Spatial Association (LISA).
- get the % of battle wins for each model on each question (sketched in code below)
- see where % wins are spatially correlated across semantically neighboring ('clustered') questions, using a LISA statistic
- use an LLM to name these clusters (also sketched below)
- write a report on the patterns: disaggregated, more nuanced performance comparisons
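A sketch of the first step, assuming battle records shaped roughly like the lmsys releases, with one row per battle and columns `question_id`, `model_a`, `model_b`, `winner` (the exact column names and tie handling are assumptions):

```python
# Sketch: per-question win percentage for each model from pairwise battles.
import pandas as pd

def win_rates(battles: pd.DataFrame) -> pd.DataFrame:
    """battles: rows with ['question_id', 'model_a', 'model_b', 'winner'],
    where winner is 'model_a', 'model_b', or a tie label."""
    rows = []
    for _, b in battles.iterrows():
        for model, won in [(b.model_a, b.winner == "model_a"),
                           (b.model_b, b.winner == "model_b")]:
            # Count a tie as half a win for both sides.
            score = 0.5 if str(b.winner).startswith("tie") else float(won)
            rows.append({"question_id": b.question_id,
                         "model": model, "win": score})
    long = pd.DataFrame(rows)
    # Rows = questions, columns = models, values = % of battles won.
    return long.pivot_table(index="question_id", columns="model",
                            values="win", aggfunc="mean")
```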
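And a sketch of the cluster-naming step; the OpenAI client is used here as one concrete option, and the model name and prompt wording are assumptions:

```python
# Sketch: ask an LLM to produce a short topical label for a cluster,
# given a sample of the questions it contains.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def name_cluster(sample_questions: list[str], model: str = "gpt-4o-mini") -> str:
    prompt = ("Here are questions from one cluster of an LLM benchmark:\n\n"
              + "\n".join(f"- {q}" for q in sample_questions[:20])
              + "\n\nGive a short (2-5 word) topical label for this cluster.")
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()
```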
Relevant data and code links:
- https://github.com/lm-sys/FastChat/blob/main/docs/dataset_release.md
- https://huggingface.co/datasets/lmsys/mt_bench_human_judgments/viewer/default/human?p=4
- https://lmarena.ai/?leaderboard
- Code to reproduce the leaderboard: https://colab.research.google.com/drive/1KdwokPjirkTmpO_P1WByFNFiqxWQquwH
- https://github.com/VILA-Lab/Open-LLM-Leaderboard