Skip to content

borisdev/llm-lisa

Repository files navigation

Comparing LLMs with LISAs

LISA stands for Local Indicators of Spatial Association.

What's spatial association?

Spatial association in terms of language models refers semantic association using high dimensional embedding vectors.

Why local?

Examples of local concerns in LLM performance evaluation are as follows:

  • Which LLM is least socially biased?
  • Which LLM is best with medical answers?
  • Which LLM is least likely to produce toxic content?
  • Which LLM produces more nuanced, and less over confident, simplistic answers?

A LISA (a local indicator) might reveal big variation in high priority local concerns that aggregate performance metrics hide.

The objective here is to detect local hot spots in LLM performance variation.

Practically, this might be useful for two reasons:

  • revealing local patterns of performance gives model users information to make more nuanced performance comparisons between models that are more directly relevant to their specific user priority concerns.
  • revealing local patterns of performance gives model developers information to make more strategic decisions in feeding high impact examples to fine-tune a LLM

Approach

LISA (Local Indicators of Spatial Association)

The approach I will take is to port work already done in exploratory spatial data analysis to the LLM domain -- more specifically work on local indicators of spatial association (Local Indicators of Spatial Association—LISA).

basic idea with using LLM Arena's ELO data

  • get % battle wins for each model for each question
  • see where % wins on 'clustered questions' are correlated
  • use LLM to name these clusters
  • write a report on the patterns...disaggregate, more nuanced performance comparisons

Data source: LLM Arena - ELO rating -- head to head battles

Question -- maybe this data might be better ?

https://github.com/VILA-Lab/Open-LLM-Leaderboard?utm_source=chatgpt.com

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages