Replies: 3 comments 10 replies
-
Oh, this is very interesting! It seems like the ranking is generally maintained, with a few exceptions.
The relationship is definitely not linear, though, so I would love to see a Spearman correlation here. It is also worth examining the outliers (KURE-v1 seems to perform really well). We could also try a cosine version of K-means; some of the implementations already normalize the vectors before returning them.
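For reference, a quick sketch of both ideas (the score arrays are hypothetical placeholders; the "cosine K-means" here is just Euclidean K-means on L2-normalised rows, not true spherical K-means):

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def cosine_kmeans(embeddings: np.ndarray, n_clusters: int, seed: int = 42) -> np.ndarray:
    # For unit vectors, ||a - b||^2 = 2 * (1 - cos(a, b)), so Euclidean
    # K-means on L2-normalised rows effectively clusters by cosine distance.
    unit = normalize(embeddings)  # row-wise L2 normalisation
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(unit)

# Rank agreement between the two setups; `kmeans_scores` and
# `spectral_scores` are hypothetical aligned arrays of per-model V-measures:
# rho, p_value = spearmanr(kmeans_scores, spectral_scores)
```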
-
Since most embedding models have a normalization layer, my hunch is that the difference is not due to the cosine-Euclidean disparity, but rather that spectral clustering is simply better. Doesn't it considerably slow down evaluation, though?
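For what it's worth, the metric equivalence for normalised embeddings is easy to check numerically: for unit vectors, squared Euclidean distance is an affine function of cosine similarity, so K-means sees the same geometry either way (toy sketch with random unit vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 384))
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

# For unit vectors: ||a - b||^2 = 2 - 2 * (a . b) = 2 * (1 - cos(a, b)).
assert np.isclose(np.sum((a - b) ** 2), 2 * (1 - a @ b))
```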
-
PR with implementation: #2430
-
Hello,
I am interested in evaluating Korean text embeddings, and I have a question about the clustering task evaluation method in MTEB.
MTEB evaluates clustering tasks using the K-means algorithm. However, since K-means measures distance in Euclidean space, this approach may not be suitable for text embedding models trained to measure similarity via cosine similarity or inner product. So I considered an alternative approach that applies spectral clustering to a graph constructed from cosine similarities. In fact, I observed an improvement in the V-measure score with this method.
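For concreteness, here is a minimal sketch of what I mean (my illustration uses scikit-learn's SpectralClustering with a precomputed cosine affinity; clipping negative similarities to zero is one simple way to keep the affinity non-negative, and an assumption on my part rather than a fixed design):

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import v_measure_score
from sklearn.metrics.pairwise import cosine_similarity

def spectral_cosine_vmeasure(
    embeddings: np.ndarray, labels_true: np.ndarray, n_clusters: int, seed: int = 42
) -> float:
    # Build the graph from pairwise cosine similarities; spectral clustering
    # expects non-negative affinities, so negative entries are clipped to 0.
    affinity = np.clip(cosine_similarity(embeddings), 0.0, None)
    labels_pred = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed", random_state=seed
    ).fit_predict(affinity)
    return v_measure_score(labels_true, labels_pred)
```

Note that building the full affinity matrix is O(n²) in memory, so this is mainly practical for datasets of at most a few thousand documents.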
I would like to hear your thoughts on this approach.
Best regards,