Unsupervised Classification of Documents #7

@Fennec2000GH

Description

Use a clustering algorithm to predict one class per document. Classes may be genre, topic, usefulness, etc. Each document is assigned to its closest cluster, so the result depends on the chosen distance metric.
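As a rough illustration of assigning a document to its closest cluster, here is a minimal sketch that uses only the Python standard library. It assumes a crude whitespace bag-of-words representation and cosine distance as the metric; the helper names (`term_counts`, `cosine_distance`, `closest_cluster`) and the two example centroids are hypothetical and not part of this project.

```python
import math
from collections import Counter


def term_counts(text):
    """Lowercase, split on whitespace, and count terms (a deliberately crude bag of words)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def closest_cluster(document, centroids):
    """Return the name of the centroid with the smallest cosine distance to the document."""
    vector = term_counts(document)
    return min(centroids, key=lambda name: cosine_distance(vector, centroids[name]))


# Hypothetical centroids; in practice a clustering algorithm would produce them.
centroids = {
    "sports": term_counts("match team goal score player league"),
    "cooking": term_counts("recipe oven bake flour sugar butter"),
}

print(closest_cluster("The team scored a late goal", centroids))
```

Swapping `cosine_distance` for another metric (for example one based on embeddings) changes which cluster a document lands in, which is why the choice of metric matters here.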

Objectives

  1. Implement different clustering algorithms to classify documents into an arbitrary set of classes. Text similarity would be a good starting point for the distance metric.
  2. Use zero-shot learning (ZSL) to classify documents into a set of pre-determined classes. HuggingFace has a pipeline for that. Check out the comments in here.
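For the second objective, a minimal sketch of the HuggingFace zero-shot pipeline might look like the following. It assumes the `transformers` package is installed and that a default model can be downloaded on first use; the wrapper name `classify_zero_shot` and its optional `classifier` parameter are hypothetical and exist only so the pipeline can be swapped out or stubbed.

```python
def classify_zero_shot(document, candidate_labels, classifier=None):
    """Return (best_label, score) for a document against pre-determined classes."""
    if classifier is None:
        # Imported lazily so this module loads even without transformers installed.
        from transformers import pipeline

        # With no model argument the pipeline falls back to a default checkpoint,
        # which is downloaded the first time it is used.
        classifier = pipeline("zero-shot-classification")

    result = classifier(document, candidate_labels=candidate_labels)
    # The pipeline returns "labels" sorted by descending "scores".
    return result["labels"][0], result["scores"][0]
```

A call such as `classify_zero_shot("The team scored a late goal", ["sports", "cooking", "politics"])` would return the top label and its score. Building the pipeline once and passing it in as `classifier` avoids reloading the model for every document.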
