Fine-Tuning a Document Classification Model #2721

Nosajsom · 2022-06-23T19:35:44Z

Nosajsom
Jun 23, 2022

Hello, I am trying to implement document type classification (e.g., is a text document a novel, journal, blog, encyclopedia entry, textbook, etc.) in my pipeline. It seems to me that the most suitable node for this is the Document Classifier, but I am getting poor results with the models I have found on the Hugging Face Hub.

I would like to fine-tune a model using training data, but I am unsure how to do so in this case. The tutorial for fine-tuning a model explains how to use the annotation tool to label a QA dataset, but this tool does not seem to meet my needs. From the looks of it, the Haystack annotation tool requires me to assign questions to portions of text, but I would like to assign a single label (e.g. "textbook") to an entire document.

Does Haystack support what I am trying to do? If not, is anyone aware of any tools I could use to fine-tune my model in a way that Haystack can use in the Document Classifier?

Thank you in advance!

Answered by julian-risch

julian-risch · 2022-07-04T11:47:52Z

julian-risch
Jul 4, 2022
Maintainer

Hi @Nosajsom, you're right that the Document Classifier node is the one you need. Training this node is not yet directly supported in Haystack, but you can do the training with Hugging Face transformers. Here is a tutorial: https://huggingface.co/docs/transformers/training
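
For illustration, here is a minimal sketch of what that training could look like for one-label-per-document data. It is not taken from the linked tutorial. The label list, the base model `distilbert-base-uncased`, the output directory name, and the helper names `build_label_maps` and `fine_tune` are all assumptions made for this example, and it assumes the `transformers` and `datasets` packages are installed.

```python
from typing import Dict, List, Tuple

# Hypothetical label set, taken from the document types named in the question.
LABELS = ["novel", "journal", "blog", "encyclopedia entry", "textbook"]


def build_label_maps(labels: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map label names to integer ids and back, for the model config."""
    label2id = {label: i for i, label in enumerate(labels)}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label


def fine_tune(
    texts: List[str],
    labels: List[str],
    base_model: str = "distilbert-base-uncased",  # assumed base model
    output_dir: str = "doc-type-classifier",      # assumed output path
) -> None:
    """Fine-tune a sequence classifier on (document text, label) pairs."""
    # Imported here so the label helper above works without these packages.
    from datasets import Dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        DataCollatorWithPadding,
        Trainer,
        TrainingArguments,
    )

    label2id, id2label = build_label_maps(LABELS)
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForSequenceClassification.from_pretrained(
        base_model,
        num_labels=len(LABELS),
        label2id=label2id,
        id2label=id2label,
    )

    # One integer label per document; long documents are truncated.
    dataset = Dataset.from_dict(
        {"text": texts, "label": [label2id[name] for name in labels]}
    )
    dataset = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True,
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir, num_train_epochs=3),
        train_dataset=dataset,
        data_collator=DataCollatorWithPadding(tokenizer),
    )
    trainer.train()

    # Save model and tokenizer together so the directory can be loaded later.
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
```

Calling `fine_tune(texts, labels)` with your own lists of document texts and label names writes the model to the output directory. In Haystack 1.x, that directory (or a Hub upload of it) should then be usable as the `model_name_or_path` of the `TransformersDocumentClassifier` node, though this is worth checking against the Haystack version you run.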

1 reply

HGamalElDin Jul 4, 2023

Hello @julian-risch, I want to do the same task with a minimal dataset. Is there currently any way to fine-tune the document classifier using a few-shot method?

Kelum-senaka · 2022-11-18T04:42:14Z

Kelum-senaka
Nov 18, 2022

Hi @Nosajsom, did you find a way to do this?

0 replies
