Fine-Tuning a Document Classification Model #2721
-
Hello, I am trying to add document type classification to my pipeline (e.g., deciding whether a text document is a novel, journal, blog, encyclopedia entry, textbook, etc.). The Document Classifier seems to be the most suitable node for this, but I am getting poor results with the models I've found on the Hugging Face Hub. I would like to fine-tune a model on my own training data, but I am unsure how to do that here. The tutorial on fine-tuning a model explains how to use the annotation tool to label a QA dataset, but that tool does not seem to fit my needs: as far as I can tell, the Haystack annotation tool requires me to attach questions to portions of text, whereas I want to assign a single label (e.g. "textbook") to an entire document. Does Haystack support what I am trying to do? If not, does anyone know of tools I could use to fine-tune a model so that Haystack can use it in the Document Classifier? Thank you in advance!
-
Hi @Nosajsom, you're right that the Document Classifier is the node you need. Haystack does not support training this node directly yet, but you can easily do the training with Hugging Face Transformers. Here is an example/tutorial: https://huggingface.co/docs/transformers/training
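Because the question is about attaching one label to a whole document rather than annotating QA spans, below is a minimal sketch of the data-preparation step, assuming your data is plain (text, label) pairs. The corpus, label names, and function names are hypothetical. The output is a list of `{"text", "label"}` records with integer labels, plus `label2id`/`id2label` mappings, which is the shape sequence-classification fine-tuning in Hugging Face Transformers typically works with.

```python
import json
import random
from typing import Dict, List, Tuple

# Hypothetical example corpus: each entire document gets exactly one label,
# unlike QA annotation, where questions are attached to spans of text.
DOCUMENTS: List[Tuple[str, str]] = [
    ("Chapter 1. It was a dark and stormy night...", "novel"),
    ("Abstract. We study the effect of ...", "journal"),
    ("Today I tried a new recipe and ...", "blog"),
    ("Photosynthesis is the process by which ...", "encyclopedia"),
    ("Exercise 3.2: Prove that ...", "textbook"),
]


def build_label_maps(labels: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Assign a stable integer id to each distinct label (sorted for reproducibility)."""
    label2id = {label: i for i, label in enumerate(sorted(set(labels)))}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label


def to_records(docs: List[Tuple[str, str]], label2id: Dict[str, int]) -> List[dict]:
    """Convert (text, label) pairs into {"text": ..., "label": <int>} records."""
    return [{"text": text, "label": label2id[label]} for text, label in docs]


def train_eval_split(records: List[dict], eval_fraction: float = 0.2, seed: int = 42):
    """Shuffle a copy of the records and split off at least one evaluation record."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


def write_jsonl(records: List[dict], path: str) -> None:
    """Write one JSON object per line (the JSON Lines format)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    label2id, id2label = build_label_maps([label for _, label in DOCUMENTS])
    records = to_records(DOCUMENTS, label2id)
    train, evaluation = train_eval_split(records)
    print(label2id)  # {'blog': 0, 'encyclopedia': 1, 'journal': 2, 'novel': 3, 'textbook': 4}
    print(len(train), len(evaluation))  # 4 1
```

Following the linked tutorial, these records could be written out with `write_jsonl`, loaded with `datasets.load_dataset("json", data_files=...)`, tokenized, and used to fine-tune a model created with `AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=len(label2id), id2label=id2label, label2id=label2id)`. Passing the mappings stores readable label names in the saved model config. Once the model and tokenizer are saved to a local directory (`trainer.save_model(...)`, `tokenizer.save_pretrained(...)`), that path can presumably be given to the Document Classifier (in Haystack 1.x, `TransformersDocumentClassifier(model_name_or_path=...)`). Check the parameters for your Haystack version.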
-
Hi @Nosajsom, did you find a way to do this?